CN118093911A - Fuzzy Transformer hash method for medical image retrieval target attack - Google Patents

Fuzzy Transformer hash method for medical image retrieval target attack Download PDF

Info

Publication number
CN118093911A
CN118093911A CN202410234959.5A CN202410234959A CN118093911A CN 118093911 A CN118093911 A CN 118093911A CN 202410234959 A CN202410234959 A CN 202410234959A CN 118093911 A CN118093911 A CN 118093911A
Authority
CN
China
Prior art keywords
loss
hash
model
prototype
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410234959.5A
Other languages
Chinese (zh)
Inventor
丁卫平
周琳琳
刘传升
黄嘉爽
程纯
鞠恒荣
陈悦鹏
侯涛
周天奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202410234959.5A priority Critical patent/CN118093911A/en
Publication of CN118093911A publication Critical patent/CN118093911A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a fuzzy Transformer hash method for medical image retrieval target attack, which solves the technical problem that current deep hash models have poor robustness in medical image retrieval and are easily affected by adversarial examples. The technical scheme is as follows: a medical image database is established, and a fuzzy Transformer hash model is constructed, which mainly consists of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator, and a discriminator; the loss function of each part is calculated and the parts are optimized with an alternating learning algorithm; the prototype codes and adversarial examples generated from the test set are used as query samples to retrieve from the database, and the targeted mean average precision t-MAP is used to evaluate the targeted attack performance of the model. The beneficial effects of the invention are as follows: the robustness and anti-interference capability of the model in medical image retrieval are enhanced, and the accuracy of medical image retrieval is improved.

Description

Fuzzy Transformer hash method for medical image retrieval target attack
Technical Field
The invention relates to the technical field of medical image processing, and in particular to a fuzzy Transformer hash method for medical image retrieval target attack.
Background
In recent years, with the development of medical imaging technology, the number of medical images has been growing rapidly. In the medical field, doctors make diagnoses and formulate treatment plans by examining pathological images and related cases of patients. Therefore, doctors often need to retrieve valuable images from different medical image databases to support their clinical practice. However, since medical images generally have similar structures and textures, medical image retrieval presents significant challenges. Furthermore, the pathological images of different diseases may exhibit similar characteristics, which further highlights the necessity of improving retrieval accuracy.
Early on, the approximate nearest neighbor (ANN) search method attracted great interest because of its efficiency and effectiveness. However, this method suffers from high memory cost, slow search speed, and low accuracy. To solve these problems, hashing techniques were introduced. Hashing maps high-dimensional data into compact binary codes while preserving semantic similarity, and therefore provides significant advantages in storage cost and retrieval speed. With the rapid development of deep learning, methods combining deep learning and hashing have been widely applied. Among them, deep hashing methods that automatically extract features with a deep neural network (DNN) have achieved great success in learning to hash and have been shown to outperform conventional hashing methods. However, deep neural networks tend to be susceptible to adversarial examples, which can fool the network into making false predictions. Deep hashing methods achieve encouraging performance on many benchmarks, but they inevitably inherit the poor robustness of DNNs and their susceptibility to adversarial examples.
Disclosure of Invention
The invention aims to provide a fuzzy Transformer hash method for medical image retrieval target attack, which mainly solves the technical problem that current deep hash models have poor robustness and are easily affected by adversarial examples, and belongs to the technical field of medical image processing. The method greatly enhances the robustness and anti-interference capability of the model in medical image retrieval and improves the accuracy of medical image retrieval.
In order to achieve the above aim, the invention adopts the following technical scheme: a fuzzy Transformer hash method for medical image retrieval target attacks, comprising the following steps:
S10: firstly, a medical image database is established; then the samples in the database are preprocessed and augmented; finally a training set T_r, a test set T_e, and a database sample set T_d are divided;
S20: a fuzzy Transformer hash model is constructed, which mainly consists of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator, and a discriminator; the visual Transformer hash model comprises two modules, a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting deep features of medical images and the hash code learning module is responsible for mapping the extracted deep features into hash codes; the prototype network is responsible for extracting the semantic features f_L of the image class labels in the database and mapping them into prototype codes; the residual fuzzy generator first deconvolves the semantic feature f_L extracted by the prototype network into a feature map F_L of the same size as the original image, and then adds this feature map to the original image q_t to generate an adversarial example q'_t; the discriminator is responsible for discriminating whether its input comes from an original image or a generated adversarial example;
S30: the loss functions of all parts are calculated according to the model constructed in step S20: the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator;
S40: according to the calculated loss functions of the parts of the model, the visual Transformer hash model, the prototype network, the residual fuzzy generator, and the discriminator are optimized in turn using an alternating learning algorithm, and the optimized model is saved;
S50: the medical images in the test set T_e are input into the visual Transformer hash model, the prototype network, and the residual fuzzy generator in turn to generate the corresponding prototype codes and adversarial examples; the prototype codes and adversarial examples are used as query samples to retrieve from the database sample set T_d, and finally the targeted mean average precision t-MAP is used to evaluate the targeted attack performance.
In the fuzzy Transformer hash method for medical image retrieval target attack provided by the invention, the whole model is constructed in detail in step S20; the specific steps are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transform (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the power mean transform layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant; during back-propagation, the gradient of the PMT output with respect to x is computed with the chain rule as ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
the power mean transform layer is integrated into the whole model through forward and backward propagation; this integration enables the model to learn more complex information during training, thereby enhancing its nonlinearity;
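The following is a minimal PyTorch-style sketch of the power mean transform described in S211; the concatenation of the two transformed channels along the last dimension, the default β = 1, and the clamping used to keep the logarithm defined are assumptions rather than details fixed by the text.

```python
import torch
import torch.nn as nn

class PowerMeanTransform(nn.Module):
    """Maps x to [ln(x + beta), ln^2(x + beta)] as in S211 (sketch, not the patent's code)."""

    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shift and clamp so the logarithm stays defined; autograd then supplies the
        # chain-rule gradient [1/(x+beta), 2*ln(x+beta)/(x+beta)] automatically.
        shifted = torch.clamp(x + self.beta, min=1e-6)
        log_x = torch.log(shifted)
        return torch.cat([log_x, log_x.pow(2)], dim=-1)
```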
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4, and 16×16 respectively; the generated feature sub-maps are flattened and concatenated, so that features of different scales can be extracted from the input feature map and multi-scale features are produced, thereby integrating information from different regions;
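A minimal sketch of the spatial pyramid pooling module of S212, assuming PyTorch adaptive average pooling at the 1×1, 2×2, 4×4, and 16×16 scales named above, with the pooled maps flattened and concatenated along the token dimension.

```python
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    def __init__(self, scales=(1, 2, 4, 16)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in scales])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map
        pooled = [p(x).flatten(start_dim=2) for p in self.pools]  # (B, C, s*s) per scale
        return torch.cat(pooled, dim=2)                           # (B, C, 1 + 4 + 16 + 256)
```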
The spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling layer module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pairs K' and V'; this layer calculates the attention scores between the query Q and the key-value pairs K' and V';
S213: through step S212, the output of the MHSPA module and the output of the Transformer encoder can be obtained; if the dimension of the query Q and the key K' is d_k, the output of the MHSPA module can be expressed as:
MHSPA(Q, K, V) = softmax(QK'^T / √d_k) · V'
where K' is the value obtained after the spatial pyramid pooling operation is applied to K, and V' is the value obtained after the spatial pyramid pooling operation is applied to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the influence of large dot-product magnitudes; the scaled dot-product attention layer not only enhances the model's ability to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot products;
If the input to the Transformer encoder is Z_l, l = 1, 2, ..., L, then the output Z_o can be expressed as:
Z_o = MLP(LN(MHSPA(LN(Z_l)) + Z_l)) + MHSPA(LN(Z_l)) + Z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
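The sketch below illustrates, under stated assumptions, how the MHSPA computation of S212-S213 can be realised: K and V are compressed by an SPP module such as the one above, and standard scaled dot-product attention is applied. Only a single head is shown; head splitting, output projections, and the assumption that the channel count of K and V equals d_k are simplifications not fixed by the text.

```python
import math
import torch
import torch.nn.functional as F

def mhspa(q, k, v, spp):
    # q: (B, N, d_k) query tokens; k, v: (B, C, H, W) feature maps with C == d_k assumed
    k_p = spp(k).transpose(1, 2)   # pooled keys K'  -> (B, N', d_k)
    v_p = spp(v).transpose(1, 2)   # pooled values V' -> (B, N', d_k)
    d_k = q.size(-1)
    scores = q @ k_p.transpose(-2, -1) / math.sqrt(d_k)   # QK'^T / sqrt(d_k)
    attn = F.softmax(scores, dim=-1)                       # weights sum to 1 for each query
    return attn @ v_p                                      # attention-weighted sum of V'
```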
S214: after the output of the Transformer encoder is obtained according to step S213, features are generated through a power mean transform layer, and classification results and hash codes are produced through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of the labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c denotes the number of categories; firstly, the semantic feature f_L is extracted from the label set L_u through two fully connected layers; secondly, the semantic feature f_L passes through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic feature f_L serves as part of the input of the residual fuzzy generator to assist adversarial example generation;
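A rough sketch of the prototype network of S22 under assumed PyTorch conventions; the hidden width, the tanh relaxation before binarisation, and the exact branch layout are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class PrototypeNet(nn.Module):
    def __init__(self, num_classes: int, hash_bits: int, hidden: int = 512):
        super().__init__()
        self.feature = nn.Sequential(          # two fully connected layers -> semantic feature f_L
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, num_classes)   # classification branch
        self.hash_head = nn.Linear(hidden, hash_bits)      # prototype-code branch

    def forward(self, labels: torch.Tensor):
        f_l = self.feature(labels)                 # semantic feature f_L
        logits = self.classifier(f_l)              # classification information
        f_p = torch.tanh(self.hash_head(f_l))      # real-valued output f_p
        v_p = torch.sign(f_p)                      # prototype code V_p in {-1, +1}
        return f_l, logits, f_p, v_p
```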
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is formed by passing the semantic feature f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L of the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block, and 2 deconvolution layers; then (q_t + F'_L) passes through 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; each residual fuzzy unit first passes through 2 convolution layers and is then connected to the input feature map by a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is connected to F_R by another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer has height H_R, width W_R, and number of channels C_R;
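One residual fuzzy unit of S231 could look like the following sketch; kernel sizes and channel handling are assumptions, and it expects a fuzzy layer module such as the one sketched after step S232 below.

```python
import torch.nn as nn

class ResidualFuzzyUnit(nn.Module):
    def __init__(self, channels: int, fuzzy_layer: nn.Module):
        super().__init__()
        self.convs = nn.Sequential(                    # two convolution layers
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fuzzy = fuzzy_layer                       # fuzzy layer Fl(.)

    def forward(self, x):
        f_r = self.convs(x) + x        # first skip connection -> F_R
        return f_r + self.fuzzy(f_r)   # second skip connection -> F_R + Fl(F_R)
```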
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel is evaluated by M fuzzy membership functions; each evaluation of a fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where i indexes the feature points in each channel, x_{i,c} is the i-th feature point of the c-th channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label of the k-th Gaussian fuzzy membership function for the i-th feature point in the c-th channel;
the fuzzy labels computed by the M Gaussian fuzzy membership functions are then merged through the fuzzy-logic AND operation, namely:
Z_{i,c} = AND(Z_{i,1,c}, ..., Z_{i,M,c}), i = 1, ..., H_R×W_R; c = 1, ..., C_R
where Z_{i,c} denotes the merged membership degree; Z_{i,c} is then reshaped to obtain the fuzzified feature map F'_R;
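A sketch of the fuzzy layer Fl(·) of S231-S232 under explicit assumptions: the per-channel means μ_{k,c} and standard deviations σ_{k,c} are learnable parameters, the Gaussian membership form given above is used, and the fuzzy-logic AND is realised with the product t-norm, which the text does not fix.

```python
import torch
import torch.nn as nn

class FuzzyLayer(nn.Module):
    def __init__(self, channels: int, num_memberships: int = 4):
        super().__init__()
        # learnable per-channel mean and (log) standard deviation for each of the M functions
        self.mu = nn.Parameter(torch.randn(num_memberships, channels))
        self.log_sigma = nn.Parameter(torch.zeros(num_memberships, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.flatten(2)                                # (B, C, H*W): one vector per channel
        mu = self.mu.unsqueeze(0).unsqueeze(-1)            # (1, M, C, 1)
        sigma = self.log_sigma.exp().unsqueeze(0).unsqueeze(-1)
        # Gaussian membership Z_{i,k,c} for every feature point i, function k, channel c
        z = torch.exp(-((flat.unsqueeze(1) - mu) ** 2) / (2 * sigma ** 2))
        fused = z.prod(dim=1)                              # AND fusion (product t-norm assumption)
        return fused.view(b, c, h, w)                      # fuzzified feature map
```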
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input, passes them through 5 convolution layers and 3 fully connected layers in turn, and in the last fully connected layer predicts the label of the image and determines whether the input comes from an original image or an adversarial example.
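An illustrative sketch of the discriminator of S24, assuming five stride-2 convolutions, global average pooling, three fully connected layers, and a final layer that jointly outputs class logits and a real/fake score; the channel widths and the joint-output layout are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, num_classes: int, in_channels: int = 3):
        super().__init__()
        chans = [in_channels, 64, 128, 256, 512, 512]
        convs = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):     # five convolution layers
            convs += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        self.convs = nn.Sequential(*convs)
        self.fcs = nn.Sequential(                          # three fully connected layers
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes + 1),               # class logits + one real/fake logit
        )

    def forward(self, img: torch.Tensor):
        feat = self.convs(img).mean(dim=(2, 3))            # global average over spatial dims
        out = self.fcs(feat)
        return out[:, :-1], out[:, -1]                     # (label prediction, real/fake score)
```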
In the fuzzy Transformer hash method for medical image retrieval target attack provided by the invention, in step S30 the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator are specified as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it consists of a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB, and a classification loss L_HC, and the objective function of the visual Transformer hash model is:
s.t. V ∈ {-1, +1}^{n×k}
where α_h, β_h, and γ_h are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V denotes the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and of the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e., the relaxed code tanh(Φ(q_t; θ_h)) is used;
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as far as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 occur on each bit with probabilities as equal as possible; it is defined as follows:
where the mean(·) function computes the average of the elements of a vector;
S314: because the number of labels differs greatly between images, different loss functions are used for single-label and multi-label images during training:
(1) the multi-label classification loss L_multi:
(2) the single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layer before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
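The sketch below shows one plausible form of the hashing loss terms of S31; the quantization and balance terms follow the descriptions in S312-S313, while the pairwise term uses the common similarity-preserving negative log-likelihood as an assumption, since the patent's exact formula is not reproduced above.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(u, v, s):
    # u: (m, k) relaxed query codes, v: (n, k) database codes, s: (m, n) similarity in {0, 1}
    theta = 0.5 * u @ v.t()
    return (F.softplus(theta) - s * theta).mean()           # negative log pairwise likelihood

def quantization_loss(u_relaxed):
    # push the tanh-relaxed outputs toward the binary codes sign(u)
    return (torch.sign(u_relaxed) - u_relaxed).pow(2).mean()

def balance_loss(u_relaxed):
    # keep the per-bit mean near zero so -1 and +1 occur with roughly equal probability
    return u_relaxed.mean(dim=0).pow(2).mean()
```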
S32: the loss L_P of the prototype network is calculated; it consists of a pairwise likelihood loss L_PP, a quantization loss L_PQ, and a classification loss L_PC, and the objective function of the prototype network is:
where α_p, β_p, and γ_p are insensitive hyper-parameters and θ_p denotes the network parameters of the prototype network;
S321: first, the pairwise likelihood loss L_PP:
where the similarity matrix is defined between the label set L_u of the image classes in the database and the label set L_Q of the query set, n_u is the number of prototype codes, U ∈ {-1, +1}^{m×k}, and V_p is the prototype code;
S322: a regularization term, namely the quantization loss, is introduced between the real-valued output f_p of the prototype network and the prototype code V_p: L_PQ = ||V_p - f_p||² = ||sign(f_p) - f_p||²;
S323: similar to the visual Transformer hash model, the classification loss of the prototype network combines the multi-label and single-label classification losses, taking into account the varying number of labels on different images:
(1) the multi-label classification loss L_mul-p:
(2) the single-label classification loss L_sin-p:
where the predicted labels are compared with L_u, the labels of the image classes;
therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L_G of the residual fuzzy generator is calculated; it consists of a reconstruction loss L_GR and a Hamming loss L_GH:
where q_t denotes the original image and q'_t denotes the corresponding adversarial example;
where V_p denotes the prototype code, h(·) denotes the hash function, h(q'_t) denotes the hash code of the adversarial example q'_t, dist_H(·) denotes the Hamming distance function, and k denotes the number of bits of the hash code;
finally, the objective function of the residual fuzzy generator is as follows:
where α_g and β_g are insensitive hyper-parameters and θ_g denotes the network parameters of the residual fuzzy generator;
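A hedged sketch of the generator objective of S33: the reconstruction term is taken as an L2 distance between q_t and q'_t, and the Hamming term uses the usual differentiable surrogate (k - ⟨V_p, tanh(·)⟩)/2 so that it can be back-propagated; both choices and the weighting arrangement are assumptions rather than the patent's exact formulation.

```python
import torch

def generator_loss(q_t, q_adv, v_p, relaxed_code, alpha_g=1.0, beta_g=1.0):
    # reconstruction term L_GR: keep the adversarial example close to the original image
    recon = (q_adv - q_t).pow(2).mean()
    # Hamming term L_GH: pull the code of q'_t toward the target prototype code V_p,
    # with (k - <V_p, tanh(.)>) / 2 as a differentiable stand-in for the Hamming distance
    k = v_p.size(-1)
    hamming = ((k - (v_p * relaxed_code).sum(dim=-1)) / 2).mean()
    return alpha_g * recon + beta_g * hamming
```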
S34: the loss L_D of the discriminator is calculated:
where D(·) denotes the discriminator operation;
finally, the objective function of the discriminator is as follows:
where θ_d denotes the network parameters of the discriminator.
Compared with the prior art, the invention has the following beneficial effects:
(1) In order to solve the problems of ViTH's weak anti-interference capability and insufficient robustness, the invention introduces a generative adversarial network on the basis of visual Transformer hashing. In addition, the invention adds a residual fuzzy block to the generator and embeds a fuzzy layer to model the complex relation between the feature maps and the corresponding generated samples, thereby improving the performance of the model under targeted attack.
(2) Medical images place higher demands on the extraction of subtle features than natural images. The invention introduces a spatial pyramid pooling (SPP) layer into ViTH and integrates it with the multi-head attention (MHA) mechanism of the Transformer encoder, so that features of different scales in medical images can be effectively extracted and fused.
(3) To further improve model performance, the prototype network takes image labels as input, extracts semantic features from them, and passes these features to the residual fuzzy generator to assist in generating adversarial examples.
(4) To address the poor robustness of deep hashing methods and their susceptibility to adversarial examples, a deep-hashing-based targeted attack method for medical image retrieval is proposed and gradually applied in practice. This method can improve the accuracy and reliability of medical image retrieval and thus better assist doctors in clinical practice.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a schematic flow chart of the fuzzy Transformer hash method for medical image retrieval target attack provided by the invention;
FIG. 2 is a block diagram of the fuzzy Transformer hash method for medical image retrieval target attack of the invention;
FIG. 3 is a schematic diagram of the spatial pyramid pooling (SPP) layer of the invention;
FIG. 4 is a schematic diagram of the multi-head spatial pyramid pooling attention (MHSPA) module of the invention;
FIG. 5 is a schematic diagram of the residual fuzzy block of the invention;
FIG. 6 is an example of image retrieval with an original image as the query on the ISIC 2018 dataset;
FIG. 7 is an example of image retrieval with an adversarial example as the query on the ISIC 2018 dataset, in accordance with the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. Of course, the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
Example 1:
Referring to FIGS. 1 to 7, the present embodiment provides a fuzzy Transformer hash method for medical image retrieval target attack, which includes the following steps:
S10: firstly, a dermatological medical image database is established, which contains 7 common diseases; the samples in the database are preprocessed and augmented to obtain samples of 5 disease categories, and finally a training set T_r, a test set T_e, and a database sample set T_d are divided;
S20: a fuzzy Transformer hash model is constructed, which mainly consists of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator, and a discriminator; the visual Transformer hash model comprises two modules, a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting deep features of medical images and the hash code learning module is responsible for mapping the extracted deep features into hash codes; the prototype network is responsible for extracting the semantic features f_L of the image class labels in the database and mapping them into prototype codes; the residual fuzzy generator first deconvolves the semantic feature f_L extracted by the prototype network into a feature map F_L of the same size as the original image, and then adds this feature map to the original image q_t to generate an adversarial example q'_t; the discriminator is responsible for discriminating whether its input comes from an original image or a generated adversarial example;
S30: the loss functions of all parts are calculated according to the model constructed in step S20: the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator;
S40: according to the calculated loss functions of the parts of the model, the visual Transformer hash model, the prototype network, the residual fuzzy generator, and the discriminator are optimized in turn using an alternating learning algorithm, and the optimized model is saved;
S50: the medical images in the test set T_e are input into the visual Transformer hash model, the prototype network, and the residual fuzzy generator in turn to generate the corresponding prototype codes and adversarial examples; the prototype codes and adversarial examples are used as query samples to retrieve from the database sample set T_d, and finally the targeted mean average precision t-MAP is used to evaluate the targeted attack performance;
Finally, on ISIC 2018 the model achieves t-MAP values of 0.981, 0.984, 0.986, 0.989, and 0.984 at the 5 code lengths (12, 24, 36, 48, and 60 bits), and the corresponding MAP values reach 0.927, 0.931, 0.935, 0.945, and 0.955, respectively; this further confirms that ViTH-RFG plays an important role in improving the targeted attack performance of the model.
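As a rough illustration of the t-MAP evaluation used in S50, the sketch below computes mean average precision with relevance judged against the target label of the attack instead of the query's true label; the Hamming ranking and the averaging convention are assumptions, and the function is not part of the patent.

```python
import numpy as np

def t_map(query_codes, db_codes, target_labels, db_labels, top_k=None):
    # codes are {-1, +1} arrays of shape (num, bits); labels are one-hot / multi-hot arrays
    scores = []
    for q, t in zip(query_codes, target_labels):
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q)     # Hamming distance to every database code
        order = np.argsort(dist)[:top_k]                    # rank the database by distance
        relevant = (db_labels[order] @ t) > 0                # shares the *target* label of the attack?
        if relevant.sum() == 0:
            continue
        precision = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        scores.append(float((precision * relevant).sum() / relevant.sum()))
    return float(np.mean(scores)) if scores else 0.0
```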
Specifically, in step S20, the specific structures of the visual Transformer hash model, the prototype network, the residual fuzzy generator, and the discriminator of the fuzzy Transformer hash method for medical image retrieval target attack are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transform (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the power mean transform layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant (β = 1); during back-propagation, the gradient of the PMT output with respect to x is computed with the chain rule as ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
the power mean transform layer is integrated into the whole model through forward and backward propagation; this integration enables the model to learn more complex information during training, thereby enhancing its nonlinearity;
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4, and 16×16 respectively; the generated feature sub-maps are flattened and concatenated, so that features of different scales can be extracted from the input feature map and multi-scale features are produced, thereby integrating information from different regions;
the spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling layer module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pairs K' and V'; this layer calculates the attention scores between the query Q and the key-value pairs K' and V';
S213: through step S212, the output of the MHSPA module and the output of the Transformer encoder can be obtained; if the dimension of the query Q and the key K' is d_k, the output of the MHSPA module can be expressed as:
MHSPA(Q, K, V) = softmax(QK'^T / √d_k) · V'
where K' is the value obtained after the spatial pyramid pooling operation is applied to K, and V' is the value obtained after the spatial pyramid pooling operation is applied to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the influence of large dot-product magnitudes; the scaled dot-product attention layer not only enhances the model's ability to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot products;
if the input to the Transformer encoder is Z_l, l = 1, 2, ..., L, then the output Z_o can be expressed as:
Z_o = MLP(LN(MHSPA(LN(Z_l)) + Z_l)) + MHSPA(LN(Z_l)) + Z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
S214: after the output of the Transformer encoder is obtained according to step S213, features are generated through a power mean transform layer, and classification results and hash codes are produced through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of the labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c = 5 denotes the number of categories; firstly, the semantic feature f_L is extracted from the label set L_u through two fully connected layers; secondly, the semantic feature f_L passes through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic feature f_L serves as part of the input of the residual fuzzy generator to assist adversarial example generation;
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is formed by passing the semantic feature f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L of the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block, and 2 deconvolution layers; then (q_t + F'_L) passes through 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; each residual fuzzy unit first passes through 2 convolution layers and is then connected to the input feature map by a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is connected to F_R by another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer has height H_R, width W_R, and number of channels C_R;
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel is evaluated by M fuzzy membership functions; each evaluation of a fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where i indexes the feature points in each channel, x_{i,c} is the i-th feature point of the c-th channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label of the k-th Gaussian fuzzy membership function for the i-th feature point in the c-th channel;
the fuzzy labels computed by the M Gaussian fuzzy membership functions are then merged through the fuzzy-logic AND operation, namely:
Z_{i,c} = AND(Z_{i,1,c}, ..., Z_{i,M,c}), i = 1, ..., H_R×W_R; c = 1, ..., C_R
where Z_{i,c} denotes the merged membership degree; Z_{i,c} is then reshaped to obtain the fuzzified feature map F'_R;
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input, passes them through 5 convolution layers and 3 fully connected layers in turn, and in the last fully connected layer predicts the label of the image and determines whether the input comes from an original image or an adversarial example.
Specifically, in step S30, the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator are as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it consists of a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB, and a classification loss L_HC, and the objective function of the visual Transformer hash model is:
s.t. V ∈ {-1, +1}^{n×k}
where α_h = 100, β_h = 500, and γ_h = 10 are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V denotes the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and of the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e., the relaxed code tanh(Φ(q_t; θ_h)) is used;
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as far as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 occur on each bit with probabilities as equal as possible; it is defined as follows:
where the mean(·) function computes the average of the elements of a vector;
S314: because the number of labels differs greatly between images, different loss functions are used for single-label and multi-label images during training:
(1) the multi-label classification loss L_multi:
(2) the single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layer before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
S32: the loss L_P of the prototype network is calculated; it consists of a pairwise likelihood loss L_PP, a quantization loss L_PQ, and a classification loss L_PC, and the objective function of the prototype network is:
where α_p and β_p are insensitive hyper-parameters, γ_p = 1, and θ_p denotes the network parameters of the prototype network;
S321: first, the pairwise likelihood loss L_PP:
where the similarity matrix is defined between the label set L_u of the image classes in the database and the label set L_Q of the query set, n_u is the number of prototype codes, U ∈ {-1, +1}^{m×k}, and V_p is the prototype code;
S322: a regularization term, namely the quantization loss, is introduced between the real-valued output f_p of the prototype network and the prototype code V_p: L_PQ = ||V_p - f_p||² = ||sign(f_p) - f_p||²;
S323: similar to the visual Transformer hash model, the classification loss of the prototype network combines the multi-label and single-label classification losses, taking into account the varying number of labels on different images:
(1) the multi-label classification loss L_mul-p:
(2) the single-label classification loss L_sin-p:
where the predicted labels are compared with L_u, the labels of the image classes;
therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L_G of the residual fuzzy generator is calculated; it consists of a reconstruction loss L_GR and a Hamming loss L_GH:
where q_t denotes the original image and q'_t denotes the corresponding adversarial example;
where V_p denotes the prototype code, h(·) denotes the hash function, h(q'_t) denotes the hash code of the adversarial example q'_t, dist_H(·) denotes the Hamming distance function, and k denotes the number of bits of the hash code;
finally, the objective function of the residual fuzzy generator is as follows:
where α_g = 25 and β_g = 5 are insensitive hyper-parameters and θ_g denotes the network parameters of the residual fuzzy generator;
S34: the loss L_D of the discriminator is calculated:
where D(·) denotes the discriminator operation;
finally, the objective function of the discriminator is as follows:
where θ_d denotes the network parameters of the discriminator.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (3)

1. A fuzzy Transformer hash method for medical image retrieval target attacks, comprising the steps of:
S10: firstly, a medical image database is established; then the samples in the database are preprocessed and augmented; finally a training set T_r, a test set T_e, and a database sample set T_d are divided;
S20: a fuzzy Transformer hash model is constructed, which consists of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator, and a discriminator; the visual Transformer hash model comprises two modules, a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting deep features of medical images and the hash code learning module is responsible for mapping the extracted deep features into hash codes; the prototype network is responsible for extracting the semantic features f_L of the image class labels in the database and mapping them into prototype codes; the residual fuzzy generator first deconvolves the semantic feature f_L extracted by the prototype network into a feature map F_L of the same size as the original image, and then adds this feature map to the original image q_t to generate an adversarial example q'_t; the discriminator is responsible for discriminating whether its input comes from an original image or a generated adversarial example;
S30: the loss functions of all parts are calculated according to the model constructed in step S20: the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator;
S40: according to the loss function of each part of the model, the visual Transformer hash model, the prototype network, the residual fuzzy generator, and the discriminator are optimized in turn using an alternating learning algorithm, and the optimized model is saved;
S50: the medical images in the test set T_e are input into the visual Transformer hash model, the prototype network, and the residual fuzzy generator in turn to generate the corresponding prototype codes and adversarial examples; the prototype codes and adversarial examples are used as query samples to retrieve from the database sample set T_d, and finally the targeted mean average precision t-MAP is used to evaluate the targeted attack performance.
2. The fuzzy Transformer hash method for medical image retrieval target attack of claim 1, wherein the specific structures of the visual Transformer hash model, the prototype network, the residual fuzzy generator, and the discriminator are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transform (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the power mean transform layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant; during back-propagation, the gradient of the PMT output with respect to x is computed with the chain rule as ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
the power mean transform layer is integrated into the whole model through forward and backward propagation; this integration enables the model to learn more complex information during training, thereby enhancing its nonlinearity;
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4, and 16×16 respectively; the generated feature sub-maps are flattened and concatenated, so that features of different scales can be extracted from the input feature map and multi-scale features are produced, thereby integrating information from different regions;
the spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling layer module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pairs K' and V'; this layer calculates the attention scores between the query Q and the key-value pairs K' and V';
S213: through step S212, the output of the MHSPA module and the output of the Transformer encoder are obtained; if the dimension of the query Q and the key K' is d_k, the output of the MHSPA module is expressed as:
MHSPA(Q, K, V) = softmax(QK'^T / √d_k) · V'
where K' is the value obtained after the spatial pyramid pooling operation is applied to K, and V' is the value obtained after the spatial pyramid pooling operation is applied to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the influence of large dot-product magnitudes; the scaled dot-product attention layer not only enhances the model's ability to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot products;
if the input to the Transformer encoder is Z_l, l = 1, 2, ..., L, then the output Z_o is expressed as:
Z_o = MLP(LN(MHSPA(LN(Z_l)) + Z_l)) + MHSPA(LN(Z_l)) + Z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
S214: after the output of the Transformer encoder is obtained according to step S213, features are generated through a power mean transform layer, and classification results and hash codes are produced through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of the labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c denotes the number of categories; firstly, the semantic feature f_L is extracted from the label set L_u through two fully connected layers; secondly, the semantic feature f_L passes through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic feature f_L serves as part of the input of the residual fuzzy generator to assist adversarial example generation;
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is formed by passing the semantic feature f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L of the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block, and 2 deconvolution layers; then (q_t + F'_L) passes through 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; each residual fuzzy unit first passes through 2 convolution layers and is then connected to the input feature map by a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is connected to F_R by another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer has height H_R, width W_R, and number of channels C_R;
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel is evaluated by M fuzzy membership functions; each evaluation of a fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where i indexes the feature points in each channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label of the k-th Gaussian fuzzy membership function for the i-th feature point in the c-th channel;
the fuzzy labels computed by the M Gaussian fuzzy membership functions are then merged through the fuzzy-logic AND operation, namely:
Z_{i,c} = AND(Z_{i,1,c}, ..., Z_{i,M,c}), i = 1, ..., H_R×W_R; c = 1, ..., C_R
where Z_{i,c} denotes the merged membership degree; Z_{i,c} is then reshaped to obtain the fuzzified feature map F'_R;
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input, passes them through 5 convolution layers and 3 fully connected layers in turn, and in the last fully connected layer predicts the label of the image and determines whether the input comes from an original image or an adversarial example.
3. The fuzzy Transformer hash method for medical image retrieval target attack according to claim 1, wherein the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator are specified as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it consists of a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB, and a classification loss L_HC, and the objective function of the visual Transformer hash model is:
s.t. V ∈ {-1, +1}^{n×k}
where α_h, β_h, and γ_h are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V denotes the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and of the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e., the relaxed code tanh(Φ(q_t; θ_h)) is used;
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as far as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 occur on each bit with probabilities as equal as possible; it is defined as follows:
where the mean(·) function computes the average of the elements of a vector;
S314: because the number of labels differs greatly between images, different loss functions are used for single-label and multi-label images during training:
(1) the multi-label classification loss L_multi:
(2) the single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layer before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
S32: the loss L_P of the prototype network is calculated; it consists of a pairwise likelihood loss L_PP, a quantization loss L_PQ, and a classification loss L_PC, and the objective function of the prototype network is:
where α_p, β_p, and γ_p are insensitive hyper-parameters and θ_p denotes the network parameters of the prototype network;
S321: first, the pairwise likelihood loss L_PP:
where the similarity matrix is defined between the label set L_u of the image classes in the database and the label set L_Q of the query set, n_u is the number of prototype codes, U ∈ {-1, +1}^{m×k}, and V_p is the prototype code;
S322: a regularization term, namely the quantization loss, is introduced between the real-valued output f_p of the prototype network and the prototype code V_p:
L_PQ = ||V_p - f_p||² = ||sign(f_p) - f_p||²
S323: similar to the visual Transformer hash model, the classification loss of the prototype network combines the multi-label and single-label classification losses, taking into account the varying number of labels on different images:
(1) the multi-label classification loss L_mul-p:
(2) the single-label classification loss L_sin-p:
where the predicted labels are compared with L_u, the labels of the image classes;
therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L_G of the residual fuzzy generator is calculated; it consists of a reconstruction loss L_GR and a Hamming loss L_GH:
where q_t denotes the original image and q'_t denotes the corresponding adversarial example;
where V_p denotes the prototype code, h(·) denotes the hash function, h(q'_t) denotes the hash code of the adversarial example q'_t, dist_H(·) denotes the Hamming distance function, and k denotes the number of bits of the hash code;
finally, the objective function of the residual fuzzy generator is as follows:
where α_g and β_g are insensitive hyper-parameters and θ_g denotes the network parameters of the residual fuzzy generator;
S34: the loss L_D of the discriminator is calculated:
where D(·) denotes the discriminator operation;
finally, the objective function of the discriminator is as follows:
where θ_d denotes the network parameters of the discriminator.
CN202410234959.5A 2024-03-01 2024-03-01 Fuzzy Transformer hash method for medical image retrieval target attack Pending CN118093911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410234959.5A CN118093911A (en) 2024-03-01 2024-03-01 Fuzzy Transformer hash method for medical image retrieval target attack

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410234959.5A CN118093911A (en) 2024-03-01 2024-03-01 Fuzzy Transformer hash method for medical image retrieval target attack

Publications (1)

Publication Number Publication Date
CN118093911A true CN118093911A (en) 2024-05-28

Family

ID=91162776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410234959.5A Pending CN118093911A (en) 2024-03-01 2024-03-01 Fuzzy Transformer hash method for medical image retrieval target attack

Country Status (1)

Country Link
CN (1) CN118093911A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118396985A (en) * 2024-06-25 2024-07-26 吉林大学 Robust segmentation method and system for breast ultrasound image aiming at attack resistance

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
WO2022037295A1 (en) * 2020-08-20 2022-02-24 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN116128846A (en) * 2023-02-01 2023-05-16 南通大学 Visual transducer hash method for lung X-ray image retrieval

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022037295A1 (en) * 2020-08-20 2022-02-24 鹏城实验室 Targeted attack method for deep hash retrieval and terminal device
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN116128846A (en) * 2023-02-01 2023-05-16 南通大学 Visual transducer hash method for lung X-ray image retrieval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIPING DING et al.: "ViTH-RFG: Vision Transformer Hashing with Residual Fuzzy Generation for Targeted Attack in Medical Image Retrieval", IEEE Transactions on Fuzzy Systems (Early Access), 14 December 2023 (2023-12-14), pages 1-15 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118396985A (en) * 2024-06-25 2024-07-26 吉林大学 Robust segmentation method and system for breast ultrasound image aiming at attack resistance
CN118396985B (en) * 2024-06-25 2024-09-03 吉林大学 Robust segmentation method and system for breast ultrasound image aiming at attack resistance

Similar Documents

Publication Publication Date Title
CN107122809B (en) Neural network feature learning method based on image self-coding
Cui et al. Identifying materials of photographic images and photorealistic computer generated graphics based on deep CNNs.
CN112100346B (en) Visual question-answering method based on fusion of fine-grained image features and external knowledge
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
CN118093911A (en) Fuzzy Transformer hash method for medical image retrieval target attack
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN115222998B (en) Image classification method
CN114373224B (en) Fuzzy 3D skeleton action recognition method and device based on self-supervision learning
Liu et al. Image retrieval using CNN and low-level feature fusion for crime scene investigation image database
Chen Image recognition technology based on neural network
CN114973226A (en) Training method for text recognition system in natural scene of self-supervision contrast learning
CN118628736A (en) Weak supervision indoor point cloud semantic segmentation method, device and medium based on clustering thought
CN117671666A (en) Target identification method based on self-adaptive graph convolution neural network
Al-Qaisi et al. A Hybrid Method of Face Feature Extraction, Classification Based on MLBP and Layered-Recurrent Network.
Yildirim et al. REGP: A NEW POOLING ALGORITHM FOR DEEP CONVOLUTIONAL NEURAL NETWORKS.
Ding et al. ViTH-RFG: Vision Transformer Hashing with Residual Fuzzy Generation for Targeted Attack in Medical Image Retrieval
CN115100107B (en) Method and system for dividing skin mirror image
Goundar Improved deep learning model based on integrated convolutional neural networks and transfer learning for shoeprint image classification
Ouni et al. An efficient ir approach based semantic segmentation
Xue et al. Fast and unsupervised neural architecture evolution for visual representation learning
Liang et al. Fast saliency prediction based on multi-channels activation optimization
Gopalakrishnan Vector Spaces for Multiple Modal Embeddings
Grycuk et al. Solar image hashing by intermediate descriptor and autoencoder
Somnathe et al. Image retrieval based on colour, texture and shape feature similarity score fusion using genetic algorithm
Boukerma et al. Significant Feature Dimensionality Reduction for Histopathology Image Retrieval as a Tool for Healthcare Decisions Support

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination