CN111401156B - Image identification method based on Gabor convolution neural network - Google Patents

Image identification method based on Gabor convolution neural network

Info

Publication number
CN111401156B
CN111401156B (granted publication of application CN202010134463.2A)
Authority
CN
China
Prior art keywords
gabor
neural network
image
convolutional neural
module
Prior art date
Legal status (assumption; not a legal conclusion)
Active
Application number
CN202010134463.2A
Other languages
Chinese (zh)
Other versions
CN111401156A (en
Inventor
Da Feipeng
Zhuang Lei
Current Assignee (listed assignee may be inaccurate)
Southeast University
Original Assignee
Southeast University
Priority date (assumption; not a legal conclusion)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010134463.2A priority Critical patent/CN111401156B/en
Publication of CN111401156A publication Critical patent/CN111401156A/en
Application granted granted Critical
Publication of CN111401156B publication Critical patent/CN111401156B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image identification method based on a Gabor convolutional neural network, comprising the following processing steps: (1) selecting Gabor wavelets with different parameters to construct a Gabor feature extraction module; (2) building a parallel convolution module from weight-sharing convolutional layers; (3) designing a spatial transformation pooling module that takes the element-wise maximum; (4) constructing a Gabor convolutional layer from the Gabor feature extraction module, the parallel convolution module and the spatial transformation pooling module; (5) selecting a reference network for building the Gabor convolutional neural network and determining a scheme for replacing ordinary convolutional layers in the reference network with Gabor convolutional layers; (6) training the Gabor convolutional neural network with the SGD algorithm with momentum and performing image recognition. The method has low algorithmic complexity, is more robust to spatial transformations, and improves both recognition accuracy and speed.

Description

Image identification method based on Gabor convolution neural network
Technical Field
The invention belongs to the technical field of image recognition and relates to an image recognition method based on a Gabor convolutional neural network, which combines Gabor-based traditional image-processing knowledge with deep-learning parameter learning and is particularly suitable for scenes with spatial transformations such as large rotations and scale changes.
Background
Image recognition is an important field of artificial intelligence and is receiving increasing attention. Recognition technologies mainly include optical character recognition, face recognition, vehicle recognition and biomedical image recognition, with important applications in public security and criminal investigation, natural resource analysis, weather forecasting, environmental monitoring, and the study of physiological lesions. The Gabor wavelet, a traditional pattern recognition tool, has been widely applied in image processing; in recent years, convolutional neural networks have greatly advanced image recognition in computer vision, for example in character recognition and face recognition. Research on image recognition that combines Gabor wavelets with convolutional neural networks therefore has substantial practical value and theoretical novelty.
A deep convolutional neural network can learn expressive features, but because the classical convolutional neural network lacks modules specifically designed for handling rotation and scale changes, it is difficult for the network to learn features that are highly robust to spatial transformations, which limits its application in real scenes involving rotations and scale changes. Existing methods for improving network robustness either have high computational cost or build shallow networks from limited orientation information; they struggle to extract expressive deep features, the features they extract are not robust enough to rotation, and it is difficult to build efficient deep networks with them.
Disclosure of Invention
The technical problem: to overcome the deficiencies of the prior art, the invention provides an image identification method based on a Gabor convolutional neural network (a neural network built from Gabor convolutional layers), which can fully extract deep image features without increasing the computational complexity of the network, and offers higher recognition accuracy and robustness to spatial transformations.
The technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
An image identification method based on a Gabor convolutional neural network, wherein the Gabor convolutional neural network is obtained by replacing at least one convolutional layer of a convolutional neural network (CNN) with a Gabor convolutional layer, and the Gabor convolutional layer consists of a Gabor feature extraction module, a parallel convolution module and a spatial transformation pooling module connected in sequence. The Gabor feature extraction module consists of M Gabor wavelet extraction units with different direction and scale parameters, the parallel convolution module consists of M weight-sharing convolutional layers, and the spatial transformation pooling module takes the element-wise maximum of the outputs of the parallel convolution module.
the image recognition method comprises the following steps:
step 1: preprocessing a sample image in a sample set, wherein the preprocessing comprises image graying and image space transformation;
step 2: training the Gabor convolutional neural network on the sample set preprocessed in step 1, minimizing its cost function with the SGD algorithm with momentum:

J(θ) = (1/m) Σ_{i=1}^{m} ℓ(h_θ(x^{(i)}), y^{(i)})

wherein: J(θ) is the cost function; θ are the parameters of the Gabor convolutional neural network; m is the number of samples; x^{(i)} is the ith sample; h_θ(x^{(i)}) is the predicted class label of the ith sample; y^{(i)} is the class label of the ith sample; ℓ(·,·) is the classification loss between the predicted and true labels;
and step 3: after the image to be recognized is preprocessed in the step 1, the preprocessed image is input into the Gabor convolutional neural network trained in the step 2, and a recognition result of the image to be recognized is obtained.
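As a sketch of the optimizer described in step 2, the minimal example below applies the SGD-with-momentum update rule (v ← γv − η∇J(θ), θ ← θ + v) to a toy least-squares cost standing in for J(θ). The learning rate, momentum coefficient and toy model are illustrative choices, not values from the patent, and a full-batch gradient is used for simplicity.

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.1, gamma=0.9, steps=200):
    """SGD-with-momentum update: v <- gamma*v - lr*grad(J), theta <- theta + v."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = gamma * v - lr * grad_fn(theta)
        theta = theta + v
    return theta

# Toy stand-in for J(theta): least-squares fit of y = 2x + 1 (not the patent's network).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0

def grad(theta):
    w, b = theta
    err = (w * x + b) - y                # h_theta(x^(i)) - y^(i) per sample
    return np.array([np.mean(err * x), np.mean(err)])  # gradient of the mean squared error / 2

theta = sgd_momentum(grad, np.zeros(2))  # converges near (w, b) = (2, 1)
```

The momentum term accumulates past gradients, which speeds up convergence along consistent descent directions; in the patent's setting `grad_fn` would be the backpropagated gradient of the Gabor convolutional neural network's cost.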
Further, Gabor wavelets with the same direction and different scales are selected to construct a Gabor convolutional layer robust to scale transformation, or Gabor wavelets with different directions and different scales are selected to construct a Gabor convolutional layer robust to spatial transformation.
Further, a cross-validation method is used to select the optimal Gabor convolutional neural network, specifically: the sample set is randomly divided into k parts, k−1 parts are used as the training set and 1 part as the validation set; the training and validation sets are rotated k times in turn to train and validate the Gabor convolutional neural network, and the trained network with the smallest validation error is selected and saved as the optimal Gabor convolutional neural network.
Has the beneficial effects that: compared with the prior art, the invention has the following beneficial effects:
the invention provides an image identification method based on a Gabor convolutional neural network, which is characterized in that a Gabor convolutional layer is constructed by designing a Gabor characteristic extraction module, a parallel convolution module and a spatial transform pooling module, and compared with a common convolutional layer, the Gabor convolutional layer firstly extracts multi-direction and multi-scale shallow layer characteristics of an image through Gabor wavelets with different parameters and different characteristics, then learns the deep layer characteristics of the image through the parallel convolution module, and finally obtains the characteristics of spatial transform robustness through spatial change pooling operation. And replacing the common convolutional layer with a Gabor convolutional layer on the basis of the reference convolutional neural network to construct a Gabor convolutional neural network and extract the robust features of the image. Compared with other image identification technologies based on the neural network, the method has the following advantages:
1) The method utilizes the complementarity between the shallow feature and the deep feature, and effectively improves the precision and the robustness of the algorithm;
2) The traditional image processing experience information of the Gabor wavelet is introduced, so that the shallow layer characteristics of the image are effectively extracted, a foundation is laid for the subsequent efficient network learning, and the training efficiency of the network is improved;
3) Constructing a Gabor convolutional layer by using Gabor wavelets, a parallel convolutional module and a spatial pooling module, further replacing the corresponding convolutional layer, and improving the feature extraction capability of the network;
4) The Gabor convolutional neural network is constructed through a Gabor convolutional layer, a pooling layer, a full-connection layer and the like, and the robustness of the algorithm to space transformation such as rotation, scale transformation, translation and the like is effectively improved.
Drawings
FIG. 1 is an overall flow chart of an image recognition method based on a Gabor convolutional neural network provided by the invention;
FIG. 2 shows the selected Gabor wavelets with different parameters: (a) the Gabor wavelet with scale 0 and direction 0; (b) the Gabor wavelet with scale 0 and a second direction; (c) the Gabor wavelet with scale 2 and direction 0; (d) the Gabor wavelet with scale 2 and the same second direction;
FIG. 3 shows the effect of processing an image with Gabor kernels of different parameters: (a) the face image after processing by the Gabor wavelet with scale 0 and direction 0; (b) after the Gabor wavelet with scale 0 and the second direction; (c) after the Gabor wavelet with scale 2 and direction 0; (d) after the Gabor wavelet with scale 2 and the second direction;
FIG. 4 is a structural view of a constructed Gabor convolution layer;
FIG. 5 is a diagram of a constructed Gabor convolutional neural network, in which (a) is a reference four-layer convolutional neural network structure and (b) is a Gabor convolutional neural network structure;
fig. 6 is a diagram showing the processing effect of the MNIST character set, where (a) is the MNIST data set original, and (b) is the processing effect of the Gabor convolutional neural network, and (c) is the processing effect of the general convolutional neural network.
Detailed Description
The invention relates to an image recognition method based on a Gabor convolutional neural network, which performs image recognition with a trained Gabor convolutional neural network. The Gabor convolutional neural network is obtained by replacing at least one convolutional layer of a convolutional neural network (CNN) with a Gabor convolutional layer.
The Gabor convolutional layer consists of a Gabor feature extraction module, a parallel convolution module and a spatial transformation pooling module connected in sequence. The Gabor feature extraction module consists of M Gabor wavelet extraction units with different direction and scale parameters, the parallel convolution module consists of M weight-sharing convolutional layers, and the spatial transformation pooling module takes the element-wise maximum of the outputs of the parallel convolution module.
The image recognition method comprises the following steps:
step 1: preprocessing a sample image in a sample set, wherein the preprocessing comprises image graying and image space transformation;
step 2: training the Gabor convolutional neural network on the sample set preprocessed in step 1, minimizing its cost function with the SGD algorithm with momentum:

J(θ) = (1/m) Σ_{i=1}^{m} ℓ(h_θ(x^{(i)}), y^{(i)})

wherein: J(θ) is the cost function; θ are the parameters of the Gabor convolutional neural network; m is the number of samples; x^{(i)} is the ith sample; h_θ(x^{(i)}) is the predicted class label of the ith sample; y^{(i)} is the class label of the ith sample; ℓ(·,·) is the classification loss between the predicted and true labels;
and step 3: after the image to be recognized is preprocessed in the step 1, the preprocessed image is input into the Gabor convolutional neural network trained in the step 2, and a recognition result of the image to be recognized is obtained.
The technical solution of the present invention is further illustrated by the following specific examples:
the first step, the Gabor convolution layer construction phase, includes:
step 1: gabor wavelets with different parameters are selected to construct a Gabor feature extraction module, gabor features in different directions and different scales of the image can be extracted by selecting Gabor wavelets with different parameters in M directions and scales, and the method specifically comprises the following steps:
M=U×V
wherein: u is the number of the directions of the Gabor wavelet; v is the scale number of Gabor wavelets;
step 2: building a parallel convolution module through the convolution layers shared by the M weights, and further learning the shallow layer features extracted by the Gabor feature extraction module;
and step 3: designing a spatial transformation pooling module by performing element-wise maximum value operation on the output of the parallel convolution module, and reducing characteristic dimensions;
and 4, step 4: and (4) sequentially connecting the Gabor feature extraction module, the parallel convolution module and the spatial transformation pooling module designed in the step (1-3), and constructing a corresponding Gabor convolution layer according to the robustness requirements of different identification tasks on spatial transformation.
The second step, the Gabor convolutional neural network construction stage, includes:
Step 5: select a reference network for building the Gabor convolutional neural network according to the difficulty of the recognition task;
Step 6: determine the scheme for replacing ordinary convolutional layers in the reference network with Gabor convolutional layers;
The third step, the offline training stage, includes:
Step 7: preprocess the two-dimensional images, including image graying and image spatial transformation;
Step 8: after all samples in the sample set have been processed as in step 7, extract the depth features of the images with the Gabor convolutional neural network constructed in steps 5-6, and obtain feature vectors through the fully connected layer;
and step 9: the method for training the Gabor convolutional neural network by utilizing the minimum cost function of the SGD algorithm with momentum specifically comprises the following steps:
Figure RE-GDA0002484003440000042
wherein: j (theta) is a cost function; theta is a parameter of the Gabor convolution neural network; m is the number of samples; x is the number of (i) Represents the ith sample; h is θ (x (i) ) A prediction class label representing the ith sample; y is (i) Is the category label of the ith sample.
The fourth step, the online testing stage:
Step 10: after all samples in the test set have been processed as in step 7, input them into the model trained in the offline training stage to obtain the class of each image to be tested;
Further, step 1 specifically includes the following steps:
Step 1.1: select M Gabor wavelets with different parameters to construct the Gabor feature extraction module, specifically:

g_{μ,ν}(z) = (‖k_{μ,ν}‖² / σ²) · exp(−‖k_{μ,ν}‖² ‖z‖² / (2σ²)) · [exp(i k_{μ,ν}·z) − exp(−σ²/2)]

k_{μ,ν} = k_ν e^{iφ_μ},  k_ν = k_max / f^ν,  φ_μ = πμ/U

G = {g_{μ,ν}(z) : μ ∈ {0,1,...,U−1}, ν ∈ {0,1,...,V−1}}

wherein: z = (x, y) are the spatial position coordinates; the parameter σ = 2π; ‖·‖ denotes the norm operator; k_max is the maximum frequency; f is the spacing factor between kernels in the frequency domain; μ and ν respectively denote the direction and scale of the Gabor kernel, μ ∈ {0,1,...,U−1}, ν ∈ {0,1,...,V−1}; G is the set of selected Gabor kernels with different parameters; U and V are respectively the numbers of directions and scales of the Gabor kernels;
Step 1.2: use the Gabor feature extraction module to extract the Gabor features F_gab of the image (or feature map) in different directions and at different scales, specifically:

O_{μ,ν}(z) = I(z) * g_{μ,ν}(z)

F_gab^{(j)} = F_in^{(j)} * G

wherein: I(z) is the image or feature map; O_{μ,ν}(z) is the convolution of the image with the Gabor kernel of direction μ and scale ν; * denotes convolution; G is the set of M selected Gabor kernels; F_in^{(j)} is the jth channel of the input feature map F_in; F_gab^{(j)} is the jth channel of the Gabor feature F_gab after Gabor kernel convolution. The output F_gab of the Gabor feature extraction module can be regarded as M Gabor features of the input feature map F_in.
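The Gabor kernel bank of step 1.1 can be sketched as follows, using the usual parameterization k_{μ,ν} = k_ν e^{iφ_μ} with k_ν = k_max/f^ν and φ_μ = πμ/U. The grid size and the values of k_max and f are illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def gabor_kernel(mu, nu, U=8, size=11, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Sample g_{mu,nu}(z) on a size x size grid.

    k_{mu,nu} = k_nu * e^{i phi_mu}, k_nu = k_max / f**nu, phi_mu = pi*mu/U.
    size, k_max and f are illustrative defaults, not values from the patent.
    """
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / U)
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    z = xs + 1j * ys                      # point z = (x, y) encoded as a complex number
    kz = (k * np.conj(z)).real            # dot product k_{mu,nu} . z
    norm_k2 = np.abs(k) ** 2
    envelope = (norm_k2 / sigma ** 2) * np.exp(-norm_k2 * np.abs(z) ** 2 / (2 * sigma ** 2))
    return envelope * (np.exp(1j * kz) - np.exp(-sigma ** 2 / 2))

# Feature extraction module G: M = U * V kernels (here U = 8 directions, V = 1 scale).
U, V = 8, 1
G = np.stack([gabor_kernel(mu, nu, U=U) for nu in range(V) for mu in range(U)])
```

Convolving an input channel with each of the M kernels in `G` (e.g. via FFT or a deep-learning framework's fixed-weight convolution) yields the M Gabor features F_gab of step 1.2.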
Further, step 2 specifically includes the following steps:
Step 2.1: select M weight-sharing convolutional layers to construct the parallel convolution module, ensuring that each Gabor feature can be further learned into more expressive deep features;
Step 2.2: use the weight-sharing parallel convolution module to further learn the extracted Gabor features and obtain the image depth features F_conv, specifically:

F_conv^{(j)} = F_gab^{(j)} * C

wherein: F_gab^{(j)} is the jth Gabor feature; C is the learnable convolution kernel (the M convolutional layers share its weights); F_conv^{(j)} is the output feature map of the jth convolutional layer;
Step 2.3: the output of the parallel convolution module is the collection of the outputs of the M convolutional layers, specifically:

F_conv = {F_conv^{(1)}, F_conv^{(2)}, ..., F_conv^{(M)}}

wherein: F_conv is the output feature map of the parallel convolution module.
Further, step 3 specifically includes the following steps:
Step 3.1: construct the spatial transformation pooling module by taking the element-wise maximum;
Step 3.2: input the output features of the parallel convolution module into the spatial transformation pooling module and fuse them to obtain the robust features F_out, specifically:

F_out = max(F_conv)

wherein: F_out is the output feature map of the spatial transformation pooling module, i.e. the output feature map of the whole Gabor convolutional layer; the maximum is taken element-wise over the M feature maps in F_conv.
Further, step 4 specifically includes the following steps:
Step 4.1: select Gabor wavelets with different parameters to construct the Gabor feature extraction module according to the spatial-transformation robustness requirements of the recognition task;
Step 4.2: connect the Gabor feature extraction module, parallel convolution module and spatial transformation pooling module in sequence to construct the corresponding Gabor convolutional layer: select Gabor wavelets with the same direction and different scales to construct a Gabor convolutional layer robust to scale transformation, or Gabor wavelets with different directions and different scales to construct a Gabor convolutional layer robust to spatial transformation.
Further, the step 5 specifically includes the following steps:
step 5.1: selecting a reference network structure for building a Gabor convolutional neural network according to the difficulty degree of an identification task, selecting an AlexNet network structure for a simple identification task, selecting a ResNet network structure for a more complex identification task, and selecting a LightCNN network structure for a face identification task;
step 5.2: and correspondingly adjusting the number of convolution layers, pooling layers and full-connection layers and adjusting the size of a convolution kernel on the basis of the reference network structure.
Further, step 6 specifically includes the following steps:
Step 6.1: determine the replacement scheme for the Gabor convolutional layers, i.e. replace only the first convolutional layer, the first two convolutional layers, or all convolutional layers of the network;
Step 6.2: adjust the number of convolution kernels of the Gabor convolutional layers to reduce the number of network parameters and improve subsequent training efficiency.
Further, step 7 specifically includes the following steps:
Step 7.1: gray the image to remove redundant information, converting the color image into a grayscale image, specifically:

Gray = 0.299·R + 0.587·G + 0.114·B

wherein: R, G and B are the red, green and blue channel components of the image; Gray is the grayscale image;
Step 7.2: apply spatial transformations to the grayscale image, including translation, rotation and scale change, specifically:

[x′, y′, 1]ᵀ = [[1, 0, d_x], [0, 1, d_y], [0, 0, 1]] · [x, y, 1]ᵀ

wherein: (x, y, 1) is the homogeneous coordinate before translation; (x′, y′, 1) is the coordinate after translation; d_x and d_y are the translations in pixels along the x and y axes;

[x″, y″, 1]ᵀ = [[s_x, 0, 0], [0, s_y, 0], [0, 0, 1]] · [x, y, 1]ᵀ

wherein: (x″, y″, 1) is the coordinate after scaling; s_x and s_y are the scale factors along the x and y axes;

[x‴, y‴, 1]ᵀ = [[cos θ, −sin θ, 0], [sin θ, cos θ, 0], [0, 0, 1]] · [x, y, 1]ᵀ

wherein: (x‴, y‴, 1) is the coordinate after rotation and θ is the rotation angle.
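The preprocessing of step 7 — graying with the luminance weights and the three homogeneous-coordinate transforms — can be sketched as follows. This is a point-wise sketch; resampling a whole image grid through these matrices is omitted.

```python
import numpy as np

def to_gray(rgb):
    """Gray = 0.299*R + 0.587*G + 0.114*B on an (H, W, 3) image."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def translate(d_x, d_y):
    return np.array([[1.0, 0.0, d_x], [0.0, 1.0, d_y], [0.0, 0.0, 1.0]])

def scale(s_x, s_y):
    return np.array([[s_x, 0.0, 0.0], [0.0, s_y, 0.0], [0.0, 0.0, 1.0]])

def rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# A point in homogeneous coordinates: rotate 90 degrees, then translate by (1, 2).
p = np.array([1.0, 0.0, 1.0])
p2 = translate(1, 2) @ rotate(np.pi / 2) @ p   # approximately (1, 3, 1)
```

Because all three transforms are 3x3 matrices acting on homogeneous coordinates, any training-time augmentation can be composed into a single matrix product before it is applied.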
Further, the specific method of step 8 is: after preprocessing all samples in the training set as in step 7, input them into the Gabor convolutional neural network built from Gabor convolutional layers, ordinary convolutional layers, pooling layers, BN layers, nonlinear layers, fully connected layers, etc., for training.
Further, step 9 specifically includes the following steps:
Step 9.1: iterate the designed network with the SGD optimization algorithm with momentum, gradually adjusting the learning rate during training;
Step 9.2: select the optimal model by cross-validation, specifically: randomly divide the sample set into k parts, use k−1 parts as the training set and 1 part as the validation set, rotate the training and validation sets k times in turn, and select and save the model with the smallest validation error as the optimal model.
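The k-fold rotation of step 9.2 can be sketched as below; `train_and_eval` is hypothetical and replaced by a stand-in value, since only the splitting and selection logic is being illustrated.

```python
import numpy as np

def kfold_splits(n_samples, k, seed=0):
    """Randomly divide n_samples indices into k folds; yield (train_idx, val_idx)
    with each fold serving once as the validation set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# Select the rotation with the smallest validation error.
# train_and_eval(train_idx, val_idx) is hypothetical; a stand-in "error" is used here.
val_errors = []
for train_idx, val_idx in kfold_splits(20, k=5):
    val_errors.append(float(np.mean(val_idx)))   # stand-in for the validation error
best = int(np.argmin(val_errors))                # index of the model to keep
```

In the patent's procedure each rotation trains a full Gabor convolutional neural network, and the checkpoint with the smallest validation error is the one saved.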
Further, step 10 specifically includes the following steps:
Step 10.1: apply the image graying and random spatial transformation of step 7 to the samples in the test set;
Step 10.2: input the processed test-set images into the model trained in the offline training stage to obtain the class of each image to be tested.
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following detailed description is only illustrative of the present invention and is not intended to limit the scope of the present invention, but only represents selected examples of the present invention.
Example 1:
the experimental platform is an Intel (R) Core (TM) i7-8700K processor and a 16GB memory, the display card is NVIDIA GeForce GTX1080Ti, and the deep learning frame is Pyorch. The experimental data were from the MNIST dataset of the national institute of standards and technology, and the training set contained 60000 handwritten numbers collected from 250 different people, 50% from high school students and 50% from the staff of the census. The test set is also handwritten digital data of the same scale.
In the experiment, 50000 samples are randomly selected from the MNIST dataset as the training set and the remaining 10000 samples as the validation set; the average of 5 runs on the test set is taken as the final result. Gabor kernels with the same scale and different directions are selected to construct the corresponding Gabor convolutional layers. The recognition rates for different numbers of Gabor kernel directions differ slightly; the Rank-1 recognition rates are shown in Table 1:

TABLE 1 Rank-1 recognition rates (%) on the MNIST dataset

Number of directions    n=4     n=8     n=12    n=16
Recognition rate        99.43   99.51   99.50   99.45
This embodiment provides an image recognition method based on a Gabor convolutional neural network; its flow is shown in FIG. 1 and specifically comprises the following steps:
the first step, the Gabor convolution layer construction phase, includes:
Step 1: select M Gabor wavelets with different parameters to construct the Gabor feature extraction module G, choosing Gabor wavelets with 4, 8, 12 and 16 directions respectively, specifically:

g_{μ,ν}(z) = (‖k_{μ,ν}‖² / σ²) · exp(−‖k_{μ,ν}‖² ‖z‖² / (2σ²)) · [exp(i k_{μ,ν}·z) − exp(−σ²/2)]

k_{μ,ν} = k_ν e^{iφ_μ},  k_ν = k_max / f^ν,  φ_μ = πμ/U

G = {g_{μ,ν}(z) : μ ∈ {0,1,...,U−1}, ν ∈ {0,1,...,V−1}}

wherein: z = (x, y) are the spatial position coordinates; the parameter σ = 2π; ‖·‖ denotes the norm operator; k_max is the maximum frequency; f is the spacing factor between kernels in the frequency domain; μ and ν respectively denote the direction and scale of the Gabor kernel; G is the set of selected Gabor kernels with different parameters; U and V are respectively the numbers of directions and scales of the Gabor kernels; U ∈ {4, 8, 12, 16}, V = 1, M = U × V.
Step 2: and (3) building a parallel convolution module through convolution layers shared by M weights so as to ensure that each Gabor characteristic can be further learned to obtain a more expressive deep characteristic.
Step 3: design the spatial transformation pooling module by taking the element-wise maximum, reducing the feature dimension.
And 4, step 4: sequentially connecting the Gabor feature extraction module, the parallel convolution module and the space transformation pooling module designed in the step 1-3 to construct a corresponding Gabor convolution layer, and specifically comprising the following steps:
step 4.1: and sequentially connecting the Gabor feature extraction module, the parallel convolution module and the space transformation pooling module to construct a Gabor convolution layer which is robust to rotation transformation.
Step 4.2: gabor feature extraction module in Gabor convolution layer extracts Gabor features F in different directions of image or feature mapping by using characteristics of Gabor wavelet gab The method specifically comprises the following steps:
O μ,ν (z)=I(z)*g μ,ν (z)
Figure RE-GDA0002484003440000091
wherein: i (z) is an image or feature map; o is μ,ν (z) is the convolution result of the image and the Gabor kernel in the mu direction and the v scale; denotes convolution; g is selected Gabor core with M different parameters;
Figure RE-GDA0002484003440000092
input feature representation F in The jth channel of (1);
Figure RE-GDA0002484003440000093
representing the feature F after Gabor kernel convolution gab The jth channel of (1); output F of Gabor feature extraction module gab Can be regarded as an input characteristic diagram F in M Gabor features. FIG. 2 is Gabor wavelets in different directions and different scales; fig. 3 is a diagram of the effect of Gabor wavelet processing with different parameters.
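As an illustration (not the patent's implementation), the extraction F_gab = F_in * G can be sketched with a naive "same"-padded convolution in NumPy; odd kernel sizes are assumed:

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same'-padded 2-D convolution: O(z) = I(z) * g(z)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]            # true convolution flips the kernel
    out = np.empty(img.shape, dtype=np.result_type(img, kernel))
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

def gabor_features(feature_map, bank):
    """F_gab for one input channel: its M responses to every kernel in G."""
    return np.stack([conv2d_same(feature_map, g) for g in bank])
```

Stacking over input channels j then gives the full M-fold expansion of F_in described above.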
Step 4.3: the weight-sharing parallel convolution module in the Gabor convolution layer further learns the extracted Gabor features to obtain the image depth feature F_{conv}, specifically:

F_{conv}^{j} = F_{gab}^{j} * C

wherein: F_{gab}^{j} denotes the jth Gabor feature; C denotes a learnable convolution kernel (the M convolution layers share the same weights); F_{conv}^{j} is the output feature map of the jth convolution layer.

The output of the parallel convolution module is the collection of the outputs of the M convolution layers, specifically:

F_{conv} = \{F_{conv}^{j} : j = 1, 2, \ldots, M\}

wherein: F_{conv} denotes the output feature map of the parallel convolution module.
Step 4.4: the spatial transformation pooling module in the Gabor convolution layer fuses the output of the parallel convolution module to obtain the robust feature F_{out}, specifically:

F_{out} = \max(F_{conv})

wherein: F_{out} denotes the output feature map of the spatial transformation pooling module, i.e. the output feature map of the whole Gabor convolution layer, obtained by taking the element-wise maximum over the M branches of F_{conv}. This completes the construction of the Gabor convolution layer, whose structure is shown in fig. 4.
The second step, the Gabor convolutional neural network building phase, includes:
Step 5: according to the difficulty of the recognition task, select a reference network for building the Gabor convolutional neural network, specifically:
Step 5.1: select the reference network structure according to the difficulty of the recognition task: an AlexNet structure for a simple recognition task, a ResNet structure for a more complex recognition task, and a LightCNN structure for a face recognition task. Here the AlexNet network structure is chosen.
Step 5.2: the structure of the reference four-layer convolutional neural network is shown in (a) in fig. 5.
Step 6: determine the scheme for replacing ordinary convolution layers in the reference network with Gabor convolution layers;
Step 6.1: determine the positions of the Gabor convolution layers, i.e. whether to replace only the first convolution layer, the first two convolution layers, or all convolution layers of the network. Considering that the Gabor wavelet resembles the receptive fields of mammalian visual cells, and thus the first convolution layer of a neural network, only the first convolution layer of the network is replaced here.
Step 6.2: adjust the number of convolution kernels of the Gabor convolution layer to reduce the number of network model parameters and improve subsequent training efficiency. The structure of the constructed Gabor convolutional neural network is shown in (b) in fig. 5.
The third step, the offline training phase, includes:
and 7: preprocessing a two-dimensional image, including image graying and image space transformation, specifically comprising:
step 7.1: carrying out image graying to eliminate redundant information and converting a color image into a grayscale image, specifically comprising the following steps:
Gray=0.299*R+0.578*G+0.114*B
wherein: r is an image red channel component; g is the image green channel component; b is the image blue channel component; gray is an image Gray level image;
step 7.2: apply spatial transformations to the grayscale image, including image translation, rotation and scale change, specifically:

(x', y', 1) = (x, y, 1) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ d_x & d_y & 1 \end{pmatrix}

wherein: (x, y, 1) is the homogeneous coordinate before translation; (x', y', 1) is the coordinate after translation; d_x and d_y are respectively the translation amounts, in pixels, of the image along the x and y axes;

(x'', y'', 1) = (x, y, 1) \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix}

wherein: (x'', y'', 1) is the coordinate after scaling; s_x and s_y are respectively the scale factors of the image along the x and y axes;

(x''', y''', 1) = (x, y, 1) \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}

wherein: (x''', y''', 1) is the coordinate after rotation, and \theta is the rotation angle.
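The three homogeneous transforms of step 7.2 can be written directly as 3×3 matrices acting on row vectors (x, y, 1); a NumPy sketch under that row-vector convention (function names are ours):

```python
import numpy as np

def translate(dx, dy):
    """(x', y', 1) = (x, y, 1) @ T shifts by (dx, dy)."""
    return np.array([[1, 0, 0], [0, 1, 0], [dx, dy, 1]], dtype=float)

def scale(sx, sy):
    """Axis-wise scaling by factors (sx, sy)."""
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotate(theta):
    """Rotation about the origin by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)
```

Because the point is a row vector on the left, a pipeline composes left to right, e.g. `p @ rotate(t) @ translate(dx, dy)` rotates first and then shifts.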
Step 8: after all samples in the training set are processed as in step 7, extract the depth features of the image through the Gabor convolutional neural network built in steps 5-6 (composed of Gabor convolution layers, ordinary convolution layers, pooling layers, BN layers, nonlinear layers, etc.), and obtain the feature vector from the fully connected layer.
Step 9: train the Gabor convolutional neural network by minimizing the cost function with the SGD algorithm with momentum, specifically:
Step 9.1: select the SGD optimization algorithm with momentum to iterate on the designed network structure, with the batch size set to 128, the dropout ratio of the fully connected layer set to 0.5, the weight decay set to 0.00005 and the initial learning rate set to 0.001; the learning rate is adjusted gradually during training, being reduced to one tenth of its previous value every 25 iterations. The loss function adopts the cross entropy, defined as:
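A minimal sketch of the step-9.1 update rule and learning-rate schedule, assuming the usual momentum formulation (the momentum coefficient 0.9 is an assumption; batch size, dropout and framework details are omitted):

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr, momentum=0.9, weight_decay=5e-5):
    """One SGD-with-momentum update; weight decay 0.00005 as in step 9.1."""
    velocity = momentum * velocity - lr * (grad + weight_decay * theta)
    return theta + velocity, velocity

def step_lr(initial_lr=0.001, iteration=0, drop_every=25, factor=0.1):
    """Learning rate reduced to one tenth of its previous value every 25 iterations."""
    return initial_lr * factor ** (iteration // drop_every)
```

In a real run the velocity is kept per parameter tensor and `step_lr` is queried once per iteration (or epoch) before each update.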
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)})

wherein: J(\theta) is the cost function; \theta denotes the parameters of the Gabor convolutional neural network; m is the number of samples; x^{(i)} denotes the ith sample; h_\theta(x^{(i)}) denotes the predicted class label of the ith sample; y^{(i)} is the class label of the ith sample;
step 9.2: select the optimal model by cross validation, specifically: randomly divide the sample set into 5 parts, use 4 parts as the training set and 1 part as the validation set, rotate the training and validation sets 5 times in turn, and select and save the model with the smallest validation error as the optimal model.
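The 5-fold rotation of step 9.2 can be sketched as an index generator (the seed and names are illustrative):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Random k-fold split: yields (train, validation) index arrays,
    each fold serving exactly once as the validation set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Training once per split and keeping the model with the lowest validation error implements the selection described above.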
The fourth step, the online testing phase, includes:
Step 10: after all samples in the test set are processed as in step 7, input them into the model trained in the offline training phase to obtain the category of the image to be tested;
Step 10.1: perform image graying and random spatial transformation on the samples in the test set as in step 7;
Step 10.2: input the processed test set pictures into the model trained in the offline training phase to obtain the category of the image to be tested.
In this embodiment, on the MNIST data set of the U.S. National Institute of Standards and Technology, the proposed image recognition method based on the Gabor convolutional neural network uses the Gabor wavelet to fully extract the shallow features of the image while further extracting deep robust features with the convolutional neural network. Shallow and deep features are thereby effectively combined, the robustness of the neural network to image spatial transformations is improved, and the accuracy, speed and robustness of the algorithm are balanced, meeting the application requirements of several fields.
The technical means disclosed in the scheme of the present invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features. It should be noted that modifications and adaptations can be made by those skilled in the art without departing from the principles of the present invention, and such modifications are also to be regarded as within the scope of the present invention.

Claims (3)

1. An image recognition method based on a Gabor convolutional neural network, characterized in that: the Gabor convolutional neural network is obtained by replacing at least one convolution layer of a convolutional neural network (CNN) with a Gabor convolution layer, and the Gabor convolution layer is composed of a Gabor feature extraction module, a parallel convolution module and a spatial transformation pooling module which are connected in sequence; the Gabor feature extraction module consists of M Gabor wavelets with different direction and scale parameters, the parallel convolution module consists of M weight-sharing convolution layers, and the spatial transformation pooling module takes the element-wise maximum of the output of the parallel convolution module;
the image recognition method comprises the following steps:
step 1: preprocessing a sample image in a sample set, wherein the preprocessing comprises image graying and image space transformation;
step 2: training the Gabor convolutional neural network on the sample set preprocessed in step 1, minimizing its cost function with the SGD algorithm with momentum:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)})

wherein: J(\theta) is the cost function; \theta denotes the parameters of the Gabor convolutional neural network; m is the number of samples; x^{(i)} denotes the ith sample; h_\theta(x^{(i)}) denotes the predicted class label of the ith sample; y^{(i)} is the class label of the ith sample;
and step 3: after the image to be recognized is preprocessed in the step 1, the preprocessed image is input into the Gabor convolutional neural network trained in the step 2, and a recognition result of the image to be recognized is obtained.
2. The image recognition method based on the Gabor convolutional neural network of claim 1, wherein Gabor wavelets with the same direction and different scales are selected to construct a Gabor convolution layer robust to scale transformation, or Gabor wavelets with different directions and the same scale are selected to construct a Gabor convolution layer robust to rotation transformation, or Gabor wavelets with different directions and different scales are selected to construct a Gabor convolution layer robust to spatial transformation.
3. The image recognition method based on the Gabor convolutional neural network of claim 1, wherein the optimal Gabor convolutional neural network is selected by cross validation in step 2, specifically: randomly divide the sample set into k parts, take k-1 parts as the training set and 1 part as the validation set, rotate the training and validation sets k times in turn to train and validate the Gabor convolutional neural network respectively, and select and save the trained network with the smallest validation error as the optimal Gabor convolutional neural network.
CN202010134463.2A 2020-03-02 2020-03-02 Image identification method based on Gabor convolution neural network Active CN111401156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010134463.2A CN111401156B (en) 2020-03-02 2020-03-02 Image identification method based on Gabor convolution neural network


Publications (2)

Publication Number Publication Date
CN111401156A CN111401156A (en) 2020-07-10
CN111401156B true CN111401156B (en) 2022-11-18


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101284A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Image recognition method, training method, device and system of image recognition model
CN112487909A (en) * 2020-11-24 2021-03-12 江苏科技大学 Fruit variety identification method based on parallel convolutional neural network
CN112633099B (en) * 2020-12-15 2023-06-20 中国人民解放军战略支援部队信息工程大学 Gaborne-based brain low-level vision zone signal processing method and system
CN114612987A (en) * 2022-03-17 2022-06-10 深圳集智数字科技有限公司 Expression recognition method and device
CN116562358B (en) * 2023-03-16 2024-01-09 中国人民解放军战略支援部队航天工程大学士官学校 Construction method of image processing Gabor kernel convolutional neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563999A * 2017-09-05 2018-01-09 Huazhong University of Science and Technology A chip defect recognition method based on convolutional neural networks

Non-Patent Citations (2)

Title
Improved algorithm of convolutional neural network based on Gabor kernel and its application; Yang Jingming et al.; Journal of Yanshan University; September 2018 (No. 05); full text *
Convolutional neural network model based on local features; Shi En et al.; Computer Engineering; February 15, 2018 (No. 02); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant