CN106991440B - Image classification method of convolutional neural network based on spatial pyramid - Google Patents

Info

Publication number
CN106991440B
CN106991440B (application CN201710198700.XA)
Authority
CN
China
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710198700.XA
Other languages
Chinese (zh)
Other versions
CN106991440A (en)
Inventor
王改华
吕朦
李涛
袁国亮
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN201710198700.XA priority Critical patent/CN106991440B/en
Publication of CN106991440A publication Critical patent/CN106991440A/en
Application granted granted Critical
Publication of CN106991440B publication Critical patent/CN106991440B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification algorithm for a convolutional neural network based on a spatial pyramid. Global features are first extracted with the spatial pyramid, and each pyramid-level picture then acquires local features over a grid; together these form the overall spatial-pyramid feature. A new convolutional neural network model is constructed: the first half of the model is a traditional convolutional network with 3 convolutional layers and 2 pooling layers; the 3 convolutional layers are then each pooled uniformly over a grid to obtain their respective feature maps. The feature maps of each layer are concatenated column-wise into a feature vector, and the 3 feature vectors are concatenated in turn into one total feature vector. The total feature vector covers the features of the classical final convolutional layer and adds those of the earlier convolutional layers, so the loss of important features is avoided; at the same time, the grid size adjusts the weight given to each convolutional layer's feature maps, which improves the recognition efficiency of the network.

Description

Image classification method of convolutional neural network based on spatial pyramid
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to an image recognition method of a deep convolutional neural network based on a spatial pyramid.
Background
The spatial pyramid first extracts global features of the original image, then divides the image into fine mesh sequences at each pyramid level, extracts features from each mesh at each pyramid level, and connects them into a large feature vector.
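As an illustrative sketch (not code from the patent), the spatial-pyramid idea described above can be expressed as follows, assuming grid levels of 1 × 1, 2 × 2 and 4 × 4 and a simple mean-intensity feature per grid cell:

```python
import numpy as np

def spatial_pyramid_features(img, levels=(1, 2, 4)):
    """At each pyramid level, split the image into an n x n grid,
    take one mean-intensity feature per cell, and concatenate all
    cells of all levels into one large feature vector."""
    h, w = img.shape
    feats = []
    for n in levels:
        for r in range(n):
            for c in range(n):
                cell = img[r * h // n:(r + 1) * h // n,
                           c * w // n:(c + 1) * w // n]
                feats.append(cell.mean())
    return np.array(feats)

img = np.arange(64, dtype=float).reshape(8, 8)
v = spatial_pyramid_features(img)
print(v.shape)  # (21,): 1 + 4 + 16 cells
```

The first entry is the global feature (the whole-image mean); the remaining entries are the progressively finer local features.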
Convolutional neural networks have been widely used in recent years and have achieved remarkable results in image processing; many researchers have since modified the classical networks. To obtain better image recognition results, this patent draws on the idea of the spatial pyramid, proposes a new deep convolutional neural network, and achieves a better recognition effect than traditional methods.
Disclosure of Invention
The invention aims to provide an image classification method using a deep convolutional neural network based on the spatial pyramid, thereby improving image pattern recognition capability.
The technical scheme adopted by the invention is as follows: an image classification method based on a convolutional neural network of a spatial pyramid is characterized by comprising the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the network: a convolutional neural network with M convolutional layers and M-1 pooling layers;
step 1.2: pooling the M convolutional layers respectively to obtain M types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
step 2: backward adjustment.
The invention has the beneficial effects that: a new convolutional neural network algorithm structure is provided, and the recognition efficiency is improved.
Drawings
FIG. 1: a method schematic of an embodiment of the invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1, the image classification method of the convolutional neural network based on the spatial pyramid provided by the present invention includes the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the convolutional neural network, with 3 convolutional layers and 2 pooling layers;
step 1.2: pooling the 3 convolutional layers respectively to obtain 3 types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
after the picture is input, the feature maps of the first convolutional layer are obtained through the convolution kernels and the hidden-layer biases; the convolutional feature maps x_j^1 of the first layer are given by:
x_j^1 = δ( Σ_{i=1}^{n_0} x_i^0 * k_j^1 + b_j^1 ), j = 1, …, n_1;
wherein: x_j^1 is the j-th feature map of the 1st convolutional layer; x_i^0 is the i-th picture of the preprocessed input x^0, and n_0 is the number of pictures in x^0; k_j^1 is the j-th two-dimensional convolution kernel of layer 1, and b_j^1 is the bias of the j-th feature map of the 1st hidden layer; δ is the sigmoid function, and the result of the formula is the obtained feature map;
n_1 is the number of convolution kernels in the first layer and is also the number of feature maps of the 1st convolutional layer;
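The first convolutional layer above can be sketched as follows; the shapes, map counts and kernel values are illustrative assumptions, since the patent's figure is not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(x0, kernels, biases):
    """Each of the n1 kernels is slid over every input map ('valid'
    cross-correlation), the responses over the n0 input maps are
    summed, the bias is added, and the sigmoid delta is applied."""
    n0, H, W = x0.shape
    n1, kh, kw = kernels.shape
    out = np.zeros((n1, H - kh + 1, W - kw + 1))
    for j in range(n1):
        for i in range(n0):
            for r in range(H - kh + 1):
                for c in range(W - kw + 1):
                    out[j, r, c] += np.sum(x0[i, r:r + kh, c:c + kw] * kernels[j])
        out[j] += biases[j]
    return sigmoid(out)

x0 = np.ones((1, 6, 6))      # one preprocessed input picture
k = np.ones((2, 3, 3))       # n1 = 2 kernels of size 3 x 3
b = np.zeros(2)
maps = conv_layer(x0, k, b)
print(maps.shape)  # (2, 4, 4)
```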
the obtained convolutional-layer feature maps are down-sampled by 2 × 2 uniform pooling to obtain feature maps v^1 with half the original number of rows and columns:
v^1 = mean-pooling{x^1};
wherein mean-pooling denotes uniform (average) pooling;
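The 2 × 2 uniform pooling step above, halving the rows and columns of each feature map, can be sketched with a reshape-and-mean trick (assuming even spatial dimensions):

```python
import numpy as np

def mean_pool_2x2(x):
    """Non-overlapping 2 x 2 average pooling over a stack of feature
    maps, as in v^1 = mean-pooling{x^1} above."""
    n, h, w = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

x1 = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
v1 = mean_pool_2x2(x1)
print(v1.shape)  # (2, 2, 2)
```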
The characteristic diagram of each convolution layer can be obtained by the following formula;
the characteristic diagram of each pooling layer can be obtained by the following formula;
vl=mean-pooling{xl};
there are 3 convolutional layers in total, i.e. x^1, x^2, x^3; following the spatial-pyramid method, features are then extracted by drawing a grid over the feature maps of the 3 convolutional layers. In this embodiment the 1st convolutional layer is divided into a 4 × 4 grid, one feature is extracted from each grid cell by uniform pooling, and the 1st convolutional layer thus becomes a 4 × 4 feature map p^1 after feature extraction:
p^1 = mean-pooling(v^1);
The 3 classes of feature maps p^1, p^2, p^3 are obtained according to:
p^l = mean-pooling(v^l);
wherein the pooling window size and step size change with the input picture size; the sizes of p^1, p^2, p^3 are preset to 4 × 4, 2 × 2 and 1 × 1 respectively. Each p^1 map is then stacked column by column into a column vector of size 16, so the 6 maps of p^1 form a column vector of size 4 × 4 × 6 = 96; in the same way p^2 is stacked into a column vector of size 2 × 2 × 16 = 64 and p^3 into one of size 1 × 1 × 120 = 120. Finally these are concatenated in order into one column vector p of total size 280, which serves as the feature of the input picture.
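The grid pooling of the three convolutional layers to 4 × 4, 2 × 2 and 1 × 1 and the concatenation into the 280-dimensional vector p can be sketched as below. The map counts 6 / 16 / 120 follow from the arithmetic above (96 = 4 × 4 × 6, etc.); the spatial sizes of the feature maps are illustrative assumptions:

```python
import numpy as np

def grid_pool(maps, n):
    """Pool each feature map down to an n x n grid by averaging
    within each grid cell (adaptive average pooling)."""
    m, h, w = maps.shape
    out = np.zeros((m, n, n))
    for r in range(n):
        for c in range(n):
            out[:, r, c] = maps[:, r * h // n:(r + 1) * h // n,
                                   c * w // n:(c + 1) * w // n].mean(axis=(1, 2))
    return out

x1 = np.random.rand(6, 24, 24)    # 1st conv layer: 6 maps
x2 = np.random.rand(16, 8, 8)     # 2nd conv layer: 16 maps
x3 = np.random.rand(120, 4, 4)    # 3rd conv layer: 120 maps
p = np.concatenate([grid_pool(x, n).ravel()
                    for x, n in ((x1, 4), (x2, 2), (x3, 1))])
print(p.size)  # 96 + 64 + 120 = 280
```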
Step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
all weights of the whole convolutional neural network are initialized through an empirical formula: the weights w_kj between input units and hidden-layer units and the biases b_j of the hidden-layer units are randomly generated according to the empirical formula, with the initial value of b set to 0;
wherein w denotes a weight, l denotes the l-th layer of the convolutional network, j denotes the j-th neuron of the l-th convolutional layer, k denotes the k-th fully connected layer, layerinput denotes the number of input neurons of the layer, and layeroutput denotes the number of output neurons of the layer; k_l is the size of the l-th convolution kernel, which can be initialized to weights between -1 and 1.
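The patent does not reproduce its empirical initialization formula; a minimal sketch under that caveat uses the widely known rule consistent with its layerinput and layeroutput quantities (uniform weights in plus or minus sqrt(6 / (n_in + n_out)), biases starting at 0):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(layer_input, layer_output):
    """Assumed stand-in for the omitted empirical formula: uniform
    weights bounded by sqrt(6 / (layerinput + layeroutput))."""
    bound = np.sqrt(6.0 / (layer_input + layer_output))
    return rng.uniform(-bound, bound, size=(layer_output, layer_input))

w = init_weights(280, 10)   # e.g. fully connected layer: 280 features -> 10 classes
b = np.zeros(10)            # initial value of b set to 0, as stated above
print(w.shape)  # (10, 280)
```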
Each input picture is denoted x, and the image input into the convolutional neural network is denoted x^0. When the input picture is a grayscale picture, x^0 = x; when the input picture is a color picture, it is grayed by x^0 = rgb2gray(x).
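The graying step can be sketched with the standard luminance weights used by the rgb2gray function named in the text (ITU-R BT.601 coefficients):

```python
import numpy as np

def rgb2gray(x):
    """Weighted-sum luminance conversion: 0.2989 R + 0.5870 G + 0.1140 B."""
    return x[..., 0] * 0.2989 + x[..., 1] * 0.5870 + x[..., 2] * 0.1140

rgb = np.zeros((2, 2, 3))
rgb[..., 1] = 1.0            # a pure-green picture
x0 = rgb2gray(rgb)
print(x0[0, 0])  # 0.587
```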
A training image x and its label are input, and the output value of each layer is calculated with the forward conduction formula:
h_{w,b}(x) = f(w^T x + b);
wherein h_{w,b}(x) is the output value of the neuron, w^T is the transpose of the weights, b is the bias, and f is the activation function.
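The forward conduction formula h_{w,b}(x) = f(w^T x + b) for one fully connected layer can be sketched directly; the sigmoid activation matches the δ used earlier:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, f=sigmoid):
    """One layer of forward conduction: w has one column per output
    neuron, so w.T @ x matches w^T x in the formula above."""
    return f(w.T @ x + b)

x = np.array([1.0, -1.0])
w = np.zeros((2, 3))         # 2 inputs -> 3 output neurons
b = np.zeros(3)
print(forward(w, b, x))  # [0.5 0.5 0.5], since sigmoid(0) = 0.5
```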
Step 2: backward adjustment; the specific implementation comprises the following substeps:
step 2.1: the deviation of the last layer is calculated from the label value and the last-layer output value obtained with the forward conduction formula, using the loss function:
J_l = (1/2) Σ_i ( h_{w,b}(x^(i)) - y^(i) )^2;
wherein J_l is the loss function of the l layers, h_{w,b}(x^(i)) is the output value of the output-layer neurons for the i-th picture, and y^(i) is the label of the i-th input picture;
step 2.2: the deviation of each layer is calculated from the deviation of the last layer to obtain the gradient direction, and the weights are updated by:
w := w - α ∂J/∂w, b := b - α ∂J/∂b;
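Since the per-layer update formulas are not reproduced in the text, a minimal stand-in for the gradient step is sketched below for a single sigmoid neuron under the squared-error loss; the learning rate alpha and the toy data are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, alpha=0.5):
    """One step of w := w - alpha * dJ/dw for J = 1/2 (h - y)^2
    with h = sigmoid(w.x + b)."""
    h = sigmoid(w @ x + b)
    delta = (h - y) * h * (1.0 - h)   # dJ/d(pre-activation)
    return w - alpha * delta * x, b - alpha * delta

w, b = np.array([0.0, 0.0]), 0.0
x, y = np.array([1.0, 2.0]), 1.0
for _ in range(200):
    w, b = sgd_step(w, b, x, y)
print(float(sigmoid(w @ x + b)) > 0.9)  # True: the output moved toward the label
```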
In the backward adjustment of this embodiment, the convolutional layers x^1, x^2 receive gradients from two directions, and the algorithm adjusts them by adding the gradients from the two directions.
In this embodiment, a certain number of pictures are input into the trained convolutional neural network, the classification result is obtained by forward propagation and compared with each picture's own label; a match counts as correct, which yields the accuracy of the network algorithm.
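The accuracy test described above, comparing each classification result with the picture's own label, can be sketched as:

```python
import numpy as np

def accuracy(scores, labels):
    """Fraction of pictures whose arg-max class matches the label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

scores = np.array([[0.1, 0.9],    # predicted class 1
                   [0.8, 0.2],    # predicted class 0
                   [0.3, 0.7]])   # predicted class 1
labels = np.array([1, 0, 0])      # the last picture is misclassified
print(accuracy(scores, labels))  # 0.6666666666666666
```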
It should be understood that parts of the specification not set forth in detail belong to the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An image classification method based on a convolutional neural network of a spatial pyramid is characterized by comprising the following steps:
step 1: forward propagation, the specific implementation includes the following substeps:
step 1.1: establishing the first half of the network: a convolutional neural network with M convolutional layers and M-1 pooling layers;
step 1.2: pooling the M convolutional layers respectively to obtain M types of features, connecting each type into a large feature vector, and finally connecting these into one total feature vector serving as the final feature of the image;
in step 1.2, if the convolutional neural network with 3 convolutional layers and 2 pooling layers is established in step 1.1, pooling the 3 convolutional layers respectively to obtain 3 types of characteristics; the specific implementation process of the step 1.2 is as follows:
after the picture is input, the feature maps of the first convolutional layer are obtained through the convolution kernels and the hidden-layer biases; the convolutional feature maps x_j^1 of the first layer are given by:
x_j^1 = δ( Σ_{i=1}^{n_0} x_i^0 * k_j^1 + b_j^1 ), j = 1, …, n_1;
wherein: x_j^1 is the j-th feature map of the 1st convolutional layer; x_i^0 is the i-th picture of the preprocessed input x^0, and n_0 is the number of pictures in x^0; k_j^1 is the j-th two-dimensional convolution kernel of layer 1, and b_j^1 is the bias of the j-th feature map of the 1st hidden layer; δ is the sigmoid function, and the result of the formula is the obtained feature map;
n_1 is the number of convolution kernels in the first layer and is also the number of feature maps of the 1st convolutional layer;
the obtained convolutional-layer feature maps are down-sampled by 2 × 2 uniform pooling to obtain feature maps v^1 with half the original number of rows and columns:
v^1 = mean-pooling{x^1};
wherein mean-pooling denotes uniform (average) pooling;
The characteristic diagram of each convolution layer can be obtained by the following formula;
the characteristic diagram of each pooling layer can be obtained by the following formula;
vl=mean-pooling{xl};
there are 3 convolutional layers in total, i.e. x^1, x^2, x^3; features are then extracted by drawing a grid over the feature maps of the 3 convolutional layers:
the 1st convolutional layer is divided into a 4 × 4 grid, one feature is extracted from each grid cell by uniform pooling, and the 1st convolutional layer thus becomes a 4 × 4 feature map p^1 after feature extraction:
p^1 = mean-pooling(v^1);
The 3 classes of feature maps p^1, p^2, p^3 are obtained according to:
p^l = mean-pooling(v^l);
wherein the pooling window size and step size change with the input picture size; the sizes of p^1, p^2, p^3 are preset to 4 × 4, 2 × 2 and 1 × 1 respectively. Each p^1 map is then stacked column by column into a column vector of size 16, so the 6 maps of p^1 form a column vector of size 4 × 4 × 6 = 96; in the same way p^2 is stacked into a column vector of size 2 × 2 × 16 = 64 and p^3 into one of size 1 × 1 × 120 = 120. Finally these are concatenated in order into one column vector p of total size 280, which serves as the feature of the input picture;
step 1.3: performing primary full connection and softmax classification on the final feature vector to obtain a convolutional neural network;
step 1.4: initializing all weights of the whole convolutional neural network through an empirical formula, inputting a training picture x into the initialized convolutional neural network, and propagating according to a forward propagation formula;
step 2: and (4) reverse regulation.
2. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 1, wherein in step 1.4 all weights of the whole convolutional neural network are initialized through an empirical formula: the weights w_kj between input units and hidden-layer units and the biases b_j of the hidden-layer units are randomly generated according to the empirical formula, with the initial value of b set to 0;
wherein w denotes a weight, l denotes the l-th layer of the convolutional network, j denotes the j-th neuron of the l-th convolutional layer, k denotes the k-th fully connected layer, layerinput denotes the number of input neurons of the layer, and layeroutput denotes the number of output neurons of the layer; k_l is the size of the l-th convolution kernel, which can be initialized to weights between -1 and 1.
3. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 1, wherein in step 1.4 a training image x and its label are input, and the output value of each layer is calculated with the forward conduction formula:
h_{w,b}(x) = f(w^T x + b);
wherein h_{w,b}(x) is the output value of the neuron, w^T is the transpose of the weights, b is the bias, and f is the activation function.
4. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in any one of claims 1 to 3, wherein in step 1.4 each input picture is denoted x and the image input into the convolutional neural network is denoted x^0; when the input picture is a grayscale picture, x^0 = x; when the input picture is a color picture, it is grayed by x^0 = rgb2gray(x).
5. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 3, wherein step 2 is implemented by the following substeps:
step 2.1: the deviation of the last layer is calculated from the label value and the last-layer output value obtained with the forward conduction formula, using the loss function:
J_l = (1/2) Σ_i ( h_{w,b}(x^(i)) - y^(i) )^2;
wherein J_l is the loss function of the l layers, h_{w,b}(x^(i)) is the output value of the output-layer neurons for the i-th picture, and y^(i) is the label of the i-th input picture;
step 2.2: the deviation of each layer is calculated from the deviation of the last layer to obtain the gradient direction, and the weights are updated by:
w := w - α ∂J/∂w, b := b - α ∂J/∂b;
6. The image classification method based on the convolutional neural network of the spatial pyramid as claimed in claim 5, wherein in the backward adjustment the convolutional layers x^1, x^2 receive gradients from two directions, and the algorithm adjusts them by adding the gradients from the two directions.
CN201710198700.XA 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid Expired - Fee Related CN106991440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710198700.XA CN106991440B (en) 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid


Publications (2)

Publication Number Publication Date
CN106991440A CN106991440A (en) 2017-07-28
CN106991440B true CN106991440B (en) 2019-12-24

Family

ID=59412271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710198700.XA Expired - Fee Related CN106991440B (en) 2017-03-29 2017-03-29 Image classification method of convolutional neural network based on spatial pyramid

Country Status (1)

Country Link
CN (1) CN106991440B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389208B (en) * 2017-08-09 2021-08-31 上海寒武纪信息科技有限公司 Data quantization device and quantization method
CN107689036A (en) * 2017-09-01 2018-02-13 深圳市唯特视科技有限公司 A kind of Real-time image enhancement method based on the bilateral study of depth
CN107609638B (en) * 2017-10-12 2019-12-10 湖北工业大学 method for optimizing convolutional neural network based on linear encoder and interpolation sampling
CN107808139B (en) * 2017-11-01 2021-08-06 电子科技大学 Real-time monitoring threat analysis method and system based on deep learning
CN107862707A (en) * 2017-11-06 2018-03-30 深圳市唯特视科技有限公司 A kind of method for registering images based on Lucas card Nader's image alignment
CN108682029A (en) * 2018-03-22 2018-10-19 深圳飞马机器人科技有限公司 Multiple dimensioned dense Stereo Matching method and system
CN108596260A (en) * 2018-04-27 2018-09-28 安徽建筑大学 A kind of grid leakage loss localization method and device
CN108734679A (en) * 2018-05-23 2018-11-02 西安电子科技大学 A kind of computer vision system
CN109003223B (en) * 2018-07-13 2020-02-28 北京字节跳动网络技术有限公司 Picture processing method and device
CN109165738B (en) * 2018-09-19 2021-09-14 北京市商汤科技开发有限公司 Neural network model optimization method and device, electronic device and storage medium
CN110866550B (en) * 2019-11-01 2022-06-14 云南大学 Convolutional neural network, pyramid strip pooling method and malicious software classification method
CN110852263B (en) * 2019-11-11 2021-08-03 北京智能工场科技有限公司 Mobile phone photographing garbage classification recognition method based on artificial intelligence
CN111598101B (en) * 2020-05-25 2021-03-23 中国测绘科学研究院 Urban area intelligent extraction method, system and equipment based on remote sensing image scene segmentation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512661A (en) * 2015-11-25 2016-04-20 中国人民解放军信息工程大学 Multi-mode-characteristic-fusion-based remote-sensing image classification method
CN105917354A (en) * 2014-10-09 2016-08-31 微软技术许可有限责任公司 Spatial pyramid pooling networks for image processing
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633282B2 (en) * 2015-07-30 2017-04-25 Xerox Corporation Cross-trained convolutional neural networks using multimodal images


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Andrew Shin et al., "Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning", arXiv, 2016-03-31, pp. 1-18. *


Similar Documents

Publication Publication Date Title
CN106991440B (en) Image classification method of convolutional neural network based on spatial pyramid
CN110020682B (en) Attention mechanism relation comparison network model method based on small sample learning
CN112766199B (en) Hyperspectral image classification method based on self-adaptive multi-scale feature extraction model
CN110059878B (en) Photovoltaic power generation power prediction model based on CNN LSTM and construction method thereof
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN106778604B (en) Pedestrian re-identification method based on matching convolutional neural network
CN103927531B (en) It is a kind of based on local binary and the face identification method of particle group optimizing BP neural network
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN108510012A (en) A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN112434655B (en) Gait recognition method based on adaptive confidence map convolution network
CN109255364A (en) A kind of scene recognition method generating confrontation network based on depth convolution
CN110097609B (en) Sample domain-based refined embroidery texture migration method
CN110097178A (en) It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN107766794A (en) The image, semantic dividing method that a kind of Fusion Features coefficient can learn
CN112347888B (en) Remote sensing image scene classification method based on bi-directional feature iterative fusion
CN111861906B (en) Pavement crack image virtual augmentation model establishment and image virtual augmentation method
CN107944483B (en) Multispectral image classification method based on dual-channel DCGAN and feature fusion
CN108021947A (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN112115781A (en) Unsupervised pedestrian re-identification method based on anti-attack sample and multi-view clustering
CN109190666B (en) Flower image classification method based on improved deep neural network
CN105550712B (en) Aurora image classification method based on optimization convolution autocoding network
Yang et al. Down image recognition based on deep convolutional neural network
CN106529586A (en) Image classification method based on supplemented text characteristic
CN107423705A (en) SAR image target recognition method based on multilayer probability statistics model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191224