CN109635842A - Image classification method, device, and computer-readable storage medium - Google Patents
Image classification method, device, and computer-readable storage medium
- Publication number
- CN109635842A (application CN201811350802.XA)
- Authority
- CN
- China
- Prior art keywords
- residual
- convolution
- feature vector
- residual error
- section
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
This solution relates to artificial intelligence and provides an image classification method, a device, and a computer-readable storage medium. The method comprises: constructing a deep residual network, pre-training it on ImageNet to obtain weights, and initializing the deep residual network with those weights; extracting the output of the last residual unit of each of multiple convolution sections of the deep residual network as feature vectors; performing dimensionality reduction on the extracted feature vectors; and classifying the resulting feature vectors with a classifier. The invention classifies images using features extracted by a deep residual network. Features extracted from deeper layers of the residual network capture higher-level characteristics than features extracted from shallower layers and improve classification performance. The classification accuracy of the invention is higher than that of a CNN, and the approach may also serve as a reference for other fields.
Description
Technical field
The present invention relates to the field of artificial intelligence and, in particular, to an image classification method, a device, and a computer-readable storage medium.
Background
With the rapid development of artificial intelligence technology, deep neural networks are increasingly used in computer vision, especially in image classification.
In recent years, deep-learning-based image processing methods that distinguish targets of different classes according to the different features they exhibit in image information have come into increasingly wide use. Such methods use a computer to analyze an image quantitatively and assign the image, or each pixel or region within it, to one of several categories, replacing visual interpretation by humans. However, with current classification methods, large images require a very large amount of computation, and classification accuracy is not high enough.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides an image classification method applied to an electronic device. The method comprises: constructing a deep residual network, pre-training it on ImageNet to obtain weights, and initializing the deep residual network with those weights, where the deep residual network comprises multiple convolution sections, each convolution section comprises multiple residual units, and each residual unit in turn comprises three convolutional layers in sequence; extracting the output of the last residual unit of each of multiple convolution sections of the deep residual network as feature vectors; performing dimensionality reduction on the extracted feature vectors; and classifying the resulting feature vectors with a classifier.
Preferably, the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x) = \max(x, 0)$.
Preferably, the deep residual network comprises a first, second, third, fourth, and fifth convolution section connected in sequence, and the input image passes through the first to fifth convolution sections in order, where: the first convolution section comprises a 7x7x64 convolution, in which 7x7 denotes the convolution kernel and 64 the number of channels; the second convolution section comprises 3 second residual units, each consisting in sequence of three convolutional layers, 1x1x64, 3x3x64, and 1x1x256; the third convolution section comprises 4 third residual units, each consisting in sequence of three convolutional layers, 1x1x128, 3x3x128, and 1x1x512; the fourth convolution section comprises 6 fourth residual units, each consisting in sequence of three convolutional layers, 1x1x256, 3x3x256, and 1x1x1024; and the fifth convolution section comprises 3 fifth residual units, each consisting in sequence of three convolutional layers, 1x1x512, 3x3x512, and 1x1x2048.
Preferably, the outputs of the last residual unit of the third, fourth, and fifth convolution sections are each extracted as feature vectors.
Preferably, one method of reducing the dimensionality of the extracted feature vectors uses a convolutional layer, a max pooling layer, two fully connected layers, and a softmax layer; the convolutional layer consists of 1 × 1 filters along 512 channels, its stride is set to 1, and its borders are padded with zeros.
Preferably, another method of reducing the dimensionality of the extracted feature vectors uses principal component analysis to reduce the feature vector output by the last residual unit of the fifth convolution section to an n-dimensional vector, where n is the number of channels of the convolutional layer from which the feature was extracted.
Preferably, a linear SVM classifier is used to classify the resulting feature vectors.
The present invention also provides an electronic device comprising a memory and a processor connected to the memory. The memory stores an image classification program that can run on the processor, and when the image classification program is executed by the processor, the following steps are implemented: constructing a deep residual network, pre-training it on ImageNet to obtain weights, and initializing the deep residual network with those weights, where the deep residual network comprises multiple convolution sections, each convolution section comprises multiple residual units, and each residual unit in turn comprises three convolutional layers in sequence; extracting the output of the last residual unit of each of multiple convolution sections of the deep residual network as feature vectors; performing dimensionality reduction on the extracted feature vectors; and classifying the resulting feature vectors with a classifier.
Preferably, the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x) = \max(x, 0)$.
The present invention also provides a computer-readable storage medium that contains an image classification program; when the image classification program is executed by a processor, the steps of the image classification method described above are implemented.
The image classification method, device, and computer-readable storage medium proposed by the present invention classify images using features extracted by a deep residual network; features extracted from deeper layers of the deep residual network perform better than features extracted from shallower layers. Experiments show that the classification accuracy is higher than that of a CNN, and the approach may also serve as a reference for other fields.
Brief description of the drawings
The above features and technical advantages of the present invention will become clearer and easier to understand from the following description of the embodiments in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of the steps of the image classification method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the residual unit according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the deep residual network according to an embodiment of the present invention;
Fig. 4-1 is a flow diagram of the first dimensionality reduction method according to an embodiment of the present invention;
Fig. 4-2 is a flow diagram of the second dimensionality reduction method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the hardware architecture of the electronic device according to an embodiment of the present invention;
Fig. 6 is a program module diagram of the image classification program according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the composition of the dimensionality reduction module according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the image classification method, device, and computer-readable storage medium of the present invention are described below with reference to the accompanying drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various ways, or in combinations thereof, without departing from the spirit and scope of the present invention. The drawings and description are therefore illustrative in nature and are not intended to limit the scope of the claims. In addition, in this specification, the drawings are not drawn to scale, and the same reference numerals denote the same parts.
It should be understood that, when used in this specification and the claims, the terms "includes" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or sets thereof. It should also be understood that the term "and/or" as used in the description and claims of the invention refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
The present invention provides an image classification method applied to an electronic device. The method comprises:
Step S10: construct a deep residual network, pre-train it on ImageNet to obtain weights, and initialize the deep residual network with those weights. ImageNet is the name of a computer vision recognition project and is currently the largest image recognition database in the world; in practice it is a huge image library for image and vision training. The deep residual network comprises multiple convolution sections, each convolution section comprises multiple residual units, and each residual unit in turn comprises three convolutional layers in sequence.
Step S30: extract the outputs of multiple residual units of the deep residual network as feature vectors.
In a CNN (convolutional neural network) model, shallower convolutional layers have smaller receptive fields and learn features of local regions; deeper convolutional layers have larger receptive fields and can learn more abstract features. These abstract features are less sensitive to the size, position, and orientation of an object, which helps improve recognition performance. A deep residual network has more layers, and a typical residual unit consists of three convolutional layers, as shown in Fig. 2. The extracted features can be regarded as the output of a deep filter bank. The output is a vector of the form w × h × d, where w and h are the width and height of the resulting feature vector and d is the number of channels in the convolutional layer. The extracted features can therefore be regarded as a two-dimensional array of d-dimensional local features. The first convolutional layer is a 1x1 convolution with 64 convolution kernels (that is, 64 output channels); this 1x1 convolution reduces the 256-dimensional channels to 64 dimensions. The second convolutional layer, a 3x3 convolution, keeps the number of channels at 64. Finally, the third convolutional layer, a 1x1 convolution, restores the feature vector to 256 dimensions.
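The channel arithmetic of the three-layer residual unit described above can be checked with a short sketch. The helper name `conv_weights` and the full-width baseline of two 3x3 convolutions are assumptions introduced only for comparison; they do not appear in the patent, and biases and batch normalization parameters are ignored.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_ch * out_ch

# Residual unit from the text: 1x1 (256 -> 64), 3x3 (64 -> 64), 1x1 (64 -> 256).
bottleneck = (
    conv_weights(1, 256, 64)
    + conv_weights(3, 64, 64)
    + conv_weights(1, 64, 256)
)

# Hypothetical baseline, for comparison only: two 3x3 convolutions at 256 channels.
plain = 2 * conv_weights(3, 256, 256)

print(bottleneck, plain)
```

Under these assumptions, narrowing to 64 channels before the 3x3 convolution leaves the unit with far fewer weights than the full-width baseline, which is one plausible reason for the 1x1, 3x3, 1x1 arrangement.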
Step S50: perform dimensionality reduction on the extracted feature vectors. The output of a convolutional layer is much larger than a traditional 4096-dimensional CNN feature; for example, the feature vector extracted from the fifth convolution section has size 7 × 7 × 2048. Dimensionality reduction is therefore applied to the extracted feature vectors to reduce the computational cost of manipulating them.
Step S70: classify the resulting feature vectors with a classifier.
Further, the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x) = \max(x, 0)$.
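As a minimal sketch of the two update equations above, the following code applies one residual unit to a plain list of numbers. The function `toy_residual` is a hypothetical stand-in for the residual function F; the patent's F uses learned convolutions and batch normalization, which are not modelled here.

```python
def relu(v):
    """sigma(x) = max(x, 0), applied element-wise."""
    return [max(x, 0.0) for x in v]

def residual_unit(x, residual_fn):
    """y = h(x) + F(x) with h(x) = x, followed by x_next = f(y) with f = ReLU."""
    fx = residual_fn(x)
    y = [a + b for a, b in zip(x, fx)]
    return relu(y)

# Hypothetical stand-in for F: a fixed element-wise scaling, not learned convolutions.
def toy_residual(x):
    return [-2.0 * a for a in x]

x_next = residual_unit([1.0, -3.0, 0.5], toy_residual)
print(x_next)
```

When the residual function returns all zeros, the unit reduces to a ReLU applied to its input, which shows how the shortcut h carries the input through unchanged.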
Pre-training the deep residual network on ImageNet means training the deep residual network to classify the images in ImageNet, obtaining the weight matrices $w_i$, and then initializing the deep residual network with these pre-trained weight matrices $w_i$.
In an alternative embodiment, the deep residual network comprises a first convolution section (conv1), a second convolution section (conv2), a third convolution section (conv3), a fourth convolution section (conv4), and a fifth convolution section (conv5) connected in sequence, followed by a first fully connected layer FC1. The input image passes through the first to fifth convolution sections in order and is output through the first fully connected layer FC1.
The first convolution section comprises a 7x7x64 convolution, where 7x7 denotes the convolution kernel and 64 the number of channels;
the second convolution section comprises 3 second residual units, each consisting in sequence of three convolutional layers: 1x1x64, 3x3x64, and 1x1x256;
the third convolution section comprises 4 third residual units, each consisting in sequence of three convolutional layers: 1x1x128, 3x3x128, and 1x1x512;
the fourth convolution section comprises 6 fourth residual units, each consisting in sequence of three convolutional layers: 1x1x256, 3x3x256, and 1x1x1024;
the fifth convolution section comprises 3 fifth residual units, each consisting in sequence of three convolutional layers: 1x1x512, 3x3x512, and 1x1x2048.
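The section layout listed above can be tallied with a few lines of code. The table `SECTIONS` simply restates the numbers from the text; the variable names are illustrative and not part of the patent.

```python
# (section name, number of residual units, output channels of the last 1x1 layer)
SECTIONS = [
    ("conv2", 3, 256),
    ("conv3", 4, 512),
    ("conv4", 6, 1024),
    ("conv5", 3, 2048),
]
CONVS_PER_UNIT = 3  # each residual unit holds three convolutional layers

residual_units = sum(units for _, units, _ in SECTIONS)
conv_layers = 1 + CONVS_PER_UNIT * residual_units  # +1 for the 7x7 convolution in conv1
weighted_layers = conv_layers + 1                  # +1 for the fully connected layer FC1
print(residual_units, conv_layers, weighted_layers)
```

Counted this way, the network has 16 residual units, 49 convolutional layers, and 50 weighted layers including FC1, which matches the layout commonly referred to as a 50-layer residual network.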
In an alternative embodiment, the learned weights of deeper layers are usually more class-specific, and their output vectors classify better than those of earlier convolutional layers. Used properly, the convolutional layers of a deep network form very powerful features. Therefore, the outputs of the last residual unit of the third, fourth, and fifth convolution sections are each extracted as feature vectors; that is, the output of the last convolutional layer of each of the third, fourth, and fifth convolution sections is extracted as a feature vector.
The processing of the input image by the deep residual network in step S30 is described in detail below, taking an input image of size 224x224x3 as an example.
The input first passes through the first convolution section. The input image has size 224x224x3, and the output becomes 112x112 with 64 channels; that is, the side length of the image is halved.
It then passes through the second convolution section, which comprises 3 second residual units, each consisting in sequence of three convolutional layers, 1x1x64, 3x3x64, and 1x1x256. The number of channels therefore becomes 256, and the output size is 56x56.
It then passes through the third convolution section, which comprises 4 third residual units, each consisting in sequence of three convolutional layers, 1x1x128, 3x3x128, and 1x1x512. The number of output channels increases to 512, and the output size is 28x28.
It then passes through the fourth convolution section: the number of output channels increases to 1024, and the image shrinks to 14x14.
It then passes through the fifth convolution section: the number of output channels increases to 2048, and the image shrinks to 7x7.
Finally, the result is output through the first fully connected layer FC1. However, this embodiment does not use the final output of the deep residual network. Instead, it extracts the outputs of the last residual unit of the third, fourth, and fifth convolution sections as feature vectors, namely the third feature vector 301, the fourth feature vector 401, and the fifth feature vector 501. The third feature vector 301, the fourth feature vector 401, and the fifth feature vector 501 are then each subjected to dimensionality reduction.
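The sizes quoted in this walk-through can be reproduced with a small sketch. It assumes that every convolution section halves the side length, which is what the quoted sizes imply; the function name `trace_shapes` is illustrative only.

```python
def trace_shapes(input_size=224):
    """Return (section, side, channels, flattened length) for conv1 to conv5.

    Assumption: every section halves the side length, matching the sizes
    quoted in the text (224 -> 112 -> 56 -> 28 -> 14 -> 7).
    """
    channels = {"conv1": 64, "conv2": 256, "conv3": 512, "conv4": 1024, "conv5": 2048}
    side = input_size
    rows = []
    for name in ["conv1", "conv2", "conv3", "conv4", "conv5"]:
        side //= 2
        rows.append((name, side, channels[name], side * side * channels[name]))
    return rows

shapes = trace_shapes()
for row in shapes:
    print(row)
```

The flattened lengths of the three extracted feature vectors (conv3, conv4, conv5) are 401408, 200704, and 100352, all far larger than a 4096-dimensional feature, which is the motivation the text gives for the dimensionality reduction step.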
In an alternative embodiment, in step S50, one method of reducing the dimensionality of the extracted feature vectors uses a dimensionality reduction convolutional layer (conv6), a max pooling layer, second and third fully connected layers FC2 and FC3, and a softmax layer, connected in sequence, to reduce the dimensionality of the feature vectors extracted from the third, fourth, and fifth convolution sections respectively. For example, as shown in Fig. 4-1, the feature vector extracted from the fifth convolution section is fed in turn through the dimensionality reduction convolutional layer, the max pooling layer, the second and third fully connected layers FC2 and FC3, and the softmax layer. The dimensionality reduction convolutional layer consists of 1 × 1 filters along 512 channels, with the stride set to 1 and zero padding. The borders of the convolutional layer are padded with zeros; zero padding keeps the spatial dimensions of the convolutional layer's output the same as those of its input.
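The effect of a 1 × 1 convolution with stride 1 can be sketched as a per-pixel linear map over channels. The toy sizes and weights below are hypothetical (the patent reduces to 512 channels using learned filters), and the output-size formula is the standard one for convolutions, not something stated in the patent.

```python
def conv_output_side(n, kernel, stride, padding):
    """Standard output-size formula for a square convolution."""
    return (n + 2 * padding - kernel) // stride + 1

def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution with stride 1 to an h x w x c_in feature map.

    weights is a c_out x c_in matrix; each pixel is mapped independently,
    so only the channel dimension changes.
    """
    return [
        [
            [sum(w * v for w, v in zip(row_w, pixel)) for row_w in weights]
            for pixel in row
        ]
        for row in feature_map
    ]

# Toy sizes (hypothetical): a 2 x 2 map with 4 channels reduced to 2 channels.
fmap = [[[1, 2, 3, 4], [0, 1, 0, 1]], [[2, 2, 2, 2], [4, 3, 2, 1]]]
w = [[1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25]]
reduced = conv1x1(fmap, w)
print(conv_output_side(7, kernel=1, stride=1, padding=0), reduced)
```

With a 1 × 1 kernel and stride 1, the formula returns the input side length when the padding amount is 0, so a 7 × 7 input stays 7 × 7 while the channel count drops. This is one reading of the padding statement in the text, offered as an assumption.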
In an alternative embodiment, in step S50, as shown in Fig. 4-2, another method of reducing the dimensionality of the extracted feature vectors uses principal component analysis (PCA) to reduce the feature vectors output by the last residual unit of the third, fourth, and fifth convolution sections each to an n-dimensional vector, where n is the number of channels of the convolutional layer from which the feature was extracted. For example, the last convolutional layer of the last residual unit of the fifth convolution section is 1x1x2048, with 2048 channels, so the feature vector output by the last residual unit of the fifth convolution section is reduced to a 2048-dimensional vector.
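The PCA step can be illustrated on toy data. The sketch below is not the patent's implementation: it uses power iteration with deflation as a simple stand-in for a library eigen-solver, works on 3-dimensional samples rather than 7 × 7 × 2048 features, and the function name `pca_project` and its parameters are assumptions.

```python
import random

def pca_project(samples, k, iters=200, seed=0):
    """Project samples (a list of equal-length lists) onto their top-k principal components.

    Uses power iteration with deflation on the covariance matrix; adequate for
    a toy example, far too slow for 2048-dimensional features.
    """
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]

    components = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in v) ** 0.5
            v = [x / norm for x in v]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]

# Hypothetical 3-D samples lying almost on a line, reduced to 1 dimension.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [3.0, 6.0, -0.1], [4.0, 8.0, 0.0]]
projected = pca_project(data, k=1)
print(projected)
```

Because the toy samples lie almost on a line, a single component preserves their ordering along that line; the sign of the projection is arbitrary.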
In an alternative embodiment, a linear support vector machine (SVM) classifier is used to classify the resulting feature vectors. Fig. 4-2 shows the pipeline of the PCA-SVM module for the fifth convolution section. The experimental results for this method show that the dimensionality of the extracted features can be reduced substantially without a significant loss in performance.
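A linear SVM can be sketched in a few lines. The version below is a binary classifier trained by subgradient descent on the regularized hinge loss; the labels of +1 and -1, the hyperparameters, and the 2-dimensional toy features are all assumptions made for illustration, and the patent does not specify a training procedure.

```python
def train_linear_svm(xs, ys, epochs=200, lr=0.1, reg=0.01):
    """Train a binary linear SVM (labels +1/-1) by subgradient descent on the hinge loss."""
    d = len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks w on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary towards classifying x correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D "reduced feature vectors" for two classes.
xs = [[2.0, 2.5], [3.0, 3.0], [2.5, 3.5], [-2.0, -1.5], [-3.0, -2.5], [-2.5, -3.0]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
predictions = [predict(w, b, x) for x in xs]
print(predictions)
```

In a pipeline like the one in Fig. 4-2, the inputs to such a classifier would be the PCA-reduced feature vectors rather than the toy points used here, and a multi-class problem would need an extension such as one-vs-rest.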
Fig. 5 is a schematic diagram of the hardware architecture of the electronic device 1 of the present invention. The electronic device 1 is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 14, and a display 15 that can communicate with one another over a system bus. Note that Fig. 5 shows the electronic device 1 with only some of its components; it should be understood that not all of the components shown are required, and that more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1. The readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1. In other embodiments, the non-volatile storage medium may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card, a Secure Digital card, or a Flash Card fitted to the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the electronic device 1, such as the code of the image classification program in this embodiment. The memory 11 can also be used to temporarily store various data that has been output or is about to be output.
The processor 14 is used to run the program code stored in the memory 11 or to process data. The display 15 is used to display the images to be classified.
In addition, the electronic device 1 further includes a network interface, which may be a wireless network interface or a wired network interface and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
The image classification program is stored in the memory 11 and comprises at least one computer-readable instruction stored in the memory. The at least one computer-readable instruction can be executed by the processor 14 to implement the methods of the embodiments of the present application, and it can be divided into different logic modules according to the functions that its parts implement.
In one embodiment, the following steps are implemented when the image classification program is executed by the processor 14:
Step S10: construct a deep residual network, pre-train it on ImageNet to obtain weights, and initialize the deep residual network with those weights. ImageNet is the name of a computer vision recognition project and is currently the largest image recognition database in the world; in practice it is a huge image library for image and vision training. The deep residual network comprises multiple convolution sections, each convolution section comprises multiple residual units, and each residual unit in turn comprises three convolutional layers in sequence.
Step S30: extract the output of the last residual unit of each of multiple convolution sections of the deep residual network as feature vectors.
Step S50: perform dimensionality reduction on the extracted feature vectors.
Step S70: classify the resulting feature vectors with a classifier.
Fig. 6 shows the program module diagram of the image classification program 50. In this embodiment, the image classification program 50 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 14 to carry out the present invention. A module, as the term is used in the present invention, is a series of computer program instruction segments capable of performing a specific function.
The image classification program 50 can be divided into a deep residual network pre-training module 501, a deep residual network initialization module 502, a feature vector extraction module 503, a dimensionality reduction module 504, and a classification module 505.
The deep residual network pre-training module 501 performs pre-training on ImageNet to obtain weights. ImageNet is the name of a computer vision recognition project and is currently the largest image recognition database in the world; in practice it is a huge image library for image and vision training. The deep residual network initialization module 502 initializes the deep residual network with the weights.
The feature vector extraction module 503 extracts the outputs of multiple residual units of the deep residual network as feature vectors.
In a CNN (convolutional neural network) model, shallower convolutional layers have smaller receptive fields and learn features of local regions; deeper convolutional layers have larger receptive fields and can learn more abstract features. These abstract features are less sensitive to the size, position, and orientation of an object, which helps improve recognition performance. A residual network has more layers, and a typical residual unit consists of three convolutional layers, as shown in Fig. 2. The extracted features can be regarded as the output of a deep filter bank. The output is a vector of the form w × h × d, where w and h are the width and height of the resulting feature vector and d is the number of channels in the convolutional layer. The extracted features can therefore be regarded as a two-dimensional array of d-dimensional local features. The first convolutional layer is a 1x1 convolution with 64 convolution kernels (that is, 64 output channels); this 1x1 convolution reduces the 256-dimensional channels to 64 dimensions. The second convolutional layer, a 3x3 convolution, keeps the number of channels at 64. Finally, the third convolutional layer, a 1x1 convolution, restores the feature vector to 256 dimensions.
The dimensionality reduction module 504 performs dimensionality reduction on the extracted feature vectors. The output of a convolutional layer is much larger than a traditional 4096-dimensional CNN feature; for example, the feature vector extracted from the fifth convolution section has size 7 × 7 × 2048. Dimensionality reduction is therefore applied to the extracted feature vectors to reduce the computational cost of manipulating them.
The classification module 505 classifies the resulting feature vectors with a classifier.
Further, the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x_i) = \max(x_i, 0)$.
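The outer structure of a residual unit (identity shortcut plus residual, followed by ReLU) can be sketched as follows. This is a minimal illustration on plain Python lists: the residual function is passed in as a callable, and `toy_residual` is a hypothetical stand-in that omits the convolution and batch normalization of the actual residual function F.

```python
def relu(values):
    """sigma(x) = max(x, 0), applied element-wise."""
    return [max(v, 0.0) for v in values]


def residual_unit(x, residual_fn):
    """Compute x_{i+1} = f(h(x_i) + F(x_i, w_i)) with h the identity and f = ReLU."""
    shortcut = x                 # h(x_i) = x_i
    residual = residual_fn(x)    # F(x_i, w_i)
    y = [s + r for s, r in zip(shortcut, residual)]
    return relu(y)


def toy_residual(x, weight=0.5):
    """Hypothetical stand-in for F: one scalar weight applied to ReLU(x)."""
    return [weight * v for v in relu(x)]


x_next = residual_unit([1.0, -2.0, 3.0], toy_residual)
print(x_next)
```

Because h is the identity, a residual function that returns zeros leaves the unit computing only ReLU of its input, which is the property that makes very deep stacks of such units trainable.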
The deep residual network is pre-trained on ImageNet; that is, the deep residual network is trained to classify the images in ImageNet to obtain the weight matrices $w_i$, and the pre-trained weight matrices $w_i$ are used to initialize the deep residual network.
In an optional embodiment, the deep residual network includes, connected in sequence, a first convolution section (conv1), a second convolution section (conv2), a third convolution section (conv3), a fourth convolution section (conv4), a fifth convolution section (conv5), and a first fully connected layer FC1. The input image passes through the first to fifth convolution sections in turn and is output through the first fully connected layer FC1.
The first convolution section includes a 7x7x64 convolution, where 7x7 denotes the convolution kernel and 64 denotes the number of channels;
The second convolution section includes 3 second residual units, each of which in turn includes three convolutional layers: 1x1x64, 3x3x64 and 1x1x256;
The third convolution section includes 4 third residual units, each of which in turn includes three convolutional layers: 1x1x128, 3x3x128 and 1x1x512;
The fourth convolution section includes 6 fourth residual units, each of which in turn includes three convolutional layers: 1x1x256, 3x3x256 and 1x1x1024;
The fifth convolution section includes 3 fifth residual units, each of which in turn includes three convolutional layers: 1x1x512, 3x3x512 and 1x1x2048.
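The section layout listed above can be written down as a small configuration table and checked by counting layers. This is a sketch only; the names `SECTIONS`, `count_weighted_layers` and `output_channels` are assumptions introduced for illustration.

```python
# (section name, number of residual units, [(kernel, out_channels), ...] per unit)
SECTIONS = [
    ("conv2", 3, [(1, 64), (3, 64), (1, 256)]),
    ("conv3", 4, [(1, 128), (3, 128), (1, 512)]),
    ("conv4", 6, [(1, 256), (3, 256), (1, 1024)]),
    ("conv5", 3, [(1, 512), (3, 512), (1, 2048)]),
]


def count_weighted_layers(sections, stem_convs=1, fc_layers=1):
    """Count conv1, every convolution inside the residual units, and FC1."""
    residual_convs = sum(units * len(layers) for _, units, layers in sections)
    return stem_convs + residual_convs + fc_layers


def output_channels(sections):
    """Channels produced by the last layer of each section."""
    return {name: layers[-1][1] for name, _, layers in sections}


total_layers = count_weighted_layers(SECTIONS)
print(total_layers, output_channels(SECTIONS))
```

Counting conv1, the 48 convolutions in the residual units and FC1 gives 50 weighted layers, which is consistent with the widely used 50-layer residual network layout.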
In an optional embodiment, the learned weights of deeper layers are usually more class-specific, and their output vectors classify better than those of earlier convolutional layers. If used properly, the convolutional layers of a deep network form very powerful features. Therefore, the feature vector extraction module 503 extracts the outputs of the last residual units of the third, fourth and fifth convolution sections as feature vectors; that is, it extracts the output of the last convolutional layer of each of the third, fourth and fifth convolution sections as a feature vector.
The processing of the input image by the deep residual network in step S30 is described in detail below, taking an input image of size 224x224x3 as an example.
The input first passes through the first convolution section: the input image has size 224x224x3 and the output image has size 112x112, that is, the side length of the image is halved, with 64 channels.
It then passes through the second convolution section, which includes 3 second residual units, each of which in turn includes three convolutional layers: 1x1x64, 3x3x64 and 1x1x256. The number of channels therefore becomes 256, and the output image has size 56x56.
It then passes through the third convolution section, which includes 4 third residual units, each of which in turn includes three convolutional layers: 1x1x128, 3x3x128 and 1x1x512. The number of output channels increases to 512, and the output image has size 28x28.
It then passes through the fourth convolution section: the number of output channels increases to 1024 and the image is reduced to 14x14.
It then passes through the fifth convolution section: the number of output channels increases to 2048 and the image is reduced to 7x7.
Finally, the output is produced through the first fully connected layer FC1. However, this embodiment does not use the final output of the deep residual network; instead, it extracts the outputs of the last residual units of the third, fourth and fifth convolution sections as feature vectors, namely the third feature vector 301, the fourth feature vector 401 and the fifth feature vector 501. Dimension reduction is then performed on the third feature vector 301, the fourth feature vector 401 and the fifth feature vector 501 respectively.
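The sizes in this walkthrough can be reproduced with a short trace. The sketch below simply halves the side length in every convolution section, as the walkthrough states; how the halving is realized inside each section (strided convolution or pooling) is not modeled, and the function name `trace_shapes` is an assumption.

```python
def trace_shapes(side=224):
    """Return (section, side length, channels) after each convolution section."""
    channels_per_section = [
        ("conv1", 64), ("conv2", 256), ("conv3", 512),
        ("conv4", 1024), ("conv5", 2048),
    ]
    shapes = []
    for name, channels in channels_per_section:
        side //= 2  # each section halves the side length, as described above
        shapes.append((name, side, channels))
    return shapes


shapes = trace_shapes()
# Feature vectors are taken after conv3, conv4 and conv5 only.
extracted = [s for s in shapes if s[0] in ("conv3", "conv4", "conv5")]
for name, side, channels in shapes:
    print(f"{name}: {side}x{side}x{channels}")
```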
In an optional embodiment, the dimension-reduction processing module 504 further includes a first dimension-reduction processing unit 5041. The first dimension-reduction processing unit 5041 reduces the dimension of the extracted feature vectors using, connected in sequence, a dimension-reduction convolutional layer, a max pooling layer, second and third fully connected layers FC2 and FC3, and a softmax layer, applied separately to the feature vectors extracted from the third, fourth and fifth convolution sections. For example, as shown in Fig. 4-1, the feature vector extracted from the fifth convolution section is fed in sequence into the dimension-reduction convolutional layer, the max pooling layer, the two fully connected layers and the softmax layer. The dimension-reduction convolutional layer consists of 1 × 1 filters along 512 channels, with the stride set to 1 and, for this convolutional layer, the padding set to zero.
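The effect of the dimension-reduction convolutional layer on the 7 × 7 × 2048 output of the fifth convolution section, and the final softmax, can be sketched as follows. The output-size formula is the standard one for convolutions; the class scores passed to `softmax` are hypothetical values, and the pooling and fully connected layers are omitted because their sizes are not given here.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# 1x1 filters, stride 1, padding 0 applied to the 7x7x2048 conv5 output.
side = conv_output_size(7, kernel=1, stride=1, padding=0)
reduced_shape = (side, side, 512)
probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical class scores
print(reduced_shape, probabilities)
```

With a 1 × 1 kernel, stride 1 and zero padding the spatial size is unchanged, so the layer reduces only the channel dimension, from 2048 to 512.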
In an optional embodiment, the dimension-reduction processing module 504 further includes a second dimension-reduction processing unit 5042. As shown in Fig. 4-2, the second dimension-reduction processing unit 5042 reduces the dimension of the extracted feature vectors using principal component analysis (PCA): the feature vectors output by the last residual units of the third, fourth and fifth convolution sections are each reduced to an n-dimensional vector, where n is the number of channels of the convolutional layer from which the features were extracted. For example, the last convolutional layer of the last residual unit of the fifth convolution section is 1x1x2048, with 2048 channels, so the feature vector output by the last residual unit of the fifth convolution section is reduced to a 2048-dimensional vector.
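The idea behind PCA-based reduction can be illustrated on toy data. The sketch below finds only the leading principal component by power iteration and projects two-dimensional samples onto it, whereas the embodiment keeps n components; the data, function names and the choice of power iteration are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (column means, unit direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal
    # to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Project each centered row onto the given direction."""
    return [
        sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
        for r in rows
    ]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # toy 2-D features
means, direction = first_principal_component(data)
reduced = project(data, means, direction)
print(direction, reduced)
```

For high-dimensional features such as those of the fifth convolution section, a library routine that computes many components at once would normally be used instead of this hand-written loop.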
In an optional embodiment, the classification module 505 classifies the obtained feature vectors using a linear support vector machine (SVM) classifier. Fig. 4-2 shows the pipeline of the PCA-SVM module for the fifth convolution section. The experimental results of this method show that the dimension of the extracted features can be reduced considerably without significantly degrading performance.
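A linear SVM of the kind used by the classification module can be sketched with sub-gradient descent on the regularized hinge loss. This is a toy two-class illustration under assumed hyperparameters (`epochs`, `lr`, `reg`) and hypothetical data; it is not the training procedure of the original disclosure.

```python
def train_linear_svm(samples, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit w, b by sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:
                # Inside the margin: move toward classifying x correctly.
                w = [wj - lr * (reg * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regularizer acts.
                w = [wj - lr * reg * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0.0 else -1


features = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]  # toy data
labels = [1, 1, -1, -1]
w, b = train_linear_svm(features, labels)
predictions = [predict(w, b, x) for x in features]
print(predictions)
```

In the pipeline of Fig. 4-2 the inputs to such a classifier would be the PCA-reduced feature vectors rather than raw two-dimensional points.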
In addition, an embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be any one, or any combination of several, of a hard disk, a multimedia card, an SD card, a flash card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes an image classification program 50 and the like, and the image classification program 50 implements the following operations when executed by the processor 14:
Step S10: construct a deep residual network, pre-train it on ImageNet to obtain weights, and initialize the deep residual network with the weights;
Step S30: extract the outputs of the last residual units of multiple convolution sections of the deep residual network as feature vectors;
Step S50: perform dimension reduction on the obtained feature vectors;
Step S70: classify the obtained feature vectors using a classifier.
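The order of steps S30 to S70 can be sketched as a small function that chains three components. Everything below is a hypothetical stand-in: step S10 (construction and pre-training) is assumed to have been done already, and classifying each section's feature vector separately is an assumption of this sketch.

```python
def classify_image(image, extract_features, reduce_dimension, classifier):
    """Run steps S30 to S70 on one image with already-initialized components."""
    feature_vectors = extract_features(image)                 # step S30
    reduced = [reduce_dimension(v) for v in feature_vectors]  # step S50
    return [classifier(v) for v in reduced]                   # step S70


# Hypothetical stand-ins so that the sketch runs end to end.
def toy_extract(image):
    return [image, [2 * p for p in image]]  # pretend outputs of two sections


def toy_reduce(vector):
    return vector[:2]  # keep the first two values


def toy_classifier(vector):
    return "positive" if sum(vector) >= 0 else "negative"


results = classify_image([1.0, -3.0, 5.0], toy_extract, toy_reduce, toy_classifier)
print(results)
```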
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as those of the image classification method and the electronic device 1 described above, and are not repeated here.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. An image classification method applied to an electronic device, characterized in that the method comprises:
constructing a deep residual network, pre-training it on ImageNet to obtain weights, and initializing the deep residual network with the weights, wherein the deep residual network includes multiple convolution sections, each convolution section includes multiple residual units, and each residual unit in turn includes three convolutional layers;
extracting the outputs of the last residual units of multiple convolution sections of the deep residual network as feature vectors;
performing dimension reduction on the obtained feature vectors;
classifying the obtained feature vectors using a classifier.
2. The image classification method according to claim 1, characterized in that the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x_i) = \max(x_i, 0)$.
3. The image classification method according to claim 1, characterized in that
the deep residual network includes, connected in sequence, a first convolution section, a second convolution section, a third convolution section, a fourth convolution section and a fifth convolution section, and the input image passes through the first to fifth convolution sections in turn, wherein:
the first convolution section includes a 7x7x64 convolution, where 7x7 denotes the convolution kernel and 64 denotes the number of channels;
the second convolution section includes 3 second residual units, each of which in turn includes three convolutional layers: 1x1x64, 3x3x64 and 1x1x256;
the third convolution section includes 4 third residual units, each of which in turn includes three convolutional layers: 1x1x128, 3x3x128 and 1x1x512;
the fourth convolution section includes 6 fourth residual units, each of which in turn includes three convolutional layers: 1x1x256, 3x3x256 and 1x1x1024;
the fifth convolution section includes 3 fifth residual units, each of which in turn includes three convolutional layers: 1x1x512, 3x3x512 and 1x1x2048.
4. The image classification method according to claim 3, characterized in that
the outputs of the last residual units of the third convolution section, the fourth convolution section and the fifth convolution section are extracted as feature vectors.
5. The image classification method according to claim 1, characterized in that
the method for reducing the dimension of the extracted feature vectors uses a convolutional layer, a max pooling layer, two fully connected layers and a softmax layer, wherein the convolutional layer consists of 1 × 1 filters along 512 channels, the stride is set to 1, and zero padding is used at the boundary of the convolutional layer.
6. The image classification method according to claim 3, characterized in that
another method for reducing the dimension of the extracted feature vectors uses principal component analysis to reduce the feature vector output by the last residual unit of the fifth convolution section to an n-dimensional vector, where n is the number of channels of the convolutional layer from which the features were extracted.
7. The image classification method according to claim 1, characterized in that
the obtained feature vectors are classified using a linear support vector machine classifier.
8. An electronic device, characterized in that the electronic device comprises a memory and a processor connected to the memory, the memory stores an image classification program executable on the processor, and the image classification program, when executed by the processor, implements the following steps:
constructing a deep residual network, pre-training it on ImageNet to obtain weights, and initializing the deep residual network with the weights, wherein the deep residual network includes multiple convolution sections, each convolution section includes multiple residual units, and each residual unit in turn includes three convolutional layers;
extracting the outputs of the last residual units of multiple convolution sections of the deep residual network as feature vectors;
performing dimension reduction on the obtained feature vectors;
classifying the obtained feature vectors using a classifier.
9. The electronic device according to claim 8, characterized in that the deep residual network is composed of residual units, each of which is expressed as:
$y_i = h(x_i) + F(x_i, w_i)$
$x_{i+1} = f(y_i)$
where
$F$ is the residual function;
$f$ is the ReLU function;
$w_i$ is the weight matrix;
$x_i$ is the input of the i-th layer;
$y_i$ is the output of the i-th layer.
The function $h$ is given by: $h(x_i) = x_i$
The residual function $F$ is given by:
$F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot \sigma(B(x_i)))$
where $B(x_i)$ is batch normalization;
$w'_i$ is the transpose of $w_i$;
"$\cdot$" denotes convolution;
$\sigma(x_i) = \max(x_i, 0)$.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium includes an image classification program which, when executed by a processor, implements the steps of the image classification method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811350802.XA CN109635842A (en) | 2018-11-14 | 2018-11-14 | A kind of image classification method, device and computer readable storage medium |
PCT/CN2019/089181 WO2020098257A1 (en) | 2018-11-14 | 2019-05-30 | Image classification method and device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811350802.XA CN109635842A (en) | 2018-11-14 | 2018-11-14 | A kind of image classification method, device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635842A (en) | 2019-04-16 |
Family
ID=66067983
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811350802.XA | Pending | CN109635842A (en) | 2018-11-14 | 2018-11-14 | A kind of image classification method, device and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109635842A (en) |
WO (1) | WO2020098257A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110651277A (en) * | 2019-08-08 | 2020-01-03 | 京东方科技集团股份有限公司 | Computer-implemented method, computer-implemented diagnostic method, image classification apparatus, and computer program product |
WO2020098257A1 (en) * | 2018-11-14 | 2020-05-22 | 平安科技(深圳)有限公司 | Image classification method and device and computer readable storage medium |
CN111192237A (en) * | 2019-12-16 | 2020-05-22 | 重庆大学 | Glue coating detection system and method based on deep learning |
CN112200302A (en) * | 2020-09-27 | 2021-01-08 | 四川翼飞视科技有限公司 | Construction method of weighted residual error neural network |
CN112465053A (en) * | 2020-12-07 | 2021-03-09 | 深圳市彬讯科技有限公司 | Furniture image-based object identification method, device, equipment and storage medium |
WO2021051497A1 (en) * | 2019-09-16 | 2021-03-25 | 平安科技(深圳)有限公司 | Pulmonary tuberculosis determination method and apparatus, computer device, and storage medium |
WO2021179117A1 (en) * | 2020-03-09 | 2021-09-16 | 华为技术有限公司 | Method and apparatus for searching number of neural network channels |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113159164B (en) * | 2021-04-19 | 2023-05-12 | 杭州科技职业技术学院 | Industrial Internet equipment collaborative operation method based on distribution type |
CN116385806B (en) * | 2023-05-29 | 2023-09-08 | 四川大学华西医院 | Method, system, equipment and storage medium for classifying strabismus type of eye image |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650781A (en) * | 2016-10-21 | 2017-05-10 | 广东工业大学 | Convolutional neural network image recognition method and device |
CN106709453A (en) * | 2016-12-24 | 2017-05-24 | 北京工业大学 | Sports video key posture extraction method based on deep learning |
CN107527044A (en) * | 2017-09-18 | 2017-12-29 | 北京邮电大学 | A kind of multiple car plate clarification methods and device based on search |
CN107590774A (en) * | 2017-09-18 | 2018-01-16 | 北京邮电大学 | A kind of car plate clarification method and device based on generation confrontation network |
CN108764134A (en) * | 2018-05-28 | 2018-11-06 | 江苏迪伦智能科技有限公司 | A kind of automatic positioning of polymorphic type instrument and recognition methods suitable for crusing robot |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107229952A (en) * | 2017-06-01 | 2017-10-03 | 雷柏英 | The recognition methods of image and device |
US9946960B1 (en) * | 2017-10-13 | 2018-04-17 | StradVision, Inc. | Method for acquiring bounding box corresponding to an object in an image by using convolutional neural network including tracking network and computing device using the same |
CN108596069A (en) * | 2018-04-18 | 2018-09-28 | 南京邮电大学 | Neonatal pain expression recognition method and system based on depth 3D residual error networks |
CN108596108B (en) * | 2018-04-26 | 2021-02-23 | 中国科学院电子学研究所 | Aerial remote sensing image change detection method based on triple semantic relation learning |
CN109635842A (en) * | 2018-11-14 | 2019-04-16 | 平安科技(深圳)有限公司 | A kind of image classification method, device and computer readable storage medium |
- 2018-11-14: CN application CN201811350802.XA, published as CN109635842A, status: active, Pending
- 2019-05-30: WO application PCT/CN2019/089181, published as WO2020098257A1, status: active, Application Filing
Non-Patent Citations (1)
Title |
---|
鲍鲜杰 等: "基于生成对抗网络的SAR图像仿真方法研究", 《第五届高分辨率对地观测学术年会论文集》, 17 October 2018 (2018-10-17), pages 1 - 17 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020098257A1 (en) * | 2018-11-14 | 2020-05-22 | 平安科技(深圳)有限公司 | Image classification method and device and computer readable storage medium |
CN110651277A (en) * | 2019-08-08 | 2020-01-03 | 京东方科技集团股份有限公司 | Computer-implemented method, computer-implemented diagnostic method, image classification apparatus, and computer program product |
WO2021051497A1 (en) * | 2019-09-16 | 2021-03-25 | 平安科技(深圳)有限公司 | Pulmonary tuberculosis determination method and apparatus, computer device, and storage medium |
CN111192237A (en) * | 2019-12-16 | 2020-05-22 | 重庆大学 | Glue coating detection system and method based on deep learning |
CN111192237B (en) * | 2019-12-16 | 2023-05-02 | 重庆大学 | Deep learning-based glue spreading detection system and method |
WO2021179117A1 (en) * | 2020-03-09 | 2021-09-16 | 华为技术有限公司 | Method and apparatus for searching number of neural network channels |
CN112200302A (en) * | 2020-09-27 | 2021-01-08 | 四川翼飞视科技有限公司 | Construction method of weighted residual error neural network |
CN112465053A (en) * | 2020-12-07 | 2021-03-09 | 深圳市彬讯科技有限公司 | Furniture image-based object identification method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020098257A1 (en) | 2020-05-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635842A (en) | A kind of image classification method, device and computer readable storage medium | |
CN110188795B (en) | Image classification method, data processing method and device | |
WO2020238293A1 (en) | Image classification method, and neural network training method and apparatus | |
Chen et al. | DISC: Deep image saliency computing via progressive representation learning | |
JP2017062781A (en) | Similarity-based detection of prominent objects using deep cnn pooling layers as features | |
CN111275107A (en) | Multi-label scene image classification method and device based on transfer learning | |
Bianco et al. | Predicting image aesthetics with deep learning | |
CN108830322A (en) | A kind of image processing method and device, equipment, storage medium | |
CN110222718B (en) | Image processing method and device | |
CN114549913B (en) | Semantic segmentation method and device, computer equipment and storage medium | |
CN107679457A (en) | User identity method of calibration and device | |
CN109492093A (en) | File classification method and electronic device based on gauss hybrid models and EM algorithm | |
CN109766470A (en) | Image search method, device and processing equipment | |
CN114693624A (en) | Image detection method, device and equipment and readable storage medium | |
Kalliatakis et al. | Exploring object-centric and scene-centric CNN features and their complementarity for human rights violations recognition in images | |
CN111582372A (en) | Image classification method, model, storage medium and electronic device | |
CN114692750A (en) | Fine-grained image classification method and device, electronic equipment and storage medium | |
CN112364828B (en) | Face recognition method and financial system | |
CN113822134A (en) | Instance tracking method, device, equipment and storage medium based on video | |
Afzali et al. | Genetic programming for feature selection and feature combination in salient object detection | |
CN115457308B (en) | Fine granularity image recognition method and device and computer equipment | |
CN112749576B (en) | Image recognition method and device, computing equipment and computer storage medium | |
Cao et al. | MSANet: Multi-scale attention networks for image classification | |
CN115187456A (en) | Text recognition method, device, equipment and medium based on image enhancement processing | |
CN114387489A (en) | Power equipment identification method and device and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||