CN111767860A - Method and terminal for realizing image recognition through convolutional neural network - Google Patents
- Publication number
- CN111767860A (application CN202010613939.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- data set
- network model
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/582—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and a terminal for realizing image recognition through a convolutional neural network. The method comprises: acquiring a data set and preprocessing the data set; setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers; and training the convolutional neural network model on the data set and performing image recognition with the trained convolutional neural network model. By cascading the output of each pooling layer, the method and the terminal can make full use of the features of the image at every scale for classification, which greatly improves both the accuracy and the speed of classification by the convolutional neural network.
Description
Technical Field
The invention relates to the field of image processing, in particular to a method and a terminal for realizing image recognition through a convolutional neural network.
Background
With the development of the global economy, living standards have risen markedly and car ownership has increased sharply, giving people more choice and convenience in daily travel. However, the contradiction between this growth and the relatively lagging construction of traffic safety infrastructure and the relatively weak level of traffic safety management is increasingly prominent, so that traffic accidents and traffic jams occur frequently and have become a major social problem affecting people's lives. Intelligent transportation systems have therefore received wide attention, and active safe-driving technology and driverless technology have become research focuses of scholars and enterprises at home and abroad. Road traffic sign recognition is an important component of active safe-driving systems and automatic driving systems and plays a major role in road driving safety. Because the personal safety of passengers is at stake, the automotive industry places very strict demands on safety and reliability; a traffic sign recognition system must therefore offer both high recognition accuracy and real-time recognition, so traffic sign recognition remains a challenging task.
In recent years, convolutional neural network methods such as LeNet, AlexNet, VGG, GoogLeNet, YOLO and ResNet have achieved remarkable performance in image detection and recognition. The conventional LeNet-5 lightweight convolutional neural network model shown in fig. 1 includes five layers. For the recognition of traffic sign images, an overly complex convolutional neural network can provide a more reliable recognition result, but it also wastes resources unnecessarily and computes more slowly.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a terminal for realizing image recognition through a convolutional neural network that can recognize images quickly and correctly.
In order to solve the technical problems, the invention adopts a technical scheme that:
A method for implementing image recognition through a convolutional neural network, comprising the steps of:
S1, acquiring a data set, and preprocessing the data set;
S2, setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers;
S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
A terminal for implementing image recognition through a convolutional neural network, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
S1, acquiring a data set, and preprocessing the data set;
S2, setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers;
S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model.
The invention has the following beneficial effects. A data set is acquired and preprocessed, and the output features of each pooling layer are cascaded, so that both the local features and the global features of the image are fully used and the image is analyzed with feature information at different scales, which greatly improves the accuracy of the output result. Preprocessing ensures that the input images meet the processing conditions of the convolutional neural network, which further improves the accuracy of the output result; it also makes the images uniform, which speeds up the analysis of the images by the convolutional neural network.
Drawings
FIG. 1 is a diagram of the conventional convolutional neural network model LeNet-5;
FIG. 2 is a flowchart illustrating the steps of a method for implementing image recognition through a convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal for implementing image recognition through a convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a sample distribution diagram of a data set according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the GTSRB data set sample distribution according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of image augmentation according to an embodiment of the present invention;
FIG. 7 is a block diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 8 is a graph of the loss function during training of the convolutional neural network model according to an embodiment of the present invention;
FIG. 9 is a graph of the accuracy during training of the convolutional neural network model according to an embodiment of the present invention;
FIG. 10 shows the candidate set of convolution kernel sizes in a convolutional neural network model and the corresponding run results, according to an embodiment of the present invention;
FIG. 11 shows the candidate set of Dropout parameters in a convolutional neural network model and the corresponding run results, according to an embodiment of the present invention;
FIG. 12 shows the candidate set of fully connected layer neuron numbers in a convolutional neural network model and the corresponding run results, according to an embodiment of the present invention;
FIG. 13 is a comparison of model data for different convolutional neural networks according to an embodiment of the present invention.
Description of reference numerals:
1. a terminal for realizing image recognition through a convolutional neural network; 2. a processor;
3. a memory.
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
Referring to fig. 2, a method for implementing image recognition through a convolutional neural network includes the steps of:
S1, acquiring a data set, and preprocessing the data set;
S2, setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers;
S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model.
From the above description, the beneficial effects of the present invention are as follows. A data set is acquired and preprocessed, and the output features of each pooling layer are cascaded, so that both the local features and the global features of the image are fully used and the image is analyzed with feature information at different scales, which greatly improves the accuracy of the output result. Preprocessing ensures that the input images meet the processing conditions of the convolutional neural network, which further improves the accuracy of the output result; it also makes the images uniform, which speeds up the analysis of the images by the convolutional neural network.
Further, the convolutional neural network model further comprises a global average pooling layer;
in S2, the step of cascading the output characteristics of each pooling layer specifically includes:
unifying the size of the output features of each of the pooling layers;
connecting the output features of the pooling layers with uniform size in series to form a tensor;
inputting the tensor into the global average pooling layer.
As can be seen from the above description, the outputs of the pooling layers, after size unification and feature fusion, are fed to the global average pooling layer, which replaces the conventional flatten layer (Flatten). The multi-scale features are thus condensed in the global average pooling layer before being handed to the subsequent classifier, so the transition from feature maps to classification is more natural, and traffic signs are classified using feature information at different scales, such as the local features and global features of the image.
Further, the data set consists of road traffic sign images acquired by a vehicle-mounted camera in an actual traffic environment;
the road traffic sign images in the data set include images shot under different lighting, different degrees of occlusion, different shooting angles and different vehicle speeds.
The preprocessing of the data set in S1 includes:
S11, normalizing the size of the images in the data set to obtain a first data set;
S12, normalizing the pixels of the images in the first data set to obtain a second data set;
S13, augmenting the images in the second data set to obtain a third data set;
S3 specifically includes: training the convolutional neural network model on the third data set.
As can be seen from the above description, the traffic sign images in the data set are selected with the practicality of the trained model in mind. Images acquired by a vehicle-mounted camera in an actual traffic environment are used so that the trained model meets real usage requirements, and the varied lighting, occlusion, shooting angles and vehicle speeds allow the final model to recognize images shot in different scenes, which improves its robustness. Training the model on the preprocessed data set helps ensure the accuracy of image recognition by the final model.
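As a rough illustration of S11 and S12, the sketch below resizes an image by bilinear interpolation and rescales its pixel values. The 32 × 32 target size and the 0-1 range are the values given in the first embodiment; the function names `resize_bilinear` and `normalize_pixels`, the (H, W, C) array layout and the pixel-centre alignment convention are assumptions made for this example and are not stated in the source.

```python
import numpy as np

def resize_bilinear(img, out_h=32, out_w=32):
    """Resize an (H, W, C) image with bilinear interpolation."""
    img = np.asarray(img, dtype=np.float64)
    in_h, in_w = img.shape[:2]
    # Map each output pixel centre back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :, None]   # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def normalize_pixels(img):
    """Compress 8-bit pixel values from the range 0-255 to the range 0-1."""
    return np.asarray(img, dtype=np.float64) / 255.0

# Usage: a hypothetical 50 x 70 RGB image becomes a 32 x 32 array in [0, 1].
example = np.full((50, 70, 3), 200, dtype=np.uint8)
prepared = normalize_pixels(resize_bilinear(example))
```

Resizing before rescaling keeps the interpolation in the original 0-255 range; the reverse order would give the same result up to floating-point rounding.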
Further, the augmentation is specifically:
S131, rotating the image about its center in a random direction, in units of 10 degrees;
S132, randomly translating the image by 4 pixels, removing the surplus part on one side, and stretching the image back to the size of the original image;
S133, randomly flipping the image;
the operations S131, S132 and S133 are performed in ascending order of step number, but the order is not limited thereto.
As can be seen from the above description, augmenting the data set yields a data set with sufficient data for training the model, prevents the model from overfitting, and improves the generalization capability of the model.
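The three augmentation steps can be sketched as follows. This is one possible reading and not the patent's implementation: nearest-neighbour sampling, a shift direction chosen from the four axis directions, cropping of the strip left empty by the shift, and a horizontal flip with probability 0.5 are all assumptions, as are the function names.

```python
import numpy as np

def rotate(img, degrees):
    """S131: rotate an (H, W, C) image about its centre (nearest neighbour)."""
    h, w = img.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source location of every output pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)          # pixels with no source stay zero
    out[valid] = img[sy[valid], sx[valid]]
    return out

def translate_and_stretch(img, dx, dy):
    """S132: shift by (dx, dy), drop the emptied strip, stretch back to H x W."""
    h, w = img.shape[:2]
    # Only this part of the source is still visible after the shift.
    crop = img[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    rows = np.arange(h) * crop.shape[0] // h
    cols = np.arange(w) * crop.shape[1] // w
    return crop[rows][:, cols]

def augment(img, rng):
    """Apply S131, S132 and S133 in order to one image."""
    angle = 10 * int(rng.integers(0, 36))      # random multiple of 10 degrees
    dx, dy = [(4, 0), (-4, 0), (0, 4), (0, -4)][int(rng.integers(0, 4))]
    out = translate_and_stretch(rotate(img, angle), dx, dy)
    if rng.random() < 0.5:                     # S133: random horizontal flip
        out = out[:, ::-1]
    return out

rng = np.random.default_rng(0)
sample = rng.random((32, 32, 3))               # hypothetical normalized image
augmented = augment(sample, rng)
```

For traffic signs, flipping and large rotations can change the meaning of a sign (for example, a left-turn arrow), so in practice the set of transforms might be restricted per class; the source does not discuss this.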
Further, the convolutional neural network further includes: a plurality of convolutional layers alternating with the pooling layers;
each single convolutional layer is formed by cascading a plurality of small convolutional layers; within the cascade, the input of each small convolutional layer is connected to the output of the preceding small convolutional layer, and only the head of the first and the tail of the last small convolutional layer connect outside the cascade.
As can be seen from the above description, cascading a plurality of small convolutional layers in place of the traditional single large convolutional layer preserves the extent of the receptive field while effectively controlling the number of parameters.
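The trade-off between receptive field and parameter count can be checked with a short calculation. The 5 × 5 and 9 × 9 kernel sizes are the ones named in the second embodiment; the channel count of 64 and the assumption that input and output channels are equal are hypothetical choices for illustration only.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size kernel x kernel."""
    return layers * (kernel - 1) + 1

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one 2-D convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out

channels = 64  # hypothetical; not taken from the source
small = 2 * conv_params(5, channels, channels)   # two cascaded 5 x 5 layers
large = conv_params(9, channels, channels)       # one 9 x 9 layer

print(stacked_receptive_field(5, 2))  # 9
print(small, large)                   # 204928 331840
```

Under these assumptions two cascaded 5 × 5 layers cover the same 9 × 9 receptive field with roughly 62% of the parameters of a single 9 × 9 layer.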
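Further, a batch normalization algorithm is set after each convolutional layer:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$

wherein $B = \{x_1, x_2, x_3, \ldots, x_m\}$ represents a batch of m data input into the convolutional neural network model, $\mu_B$ represents the mean of the m data, $\sigma_B^2$ represents the variance of the m data, $\epsilon$ is a small positive number, $\gamma$ is a transformation factor, and $\beta$ is a translation factor.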
As can be seen from the above description, setting the batch normalization algorithm after each convolutional layer increases the convergence speed of the model; adding a small positive number prevents the divisor from being 0, and adding a transformation factor and a translation factor enhances the expressive capability of the network.
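A minimal NumPy sketch of batch normalization in its training-time form, using the quantities the text defines: the batch mean, the batch variance, a small positive number `eps`, the transformation factor `gamma` and the translation factor `beta`. The function name, the (m, features) layout and the default `eps=1e-5` are assumptions for illustration; the running statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch x of shape (m, features), then scale and shift it."""
    mu = x.mean(axis=0)                     # batch mean
    var = x.var(axis=0)                     # batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # eps keeps the divisor non-zero
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(64, 8))
normalized = batch_norm(batch, gamma=1.0, beta=0.0)

# A batch with zero variance would divide by zero without eps.
constant = batch_norm(np.ones((64, 8)), gamma=1.0, beta=0.0)
```

With `gamma=1` and `beta=0` each feature of `normalized` has approximately zero mean and unit variance; other values of the two factors let the network restore whatever scale and offset suit the next layer.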
Further, the convolutional neural network further includes: a fully-connected layer;
in step S2, setting the initial parameters of the convolutional neural network model includes:
setting the number of convolution kernels, the weights of the convolution kernels, the biases and the number of neurons of the fully connected layer;
setting an initial learning rate, a target minimum error, a training period and the number of samples selected in a single training step.
As can be seen from the above description, the initial parameters of the convolutional neural network model can be set to suit different conditions, which controls the behavior of the model, meets different requirements, and makes the model highly flexible.
Further, the data set in S1 includes a training set and a test set;
the step S3 specifically includes:
training the convolutional neural network model according to the training set to obtain a first convolutional neural network model;
and verifying the first convolutional neural network model according to the test set to obtain the trained convolutional neural network model.
As can be seen from the above description, the data set is divided into a training set and a test set. The training set is used to train the convolutional neural network model, which is then verified on the test set; if the model does not meet the performance requirement, it is adjusted according to the verification result, which further ensures the recognition accuracy of the final model.
Further, training the convolutional neural network model according to the data set in step S3 specifically includes:
S31, selecting a cross-entropy loss function as the objective function, outputting the recognition result through a Softmax classifier, setting EPOCHS to 50, BATCH_SIZE to 64 and the initial learning rate to 0.001, and using the Adam optimization algorithm to gradually attenuate the learning rate as the number of iterations increases;
S32, feeding the data set into the convolutional neural network model, and calculating the forward output;
S33, calculating the error of the forward output, and updating the weights and biases in the convolutional neural network model by the back-propagation algorithm;
S34, repeating steps S32 and S33 until the objective function converges, and saving the convolutional neural network model at that point.
From the above description, the parameters of the convolutional neural network model are adjusted according to the objective function in combination with the back propagation algorithm, so that the reliability of the completed model is further ensured.
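The loop S31 to S34 can be sketched on a toy problem. To keep the example self-contained, a single linear Softmax classifier stands in for the convolutional network, the data are synthetic, and the loop runs for a fixed number of epochs instead of testing convergence; all of these are assumptions. Only EPOCHS, BATCH_SIZE, the initial learning rate, the cross-entropy objective and the use of Adam come from the text. The sketch uses plain Adam with no explicit decay schedule, because the source does not specify one.

```python
import numpy as np

EPOCHS, BATCH_SIZE, LEARNING_RATE = 50, 64, 0.001   # values from S31

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Synthetic data: 3 well-separated classes in 2-D (purely illustrative).
rng = np.random.default_rng(0)
means = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y = rng.integers(0, 3, size=256)
X = means[y] + rng.normal(size=(256, 2))

W, b = np.zeros((2, 3)), np.zeros(3)
params = [W, b]
m = [np.zeros_like(p) for p in params]      # Adam first-moment estimates
v = [np.zeros_like(p) for p in params]      # Adam second-moment estimates
beta1, beta2, eps, step = 0.9, 0.999, 1e-8, 0

initial_loss = cross_entropy(softmax(X @ W + b), y)
for epoch in range(EPOCHS):
    order = rng.permutation(len(X))
    for start in range(0, len(X), BATCH_SIZE):
        idx = order[start:start + BATCH_SIZE]
        xb, yb = X[idx], y[idx]
        probs = softmax(xb @ W + b)               # S32: forward output
        err = probs.copy()
        err[np.arange(len(yb)), yb] -= 1.0        # S33: gradient w.r.t. logits
        grads = [xb.T @ err / len(yb), err.mean(axis=0)]
        step += 1
        for p, g, m_i, v_i in zip(params, grads, m, v):
            m_i[:] = beta1 * m_i + (1 - beta1) * g
            v_i[:] = beta2 * v_i + (1 - beta2) * g * g
            m_hat = m_i / (1 - beta1 ** step)     # bias-corrected moments
            v_hat = v_i / (1 - beta2 ** step)
            p -= LEARNING_RATE * m_hat / (np.sqrt(v_hat) + eps)  # in-place update
final_loss = cross_entropy(softmax(X @ W + b), y)
```

With zero-initialized weights the initial loss equals ln 3, the cross-entropy of a uniform prediction over three classes, and it falls as training proceeds.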
Referring to fig. 3, a terminal for implementing image recognition through a convolutional neural network includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
S1, acquiring a data set, and preprocessing the data set;
S2, setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers;
S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model.
As can be seen from the above description, the beneficial effects of the present invention are as follows. A data set is acquired and preprocessed, and the output features of each pooling layer are cascaded, so that both the local features and the global features of the image are fully used and the image is analyzed with feature information at different scales, which greatly improves the accuracy of the output result. Preprocessing ensures that the input images meet the processing conditions of the convolutional neural network, which further improves the accuracy of the output result; it also makes the images uniform, which speeds up the analysis of the images by the convolutional neural network.
Referring to fig. 2, fig. 4, fig. 6 and fig. 10-12, a first embodiment of the present invention is:
a method for realizing image recognition through a convolutional neural network specifically comprises the following steps:
S1, acquiring a data set, and preprocessing the data set;
the data set consists of road traffic sign images acquired by a vehicle-mounted camera in an actual traffic environment;
the road traffic sign images in the data set are shot under different lighting, degrees of occlusion, shooting angles and vehicle speeds; each road traffic sign image contains only one traffic sign, and the image resolution ranges from 16 × 16 to 250 × 250;
the data set comprises a training set and a test set;
the preprocessing of the data set comprises:
S11, normalizing the size of the images in the data set to obtain a first data set: the sample images are uniformly scaled to 32 × 32 by bilinear interpolation;
S12, normalizing the pixels of the images in the first data set to obtain a second data set: the value range of each pixel in the image is compressed from 0-255 to 0-1, which accelerates the convergence of the neural network;
S13, augmenting the images in the second data set to obtain a third data set;
referring to fig. 6, the first column from the left in fig. 6 shows images in the second data set; the augmentation is specifically:
S131, rotating the image about its center in a random direction, in units of 10 degrees, for example by 20 degrees, 80 degrees or 170 degrees;
S132, randomly translating the image by 4 pixels, removing the surplus part on one side, and stretching the image back to the size of the original image;
S133, randomly flipping the image;
the operations S131, S132 and S133 are performed in ascending order of step number, but the order is not limited thereto;
in an optional embodiment, the constructed data set comprises 3250 images of 40 classes of traffic signs, of which 2275 images form the training set and 975 images form the test set; the distribution of the 3250 images over the 40 classes is shown in fig. 4;
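The counts above correspond to a 70%/30% split (2275 / 3250 = 0.7). The sketch below reproduces a split of that size by shuffling image indices. The source does not say how images were assigned to the two sets, so the random, unstratified assignment and the name `split_indices` are assumptions.

```python
import random

TOTAL, N_TRAIN, N_TEST = 3250, 2275, 975   # counts from the embodiment

def split_indices(total, n_train, seed=0):
    """Shuffle image indices and cut them into training and test index lists."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(TOTAL, N_TRAIN)
train_fraction = N_TRAIN / TOTAL           # 0.7
```

With 40 classes of uneven size, a split stratified by class would keep rare signs represented in both sets; whether the patent did this is not stated.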
S2, setting initial parameters of the convolutional neural network model, including:
setting the size of the convolution kernels, the number of convolution kernels, the weights of the convolution kernels, the biases and the number of neurons of the fully connected layer;
setting an initial learning rate, a target minimum error, a training period and the number of samples selected in a single training step (BATCH_SIZE);
performing comparison experiments on the data set to select the convolution kernel size, the Dropout parameter and the number of neurons in the fully connected layer of the network structure, for which 5 × 5, 0.5 and 256 are adopted respectively in an optional implementation; specifically, referring to figs. 10 to 12, a preset parameter candidate set is defined for each type of parameter, the value of one type of parameter is varied within its candidate set at a time by the control-variable method, the convolutional neural network model is run, and the best-performing parameter in the candidate set is selected from the run results;
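The control-variable procedure amounts to a one-parameter-at-a-time search, sketched below. The candidate sets, the starting values and `placeholder_score` are hypothetical: the real candidate sets are in figs. 10 to 12, and the real score would come from training and evaluating the model. The placeholder is built only so that the example runs and is not experimental data.

```python
def control_variable_search(candidates, start, evaluate):
    """Vary one parameter at a time, keeping the best value found so far."""
    best = dict(start)
    for name, values in candidates.items():
        scores = {}
        for value in values:
            trial = dict(best)
            trial[name] = value          # change only this parameter
            scores[value] = evaluate(trial)
        best[name] = max(scores, key=scores.get)
    return best

# Hypothetical candidate sets; the source does not list them in the text.
candidates = {
    "kernel_size": [3, 5, 7],
    "dropout": [0.3, 0.5, 0.7],
    "fc_neurons": [128, 256, 512],
}
start = {"kernel_size": 3, "dropout": 0.3, "fc_neurons": 128}

def placeholder_score(params):
    """Stand-in for 'train the model and return its accuracy'; not real results."""
    return -(abs(params["kernel_size"] - 5)
             + abs(params["dropout"] - 0.5)
             + abs(params["fc_neurons"] - 256) / 256)

chosen = control_variable_search(candidates, start, placeholder_score)
```

This search needs one training run per candidate value (nine here) instead of one per combination (27), at the cost of missing interactions between parameters.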
in an optional embodiment, EPOCHS is set to 50 (one EPOCH is one complete forward and backward pass of the entire data set through the neural network), BATCH_SIZE is set to 64, the initial learning rate is 0.001, and the Adam optimization algorithm gradually attenuates the learning rate as the number of iterations increases;
S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model;
specifically, the data set is the third data set obtained by size normalization, pixel normalization and augmentation;
S3 specifically comprises the following steps: training the convolutional neural network model on the training set to obtain a first convolutional neural network model; and verifying the first convolutional neural network model on the test set to obtain a convolutional neural network model that meets the requirements.
Referring to fig. 7, a second embodiment of the present invention is:
a method for realizing image recognition through a convolutional neural network, which is different from the first embodiment in that:
the convolutional neural network model comprises an input layer (Input), a plurality of pooling layers (Maxpool), a plurality of convolutional layers (Conv), a global average pooling layer (GlobalAveragePooling), a fully connected layer (Full) and a classifier;
the convolutional layers alternate with the pooling layers;
the convolutional layers all adopt the ReLU activation function ReLU(x) = max(0, x), whose gradient is 1 when x is greater than 0 and 0 otherwise;
a Dropout strategy is added after each pooling layer, that is, part of the hidden-layer neurons are randomly switched off or ignored during training;
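The two operations just described can be sketched in a few lines of NumPy. The rescaling of the surviving activations by 1 / (1 - rate), known as inverted dropout, is a common convention assumed here and is not stated in the source; the rate of 0.5 is the Dropout parameter chosen in the first embodiment.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 where x > 0, 0 elsewhere."""
    return (x > 0).astype(float)

def dropout(x, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations during training only."""
    if not training:
        return x                           # inference: all neurons stay active
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)         # rescale so the expected value is unchanged

x = np.array([-2.0, 0.0, 3.0])
rng = np.random.default_rng(0)
features = np.ones((4, 4))                 # hypothetical pooled feature map
dropped = dropout(features, rate=0.5, rng=rng)
```

Every entry of `dropped` is either 0 (neuron switched off) or 2 (kept and rescaled).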
in S2, the step of cascading the output characteristics of each pooling layer specifically includes:
unifying the size of the output features of each of the pooling layers;
connecting the output features of the pooling layers with uniform size in series to form a tensor;
inputting the tensor into the global average pooling layer;
specifically, in this embodiment, the input of the global average pooling layer is the cascade of the pooling layers:
GlobalAveragePooling merge_input=[Max_11,Max_22,Maxpool3];
the outputs of the pooling layers are each connected to a size-unifying pooling layer, so that the sizes of the output features of the pooling layers are unified;
Max_11 and Max_22 are the two size-unifying pooling layers, connected to Maxpool1 and Maxpool2 respectively; they unify the output features of Maxpool1 and Maxpool2 from 16 × 16 and 8 × 8 to 4 × 4;
[Max_11, Max_22, Maxpool3] indicates that the outputs of the two size-unifying pooling layers and of the pooling layer Maxpool3 are connected in series to form one tensor; the output feature size of Maxpool3 is already 4 × 4, so it needs no size-unifying pooling layer;
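The cascade can be traced with array shapes. In the sketch below, random arrays stand in for the outputs of Maxpool1, Maxpool2 and Maxpool3. The spatial sizes 16 × 16, 8 × 8 and 4 × 4 come from the text, while the channel counts 32, 64 and 128, the (H, W, C) layout and the 4 × 4 and 2 × 2 pooling windows used for Max_11 and Max_22 are assumptions chosen to make the sizes match.

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling on an (H, W, C) array."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
maxpool1 = rng.random((16, 16, 32))    # stand-in for the Maxpool1 output
maxpool2 = rng.random((8, 8, 64))      # stand-in for the Maxpool2 output
maxpool3 = rng.random((4, 4, 128))     # stand-in for the Maxpool3 output

max_11 = max_pool(maxpool1, 4)         # 16 x 16 -> 4 x 4
max_22 = max_pool(maxpool2, 2)         # 8 x 8  -> 4 x 4

# Connect the size-unified features in series along the channel axis.
merge_input = np.concatenate([max_11, max_22, maxpool3], axis=-1)

# Global average pooling: one value per channel.
gap = merge_input.mean(axis=(0, 1))
print(merge_input.shape, gap.shape)    # (4, 4, 224) (224,)
```

Under these assumptions the classifier receives a 224-element vector that mixes fine-scale features from the first pooling layer with coarser features from the later ones.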
each single convolutional layer is formed by cascading a plurality of small convolutional layers; within the cascade, the input of each small convolutional layer is connected to the output of the preceding small convolutional layer, and only the head of the first and the tail of the last small convolutional layer connect outside the cascade;
wherein the classifier may be a Softmax classifier;
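a Batch Normalization algorithm (BN) is set after each of the convolutional layers:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$

wherein $B = \{x_1, x_2, x_3, \ldots, x_m\}$ represents a batch of m data input into the convolutional neural network model, $\mu_B$ represents the mean of the m data, $\sigma_B^2$ represents the variance of the m data, $\epsilon$ is a small positive number added to prevent the divisor from being 0, $\gamma$ is a transformation factor, and $\beta$ is a translation factor;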
in an optional embodiment, the convolutional neural network model comprises 3 convolutional layers, 3 pooling layers, 1 global average pooling layer, 1 fully connected layer and 1 classification output layer, with the convolutional layers and the pooling layers connected alternately; the 3 convolutional layers all use convolution kernels of size 5 × 5 with a stride of 1, and their numbers of neurons are 32, 64 and 128 respectively; the 3 pooling layers all use max pooling with a window of size 2 × 2 and a stride of 2;
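The feature-map sizes implied by this configuration can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1. The 32 × 32 input comes from the preprocessing step; the padding of 2 on the 5 × 5 convolutions, which keeps the spatial size unchanged, is an assumption and is the choice consistent with the 16 × 16, 8 × 8 and 4 × 4 pooling outputs stated in this embodiment.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                      # input images are 32 x 32 after preprocessing
pool_sizes = []
for _ in range(3):             # three convolution + pooling stages
    size = output_size(size, kernel=5, stride=1, padding=2)  # assumed size-preserving padding
    size = output_size(size, kernel=2, stride=2)             # 2 x 2 max pooling, stride 2
    pool_sizes.append(size)

print(pool_sizes)              # [16, 8, 4]
```

Without padding, the first 5 × 5 convolution alone would shrink a 32 × 32 input to 28 × 28, and the pooling outputs would no longer match the sizes given in the text.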
the number of the parameters of the convolutional neural network model provided by the invention is shown in the table 1:
TABLE 1
Wherein, each convolutional layer is cascaded through a plurality of 5 × 5 convolutional layers to replace the traditional single 9 × 9 convolutional layer, and table 2 shows the number of parameters of the traditional large convolutional kernel convolutional neural network model:
TABLE 2
As can be seen by comparing Tables 1 and 2, the convolutional neural network model provided by the invention has only slightly more parameters, while its reliability is significantly improved.
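The equivalence between cascaded 5 × 5 layers and a single 9 × 9 layer follows from the receptive field of stacked stride-1 convolutions. The sketch below is a hedged illustration: the function names are hypothetical, the count of two cascaded layers is inferred from the 9 × 9 figure rather than stated in the text, and the parameter helper does not reproduce the values of Tables 1 and 2.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` cascaded stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of one convolutional layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


if __name__ == "__main__":
    # Two cascaded 5x5 layers see the same 9x9 input region as one 9x9 layer.
    print(stacked_receptive_field(kernel=5, layers=2))
    print(stacked_receptive_field(kernel=9, layers=1))
    # Illustrative per-layer count for a hypothetical 3-channel input.
    print(conv_params(in_channels=3, out_channels=32, kernel=5))
```

How the totals compare depends on the channel counts of each layer, so the per-layer helper is only a starting point for reproducing a table of this kind.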
Referring to fig. 5, 8, 9 and 13, a third embodiment of the present invention is:
a method for implementing image recognition through a convolutional neural network, which differs from the first embodiment or the second embodiment in that training the convolutional neural network model according to the data set in S3 specifically includes:
S31, selecting a cross-entropy loss function as the objective function;
S32, feeding the data set into the convolutional neural network model and calculating the forward output;
S33, calculating the error of the forward output, and updating the weights and biases in the convolutional neural network model by means of the back-propagation algorithm;
specifically, the biases and the weights of the convolution kernels are updated;
S34, repeating steps S32 and S33 until the objective function converges, and saving the convolutional neural network model at that point;
a road traffic sign image that is acquired from a natural scene and contains only one traffic sign is taken as the input of the saved convolutional neural network model, which outputs the specific category of the traffic sign image;
referring to fig. 8, the loss value of the loss function decreases rapidly in the early stage of training, which indicates that the difference between the predicted result and the real result is large at that stage; as the number of training iterations increases, the loss curve flattens and the model gradually converges, that is, the model fits the training data increasingly well;
referring to fig. 9, which shows the accuracy curve during training, the accuracy rises rapidly in the early stage of training; after about thirty iterations the curve flattens, the further increase is very small, and the model gradually converges;
in an alternative embodiment, the German traffic sign data set (GTSRB) is obtained, which contains 51839 traffic sign images in 5 major classes and 43 subclasses, of which 39209 images form the training set and 12630 images form the test set; the distribution of the 51839 images over the 43 subclasses is shown in fig. 5;
the GTSRB is used to run a comparison experiment between the convolutional neural network model (MS-TSRCNN) and two other convolutional neural network models (a single-scale feature connected model and a convolutional layer feature connected model), wherein the convolutional layer feature connected model has the same structure as the MS-TSRCNN except that the pooling layers are not cascaded; the experimental results are shown in fig. 13.
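Steps S31 to S34 of this embodiment can be sketched on a toy problem. The code below is a minimal illustration and not the patent's model: it replaces the convolutional network with a single softmax layer, uses plain full-batch gradient descent, and the learning rate, tolerance and function names are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, x):
    """S32: forward output (class probabilities) for one sample."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train(samples, labels, n_classes, lr=0.5, tol=1e-6, max_epochs=1000):
    """Toy version of S31-S34 for a single softmax layer (assumed stand-in)."""
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    history, prev_loss = [], float("inf")
    for _ in range(max_epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(samples, labels):
            probs = forward(weights, biases, x)
            loss -= math.log(probs[y])                     # S31: cross-entropy
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # S33: output error
                grad_b[k] += err
                for j in range(n_features):
                    grad_w[k][j] += err * x[j]
        n = len(samples)
        loss /= n
        history.append(loss)
        for k in range(n_classes):                         # S33: update weights and biases
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / n
        if abs(prev_loss - loss) < tol:                    # S34: stop once the loss settles
            break
        prev_loss = loss
    return weights, biases, history


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    target = [0, 0, 1, 1]
    _, _, losses = train(data, target, n_classes=2)
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The recorded loss history plays the role of the curve in fig. 8: it drops quickly at first and then flattens as the parameters settle.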
Referring to fig. 3, a fourth embodiment of the present invention is:
a terminal 1 for implementing image recognition through a convolutional neural network, comprising a processor 2, a memory 3 and a computer program stored in the memory 3 and executable on the processor 2, wherein the processor 2 implements the steps of the first embodiment, the second embodiment or the third embodiment when executing the computer program.
In summary, the invention provides a method and a terminal for realizing image recognition through a convolutional neural network. When the training data set is selected, road traffic sign images acquired by a vehicle-mounted camera in an actual traffic environment are used, covering different lighting conditions, occlusion levels, shooting angles and vehicle speeds, so that conditions likely to occur in practice are anticipated, targeted model training is performed in advance, and the recognition accuracy of the model in actual application is ensured. When the model is constructed, all pooling layers are cascaded and a global average pooling layer is provided, so that feature information at multiple scales, such as local features and global features, can be fully used for traffic sign classification, and the transition from feature maps to classification is more natural. A plurality of small convolutional layers replace one large convolutional layer, which preserves the receptive field; although the number of parameters increases slightly, the reliability of the convolutional neural network model is significantly improved and the calculation speed of the model is improved. The parameters are determined by first running a parameter selection experiment on the data set, which shortens model training time and improves the recognition accuracy of the final model. The final model has a simple structure and can recognize traffic signs quickly and correctly.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method for performing image recognition via a convolutional neural network, comprising the steps of:
S1, acquiring a data set, and preprocessing the data set;
S2, setting initial parameters of a convolutional neural network model, wherein the convolutional neural network model comprises a plurality of pooling layers, and cascading the output features of the pooling layers;
and S3, training the convolutional neural network model according to the data set, and performing image recognition by using the trained convolutional neural network model.
2. The method of claim 1, wherein the convolutional neural network model further comprises a global average pooling layer;
the cascading of the output features of the pooling layers specifically includes:
unifying the size of the output features of each of the pooling layers;
concatenating the size-unified output features of the pooling layers to form a tensor;
inputting the tensor into the global average pooling layer.
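The three operations of this claim (unify sizes, concatenate, global average pooling) can be sketched with nested lists. This is an illustrative sketch only: max pooling is assumed as the size-unifying operation, the 4 × 4 target size follows the optional embodiment in the description, and the function names are hypothetical.

```python
def max_pool(channel, window):
    """Non-overlapping max pooling of one square 2-D channel (stride == window)."""
    size = len(channel) // window
    return [[max(channel[r * window + i][c * window + j]
                 for i in range(window) for j in range(window))
             for c in range(size)]
            for r in range(size)]


def unify(feature_map, target):
    """Pool every channel of a square feature map down to target x target."""
    window = len(feature_map[0]) // target
    return [max_pool(channel, window) for channel in feature_map]


def cascade(pool_outputs, target=4):
    """Unify sizes, concatenate along the channel axis, then global-average-pool."""
    tensor = []
    for feature_map in pool_outputs:
        tensor.extend(unify(feature_map, target))  # concatenation along channels
    # Global average pooling: one mean value per channel.
    return [sum(map(sum, channel)) / (target * target) for channel in tensor]


if __name__ == "__main__":
    # One constant-valued channel per pooling layer, sized 16x16, 8x8 and 4x4.
    pool1 = [[[1.0] * 16 for _ in range(16)]]
    pool2 = [[[2.0] * 8 for _ in range(8)]]
    pool3 = [[[3.0] * 4 for _ in range(4)]]
    print(cascade([pool1, pool2, pool3]))
```

A feature map that is already at the target size passes through `unify` unchanged, because the pooling window is then 1.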
3. The method of claim 1, wherein the data set consists of road traffic sign images collected by a vehicle-mounted camera in an actual traffic environment;
the road traffic sign images in the data set include images captured under different lighting conditions, different occlusion levels, different shooting angles and different vehicle speeds;
the preprocessing of the data set in S1 includes:
S11, normalizing the size of the images in the data set to obtain a first data set;
S12, normalizing the pixels of the images in the first data set to obtain a second data set;
S13, augmenting the images in the second data set to obtain a third data set;
the S3 specifically includes: training the convolutional neural network model according to the third data set.
4. The method of claim 3, wherein the augmenting is specifically:
S131, rotating the image about its center in a random direction, in units of 10 degrees;
S132, randomly translating the image by 4 pixels, removing the vacated portion on one side, and stretching the image back to the size of the original image;
S133, randomly flipping the image;
the augmentation operations S131, S132 and S133 are performed in sequence, but the order is not limited thereto.
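The three augmentation operations can be sketched on a small grayscale image stored as nested lists. Several details below are assumptions rather than the patent's specification: the rotation is read as 10 degrees clockwise or counter-clockwise, resampling is nearest-neighbour, the flip is horizontal with probability 0.5, and all function names are hypothetical.

```python
import math
import random


def rotate_nearest(image, degrees, fill=0):
    """S131: rotate about the image center using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a, sin_a = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            x, y = c - cx, r - cy
            # Inverse mapping: locate the source pixel for output pixel (r, c).
            sc = round(cos_a * x + sin_a * y + cx)
            sr = round(-sin_a * x + cos_a * y + cy)
            row.append(image[sr][sc] if 0 <= sr < h and 0 <= sc < w else fill)
        out.append(row)
    return out


def resize_nearest(image, height, width):
    """Stretch an image to (height, width) with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // height][c * src_w // width] for c in range(width)]
            for r in range(height)]


def translate_crop_stretch(image, dx, dy):
    """S132: translate by (dx, dy), drop the vacated border, stretch back."""
    h, w = len(image), len(image[0])
    rows = image[:h - dy] if dy >= 0 else image[-dy:]
    kept = [row[:w - dx] if dx >= 0 else row[-dx:] for row in rows]
    return resize_nearest(kept, h, w)


def flip_horizontal(image):
    """S133: mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, rng):
    """Apply S131, S132 and S133 in sequence with randomly drawn parameters."""
    angle = rng.choice([-10, 10])                            # assumed reading of S131
    dx, dy = rng.choice([(4, 0), (-4, 0), (0, 4), (0, -4)])  # 4-pixel shift
    out = translate_crop_stretch(rotate_nearest(image, angle), dx, dy)
    return flip_horizontal(out) if rng.random() < 0.5 else out


if __name__ == "__main__":
    sample = [[r * 8 + c for c in range(8)] for r in range(8)]
    for row in augment(sample, random.Random(0)):
        print(row)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs.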
5. The method of claim 1, wherein the convolutional neural network further comprises: a plurality of convolutional layers alternating with the pooling layers;
and each convolutional layer is formed by cascading a plurality of small convolutional layers, and, except for the first and last small convolutional layers, the input of each small convolutional layer is connected to the output of the preceding small convolutional layer.
6. The method of claim 1, wherein each convolutional layer is followed by a batch normalization algorithm:
wherein $B = \{x_1, x_2, x_3, \ldots, x_m\}$ denotes a batch of m data input into the convolutional neural network model, $\mu_B$ denotes the mean of the m data, $\sigma_B^2$ denotes the variance of the m data, $\epsilon$ is a small positive number added to prevent division by zero, $\gamma$ is the transformation (scale) factor, and $\beta$ is the translation (shift) factor.
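A minimal numerical sketch of the batch normalization step is given below, assuming the standard formulation (subtract the batch mean, divide by the square root of the variance plus a small constant, then scale and shift). It treats the batch as a flat list of scalars, whereas a real network would normalize per channel; the default values and the function name are assumptions for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars, then apply scale `gamma` and shift `beta`."""
    m = len(batch)
    mean = sum(batch) / m                                # batch mean
    variance = sum((x - mean) ** 2 for x in batch) / m   # batch variance
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


if __name__ == "__main__":
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
    print(batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0))
```

With the default scale and shift the output has approximately zero mean and unit variance; the small constant keeps the division defined even when every value in the batch is identical.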
7. The method of claim 1, wherein the convolutional neural network further comprises: a fully-connected layer;
the setting of initial parameters of the convolutional neural network model in step S2 comprises:
setting the number of convolution kernels, the weights of the convolution kernels, the biases and the number of neurons in the fully-connected layer;
setting an initial learning rate, a target minimum error, a training period and the number of samples selected for a single training iteration.
8. The method of claim 1, wherein the data set in S1 includes a training set and a test set;
the step S3 specifically includes:
training the convolutional neural network model according to the training set to obtain a first convolutional neural network model;
and verifying the first convolutional neural network model according to the test set to obtain the trained convolutional neural network model.
9. The method according to claim 1, wherein the training of the convolutional neural network model according to the data set in step S3 specifically comprises:
S31, selecting a cross-entropy loss function as the objective function, outputting the recognition result through a Softmax classifier, setting EPOCHS to 50, BATCH_SIZE to 64 and the initial learning rate to 0.001, and using the Adam optimization algorithm to gradually decay the learning rate as the number of iterations increases;
S32, feeding the data set into the convolutional neural network model and calculating the forward output;
S33, calculating the error of the forward output, and updating the weights and biases in the convolutional neural network model by means of the back-propagation algorithm;
S34, repeating steps S32 and S33 until the objective function converges, and saving the convolutional neural network model at that point.
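The hyperparameters of this claim can be tied together in a short sketch of a single-parameter Adam update. The values of EPOCHS, BATCH_SIZE and the initial learning rate come from the claim; the inverse-time decay schedule, its decay constant, the Adam defaults (0.9, 0.999, 1e-8) and the function names are assumptions, since the claim only states that the learning rate decays gradually.

```python
import math

EPOCHS = 50          # from the claim
BATCH_SIZE = 64      # from the claim
INITIAL_LR = 0.001   # from the claim


def steps_per_epoch(n_samples, batch_size=BATCH_SIZE):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def decayed_lr(step, initial_lr=INITIAL_LR, decay=1e-4):
    """Assumed inverse-time decay of the learning rate with the step count."""
    return initial_lr / (1.0 + decay * step)


def adam_step(param, grad, m, v, step, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `step` starts at 1."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** step)              # bias correction
    v_hat = v / (1 - beta2 ** step)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


if __name__ == "__main__":
    # 39209 is the GTSRB training-set size given in the description.
    per_epoch = steps_per_epoch(39209)
    print(per_epoch, per_epoch * EPOCHS)
    param, m, v = 1.0, 0.0, 0.0
    for step in range(1, 4):
        grad = 2.0 * param  # gradient of the toy objective param ** 2
        param, m, v = adam_step(param, grad, m, v, step, decayed_lr(step))
    print(param)
```

With a batch size of 64, one pass over the 39209 training images takes 613 mini-batch updates, so 50 epochs correspond to 30650 updates.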
10. A terminal for implementing image recognition by a convolutional neural network, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements a method for implementing image recognition by a convolutional neural network according to any one of claims 1 to 9 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010613939.0A CN111767860A (en) | 2020-06-30 | 2020-06-30 | Method and terminal for realizing image recognition through convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111767860A (en) | 2020-10-13 |
Family
ID=72723520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010613939.0A Pending CN111767860A (en) | 2020-06-30 | 2020-06-30 | Method and terminal for realizing image recognition through convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111767860A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416270A (en) * | 2018-02-06 | 2018-08-17 | 南京信息工程大学 | A kind of traffic sign recognition method based on more attribute union features |
CN108710826A (en) * | 2018-04-13 | 2018-10-26 | 燕山大学 | A kind of traffic sign deep learning mode identification method |
CN109086799A (en) * | 2018-07-04 | 2018-12-25 | 江苏大学 | A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet |
CN109344883A (en) * | 2018-09-13 | 2019-02-15 | 西京学院 | Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution |
AU2018102037A4 (en) * | 2018-12-09 | 2019-01-17 | Ge, Jiahao Mr | A method of recognition of vehicle type based on deep learning |
CN109635784A (en) * | 2019-01-10 | 2019-04-16 | 重庆邮电大学 | Traffic sign recognition method based on improved convolutional neural networks |
CN110046544A (en) * | 2019-02-27 | 2019-07-23 | 天津大学 | Digital gesture identification method based on convolutional neural networks |
CN110580450A (en) * | 2019-08-12 | 2019-12-17 | 西安理工大学 | traffic sign identification method based on convolutional neural network |
CN111028207A (en) * | 2019-11-22 | 2020-04-17 | 东华大学 | Button flaw detection method based on brain-like immediate-universal feature extraction network |
Non-Patent Citations (5)
Title |
---|
BO-XUN WU等: "Traffic Sign Recognition with Light Convolutional Networks", 《2018 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN (ICCE-TW)》, pages 1 - 2 * |
仲会娟等: "基于多尺度卷积神经网络的交通标志识别方法", 《延边大学学报(自然科学版)》, vol. 46, no. 4, pages 359 - 365 * |
宋青松等: "基于多尺度卷积神经网络的交通标志识别", 《湖南大学学报(自然科学版)》, vol. 45, no. 8, pages 131 - 137 * |
曲佳博等: "基于ST-CNN的交通标志实时检测识别算法", 《计算机科学》, vol. 46, no. 11, pages 309 - 314 * |
田正鑫: "基于多尺度卷积神经网络的交通标志识别方法", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 2018, pages 138 - 2982 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022111231A1 (en) * | 2020-11-26 | 2022-06-02 | 中兴通讯股份有限公司 | Cnn training method, electronic device, and computer readable storage medium |
CN113221620A (en) * | 2021-01-29 | 2021-08-06 | 太原理工大学 | Multi-scale convolutional neural network-based traffic sign rapid identification method |
CN113408188A (en) * | 2021-05-24 | 2021-09-17 | 浙江大学衢州研究院 | Method for identifying AFM image prediction material performance by convolutional neural network |
CN113361654A (en) * | 2021-07-12 | 2021-09-07 | 广州天鹏计算机科技有限公司 | Image identification method and system based on machine learning |
CN114841983A (en) * | 2022-05-17 | 2022-08-02 | 中国信息通信研究院 | Image countermeasure sample detection method and system based on decision score |
CN114841983B (en) * | 2022-05-17 | 2022-12-06 | 中国信息通信研究院 | Image countermeasure sample detection method and system based on decision score |
CN114821091A (en) * | 2022-06-29 | 2022-07-29 | 成都理工大学 | Nuclide rapid identification method based on logistic regression two-classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201013 |