CN111079851B - Vehicle type identification method based on reinforcement learning and bilinear convolution network - Google Patents
- Publication number
- CN111079851B (application number CN201911371980.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- state
- fine
- reinforcement learning
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a vehicle type identification method based on reinforcement learning and a bilinear convolutional network: build a deep network model, set the hyperparameters of the fine-grained classification network and initialize the network; establish a Markov decision model for optimizing salient features; perform scale transformations on the dataset; optimize the attention region: with the parameters of the fine-grained classification network fixed, input the dataset into the network, use a reinforcement learning algorithm to optimize the saliency region and select the optimal attention region; establish a loss function for updating the parameters of the fine-grained classification network; after fusing features, repeatedly train the network until the attention region no longer changes; input the vehicle images to be tested into the trained model to obtain the corresponding detection results. The invention uses a reinforcement learning network to extract low-level salient features and fuses the high-level semantic features with the low-level salient features by bilinear interpolation to improve recognition accuracy.
Description
Technical Field
The invention relates to a vehicle type identification method, and in particular to a vehicle type identification method based on reinforcement learning and a bilinear convolutional network.
Background
Vehicle type recognition can be regarded as an application branch of fine-grained classification, that is, classifying different subclasses of the same category that are very similar in appearance. Since vehicle images collected in daily scenarios are easily affected by factors such as pose, viewing angle and occlusion, the differences between models of different brands can be small, while the differences between models of the same brand can be large. How to identify vehicle types effectively is an urgent application problem in fine-grained classification.
The bilinear convolutional network is a model that has achieved fine-grained classification with high accuracy in recent years. It has the advantages of a simple structure and efficient training, but it only uses the features of its last layer as the input features for classification. When training with such features, much of the detail information is lost while most high-level features are retained. Since the objects of fine-grained classification often look similar yet differ in their details, the characterization of detail features has a great impact on the recognition rate. If the low-level features of the bilinear network were fused directly with the high-level features, dimensionality reduction would be required because the low-level features have a larger spatial scale. When the information lost in the reduced features is mainly detail information, the fusion not only fails to improve classification accuracy, but also prolongs training time and degrades final classification efficiency.
Reinforcement learning is a method for solving sequential decision problems: the problem to be solved is modeled as an MDP, and classical reinforcement learning methods such as temporal-difference learning, least-squares temporal-difference learning and actor-critic algorithms are used to find the optimal policy. Reinforcement learning is therefore well suited to extracting the saliency contained in low-level features.
Summary of the Invention
The purpose of the present invention is to provide a vehicle type identification method based on reinforcement learning and a bilinear convolutional network that improves recognition accuracy even when few vehicle images are available.
The technical scheme of the present invention is as follows. A vehicle type identification method based on reinforcement learning and a bilinear convolutional network comprises the following steps:
(1) Build the deep network model: build a fine-grained classification network based on reinforcement learning and a bilinear convolutional network for vehicle recognition;
(2) Set the hyperparameters of the fine-grained classification network: the hyperparameters include the learning rate, the number of iterations and the batch size;
(3) Initialize the network: initialize the weights and thresholds of the fine-grained classification network;
(4) Establish a Markov decision model for optimizing salient features;
(5) Preprocess the dataset: perform scale transformations on the dataset;
(6) Optimize the attention region: with the parameters of the fine-grained classification network fixed, input the dataset into the network, use a reinforcement learning algorithm to optimize the saliency region, and select the optimal attention region;
(7) Construct the loss function: establish a loss function for updating the parameters of the fine-grained classification network, defined as the sum of squared errors between the true labels and the predicted labels of the data;
(8) Fuse features: for each sample in the dataset, fuse the attention region optimized in step (6) with the features of the fifth convolutional layer to obtain the final fused features used for classification;
(9) Train the network: with the optimal attention region fixed, retrain the fine-grained classification network on the dataset by gradient descent until the training error is smaller than a preset threshold;
(10) Alternate training: repeat steps (6)-(9) until the attention region no longer changes;
(11) Input the vehicle images to be tested into the trained deep network model to obtain the corresponding detection results.
Further, the parallel feature extraction layers of the bilinear convolutional network in step (1) adopt the first to fifth convolutional layers of VGG16. The features output by these layers transition from detail features to high-level semantic features. After the fifth convolutional layer, a bilinear vector is obtained through an outer-product operation; finally a fully connected layer is attached and a softmax operation is applied to the output to recognize and classify the vehicle type.
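The outer-product and softmax head described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the feature shapes, the stand-in fully connected weights `W`, and the toy inputs are all assumptions made for demonstration.

```python
import numpy as np

def bilinear_vector(feat_a, feat_b):
    """Combine two (C, H, W) feature maps into one bilinear vector.

    At every spatial location the outer product of the two channel
    vectors is taken; the products are sum-pooled over all locations
    and flattened, as in a bilinear CNN head."""
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]
    a = feat_a.reshape(c_a, h * w)     # (C_a, HW)
    b = feat_b.reshape(c_b, h * w)     # (C_b, HW)
    pooled = a @ b.T                   # sum of per-location outer products
    return pooled.reshape(-1)          # bilinear vector of length C_a * C_b

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy demonstration with small random feature maps.
rng = np.random.default_rng(0)
fa = rng.standard_normal((4, 3, 3))
fb = rng.standard_normal((4, 3, 3))
v = bilinear_vector(fa, fb)                  # length 4 * 4 = 16
W = rng.standard_normal((5, v.size)) * 0.01  # stand-in fully connected layer
probs = softmax(W @ v)                       # class probabilities
```

In a bilinear CNN the two streams may share weights; the sum-pooled outer product keeps pairwise feature interactions while discarding spatial layout.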
Further, establishing the Markov decision model for optimizing salient features in step (4) includes:
401) The state space X is the set of all sub-features of the feature map generated by the third convolutional layer whose scale equals that of the fifth convolutional layer, X = {x1, x2, …, xn};
402) The action space U is the set of up, down, left and right movements of a state within the state space;
403) The state transition function is f: X×U→X. For any state x∈X and any action u∈U, the next state is the state reached after action u is taken, which is again a sub-feature of the third convolutional layer's output at the scale of the fifth convolutional layer;
404) The reward function is r: X×U→R, the immediate reward obtained for any state x∈X and any action u∈U.
Preferably, the action space is U = {0, 1, 2, 3}, where 0 denotes moving the state up, 1 moving it left, 2 moving it down and 3 moving it right.
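Using the coordinate convention of the embodiment's transition-function modeling (up decreases y, left decreases x, down increases y, right increases x), the action space and transition function can be sketched as below. The grid bounds and the clipping behavior at the edges are assumptions; the patent does not state what happens when a move would leave the feature map.

```python
# Actions as in the preferred embodiment: 0 = up, 1 = left, 2 = down, 3 = right.
MOVES = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}

def transition(state, action, width, height):
    """Move the attention window's position one step on the grid of
    valid window placements; out-of-range positions are clipped to the
    border (the clipping rule is an assumption)."""
    x, y = state
    dx, dy = MOVES[action]
    nx = min(max(x + dx, 0), width - 1)
    ny = min(max(y + dy, 0), height - 1)
    return (nx, ny)
```

Each (x, y) indexes one sub-feature of the conv3 output at conv5 scale, so the state space is finite and the transition function is deterministic.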
Further, optimizing the attention region in step (6) includes the steps:
601) Set the parameter values: discount rate γ, decay factor λ, number of episodes E, maximum time step T per episode, learning rate α and exploration rate ε;
602) Initialize Q1(x,u) = 0 and Q2(x,u) = 0 for every state-action pair (x,u);
603) Check whether the number of episodes has reached the maximum E: if so, go to step 612); otherwise go to step 604);
604) Check whether the maximum time step has been reached: if so, go to step 603); otherwise go to step 605);
605) Initialize the current state x = x0;
606) Generate a random probability p in (0, 1) and check whether p < ε holds: if so, the action selected in the current state is u = argmax_u(Q1(x,u) + Q2(x,u)); otherwise select any action at random from the action set;
607) Execute the currently selected action u and obtain the corresponding next state x′;
608) Check whether the classification result of the output layer equals the true label: if so, the immediate reward is r = 1; otherwise r = 0;
609) Generate a random probability p in (0, 1) and check whether p < 0.5 holds: if so, update the Q value Q1(x,u) = r + γ·max_u Q1(x′,u); otherwise update the Q value Q2(x,u) = r + γ·max_u Q2(x′,u);
610) Update the current time step t = t + 1 and go to step 604);
611) Update the current episode e = e + 1;
612) Output the current optimal policy and the value functions Q1(x,u) and Q2(x,u).
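Steps 601)-612) amount to a tabular scheme with two Q tables, a probability-ε action-selection rule, and a coin flip choosing which table to update. A compact sketch under explicit assumptions: a toy chain environment and reward stand in for the classifier-based reward of step 608, the selection rule is kept exactly as written in step 606, and each table bootstraps from itself as written in step 609 (standard Double Q-learning would bootstrap each table from the other).

```python
import random

def optimize_attention(states, reward_fn, transition_fn,
                       episodes=200, max_steps=1000,
                       gamma=0.9, epsilon=0.1, seed=0):
    """Two-table Q iteration following steps 601)-612).

    reward_fn(x, u) -> 0 or 1 stands in for the classifier check of
    step 608; transition_fn(x, u) -> x' stands in for step 607."""
    rng = random.Random(seed)
    actions = [0, 1, 2, 3]
    q1 = {(x, u): 0.0 for x in states for u in actions}   # step 602)
    q2 = {(x, u): 0.0 for x in states for u in actions}
    for _ in range(episodes):                             # step 603)
        x = states[0]                                     # step 605)
        for _ in range(max_steps):                        # step 604)
            if rng.random() < epsilon:                    # step 606), as written
                u = max(actions, key=lambda a: q1[(x, a)] + q2[(x, a)])
            else:
                u = rng.choice(actions)
            x_next = transition_fn(x, u)                  # step 607)
            r = reward_fn(x, u)                           # step 608)
            if rng.random() < 0.5:                        # step 609)
                q1[(x, u)] = r + gamma * max(q1[(x_next, a)] for a in actions)
            else:
                q2[(x, u)] = r + gamma * max(q2[(x_next, a)] for a in actions)
            x = x_next
    # step 612): greedy policy w.r.t. the summed value functions
    policy = {x: max(actions, key=lambda a: q1[(x, a)] + q2[(x, a)])
              for x in states}
    return policy, q1, q2

# Toy demonstration on a 3-state chain (illustrative, not the patent's
# feature grid): action 3 moves right, action 1 moves left, others stay;
# reward 1 only for taking action 3 in state 1.
def _step(x, u):
    if u == 3:
        return min(x + 1, 2)
    if u == 1:
        return max(x - 1, 0)
    return x

policy, q1, q2 = optimize_attention(
    states=[0, 1, 2],
    reward_fn=lambda x, u: 1 if (x == 1 and u == 3) else 0,
    transition_fn=_step,
    episodes=30, max_steps=25)
```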
Further, the loss function in step (7) is
L = Σ(y − y′)²
where y denotes the vehicle classification result produced by the network and y′ denotes the true label of the vehicle image.
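The sum-of-squared-errors loss above can be computed as in this small sketch; representing the true label y′ as a one-hot vector over classes is an assumption about the label encoding, which the patent does not specify.

```python
import numpy as np

def sse_loss(y_pred, y_true):
    """Sum of squared errors between the network's class scores y
    and the true label vector y', both of shape (num_classes,)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sum((y_pred - y_true) ** 2))

# A perfect softmax prediction gives zero loss; a spread-out one does not.
perfect = sse_loss([0.0, 1.0, 0.0], [0, 1, 0])
spread = sse_loss([0.25, 0.5, 0.25], [0, 1, 0])
```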
The beneficial effects of the technical solution provided by the present invention are as follows. A bilinear convolutional network is used as the basic deep network architecture; a reinforcement learning network extracts low-level salient features, and bilinear interpolation is used to fuse the high-level semantic features with the low-level salient features; finally, the fully connected layer and softmax operation of the bilinear convolutional network perform the actual vehicle classification, improving recognition accuracy. Combined with the reinforcement learning network, the salient features of vehicle images can be extracted well even when few images are available, making the method suitable for online vehicle recognition and applicable to online real-time recognition in the field of video surveillance.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a network model diagram of the method of the present invention;
Fig. 3 is a detailed diagram of a single network of the bilinear model in the method of the present invention.
Detailed Description
As shown in Fig. 1, the vehicle type identification method based on reinforcement learning and a bilinear convolutional network in this example comprises the following steps:
(1) Build the deep network model: build a fine-grained classification network based on reinforcement learning and a bilinear convolutional network for vehicle recognition; its model is shown in Figs. 2 and 3. The parallel feature extraction layers of the bilinear convolutional network adopt the first to fifth convolutional layers of VGG16; the features output by these layers transition from detail features to high-level semantic features. After the fifth convolutional layer, a bilinear vector is obtained through an outer-product operation; finally a fully connected layer is attached and a softmax operation is applied to the output to recognize and classify the vehicle type.
(2) Set the hyperparameters of the network: the learning rate is 0.02, the number of iterations is 10,000, the batch size is 10 images, and the training threshold is 0.01;
(3) Initialize the network: set all weights and thresholds of the network to 0.00001;
(4) Build the MDP model: build the Markov decision model for optimizing salient features as follows:
401) State-space modeling: the state space consists of all features that can be extracted from the output feature map of the third convolutional layer (Conv3) at the scale of the fifth convolutional layer; it includes the four feature maps containing the four corners of the map;
402) Action-space modeling: the actions are movements up, left, down and right, characterized by the numbers 0, 1, 2 and 3 respectively;
403) Transition-function modeling: assume the position of the feature corresponding to the current state is (x, y); then:
if the up action is taken, the next state is at (x, y−1);
if the left action is taken, the next state is at (x−1, y);
if the down action is taken, the next state is at (x, y+1);
if the right action is taken, the next state is at (x+1, y).
404) Reward-function modeling: the reward function depends on the current output of the deep network, that is, the vehicle class obtained when a vehicle image is fed into the deep network using the current optimal attention region. If the predicted class equals the true class, the immediate reward is 1; otherwise the reward is 0.
(5) Preprocess the dataset: download the dataset and apply scale transformations, that is, operations such as translation and rotation, to augment the original dataset. The purpose of the augmentation is to increase the robustness of the network, giving it good recognition ability on noisy images while preventing overfitting during training. The Car-196 dataset can be downloaded from: https://ai.stanford.edu/~jkrause/cars/car_dataset.html.
To give the network better generalization ability, the bird dataset CUB-200 and the aircraft dataset FGVC-Aircraft are also used during training; they can be downloaded from http://www.vision.caltech.edu/visipedia/CUB-200.html and http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/ respectively.
(6) Optimize the attention region: use the reinforcement learning algorithm to optimize the saliency region, train the network, and select the optimal attention region. The optimization proceeds as follows:
601) Set the parameter values: discount rate γ = 0.9, decay factor λ = 0.95, number of episodes E = 200, maximum time step per episode T = 1000, learning rate α = 0.5, exploration rate ε = 0.1;
602) Initialize Q1(x,u) = 0 and Q2(x,u) = 0 for every state-action pair, and check whether the number of episodes has reached the maximum E:
If so:
go to step 611)
Otherwise:
go to step 603);
603) Check whether the maximum time step has been reached:
If so:
go to step 602)
Otherwise:
go to step 604)
604) Randomly initialize the current state x = x0;
605) Generate a random probability p in (0, 1) and check whether p < ε holds:
If so:
the action selected in the current state is u = argmax_u(Q1(x,u) + Q2(x,u))
Otherwise:
select any one of the four actions at random from the action set;
606) Execute the currently selected action u and obtain the corresponding next state x′;
607) Check whether the classification result of the output layer equals the true label:
If so:
the immediate reward is r = 1
Otherwise:
the immediate reward is r = 0
608) Generate a random probability p in (0, 1) and check whether p < 0.5 holds:
If so:
update the Q value: Q1(x,u) = r + γ·max_u Q1(x′,u)
Otherwise:
update the Q value: Q2(x,u) = r + γ·max_u Q2(x′,u)
609) Update the current time step t = t + 1 and go to step 603);
610) Update the current episode e = e + 1;
611) Output the current optimal policy and the value functions Q1(x,u) and Q2(x,u).
(7) Construct the loss function: the loss function for network training is
L = Σ(y − y′)²
where y denotes the vehicle classification result produced by the network and y′ denotes the true label of the vehicle image.
(8) Fuse features: after the optimal attention feature region is obtained, it is fixed and fused by summation with the high-level features (the output of the fifth convolution module) to obtain the fused high-level features. The outputs of each layer's features and of the fused features are shown in Fig. 2;
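Since each attention sub-feature is extracted at the spatial scale of the fifth convolutional layer (see the state-space definition in step 401), the summation fusion above can be sketched as follows. The names, shapes and the assumption that the channel counts also match are illustrative; in VGG16 the conv3 and conv5 channel counts differ, so a real implementation would need a channel projection, which the sketch does not model.

```python
import numpy as np

def fuse(conv5_feat, attention_feat):
    """Fuse the high-level conv5 features with the selected low-level
    attention region by elementwise summation, as in step (8).

    Both inputs are assumed to be (C, H, W) arrays of matching shape,
    since the attention region is taken at the conv5 scale."""
    if conv5_feat.shape != attention_feat.shape:
        raise ValueError("attention region must match the conv5 feature shape")
    return conv5_feat + attention_feat

# Toy demonstration with random stand-in features.
rng = np.random.default_rng(1)
high = rng.standard_normal((8, 4, 4))   # stand-in conv5 output
low = rng.standard_normal((8, 4, 4))    # stand-in optimal attention region
fused = fuse(high, low)
```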
(9) Train the network: with the optimal attention region fixed, retrain the network on the dataset by gradient descent until the training error is smaller than the preset threshold;
(10) Alternate training: repeat steps (6)-(9) until the attention region no longer changes;
(11) Input the vehicle images to be tested into the deep network model to obtain the corresponding detection results.
The recognition accuracy of the vehicle type identification method of the present invention on each dataset is given in the table below:
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911371980.5A CN111079851B (en) | 2019-12-27 | 2019-12-27 | Vehicle type identification method based on reinforcement learning and bilinear convolution network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911371980.5A CN111079851B (en) | 2019-12-27 | 2019-12-27 | Vehicle type identification method based on reinforcement learning and bilinear convolution network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079851A CN111079851A (en) | 2020-04-28 |
CN111079851B true CN111079851B (en) | 2020-09-18 |
Family
ID=70318777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911371980.5A Active CN111079851B (en) | 2019-12-27 | 2019-12-27 | Vehicle type identification method based on reinforcement learning and bilinear convolution network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079851B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149720A (en) * | 2020-09-09 | 2020-12-29 | 南京信息工程大学 | A Fine-Grained Vehicle Type Recognition Method |
CN112183602B (en) * | 2020-09-22 | 2022-08-26 | 天津大学 | Multi-layer feature fusion fine-grained image classification method with parallel rolling blocks |
CN113191218A (en) * | 2021-04-13 | 2021-07-30 | 南京信息工程大学 | Vehicle type recognition method based on bilinear attention collection and convolution long-term and short-term memory |
CN113158980A (en) * | 2021-05-17 | 2021-07-23 | 四川农业大学 | Tea leaf classification method based on hyperspectral image and deep learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106096535A (en) * | 2016-06-07 | 2016-11-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A kind of face verification method based on bilinearity associating CNN |
US9569736B1 (en) * | 2015-09-16 | 2017-02-14 | Siemens Healthcare Gmbh | Intelligent medical image landmark detection |
CN109086792A (en) * | 2018-06-26 | 2018-12-25 | 上海理工大学 | Based on the fine granularity image classification method for detecting and identifying the network architecture |
CN109359684A (en) * | 2018-10-17 | 2019-02-19 | 苏州大学 | A fine-grained vehicle identification method based on weakly supervised localization and subcategory similarity measure |
CN109858430A (en) * | 2019-01-28 | 2019-06-07 | 杭州电子科技大学 | A kind of more people's attitude detecting methods based on intensified learning optimization |
CN109902562A (en) * | 2019-01-16 | 2019-06-18 | 重庆邮电大学 | A driver abnormal posture monitoring method based on reinforcement learning |
CN110135231A (en) * | 2018-12-25 | 2019-08-16 | 杭州慧牧科技有限公司 | Animal face recognition methods, device, computer equipment and storage medium |
CN110334572A (en) * | 2019-04-04 | 2019-10-15 | 南京航空航天大学 | A fine-grained recognition method for car models from multiple angles |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8874498B2 (en) * | 2011-09-16 | 2014-10-28 | International Business Machines Corporation | Unsupervised, supervised, and reinforced learning via spiking computation |
CN108898060A (en) * | 2018-05-30 | 2018-11-27 | 珠海亿智电子科技有限公司 | Based on the model recognizing method of convolutional neural networks under vehicle environment |
CN109086672A (en) * | 2018-07-05 | 2018-12-25 | 襄阳矩子智能科技有限公司 | A kind of recognition methods again of the pedestrian based on reinforcement learning adaptive piecemeal |
- 2019-12-27: Application CN201911371980.5A filed in China; granted as CN111079851B (status: Active)
Non-Patent Citations (1)
Title |
---|
Bilinear CNN Model for Fine-Grained Classification Based on Subcategory-Similarity Measurement;Xinghua Dai et.al;《applied sciences》;20190116;第1-16页 * |
Also Published As
Publication number | Publication date |
---|---|
CN111079851A (en) | 2020-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 2021-04-14
Address after: 215000 Building 1, Wujiang Taihu new city science and Technology Innovation Park, No.18, Suzhou River Road, Wujiang District, Suzhou City, Jiangsu Province
Patentee after: Jiangsu Yiyou Huiyun Software Co.,Ltd.
Address before: 215500 Changshou City South Three Ring Road No. 99, Suzhou, Jiangsu
Patentee before: CHANGSHU INSTITUTE OF TECHNOLOGY
TR01 | Transfer of patent right |