CN111986080A - Logistics vehicle feature positioning method based on improved Faster R-CNN - Google Patents
Logistics vehicle feature positioning method based on improved Faster R-CNN
- Publication number
- CN111986080A (application number CN202010690178.9A)
- Authority
- CN
- China
- Prior art keywords
- stage
- logistics
- logistics vehicle
- image
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4084—Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
A logistics vehicle feature positioning method based on improved Faster R-CNN comprises the following steps. Step one: enhancement processing of logistics vehicle images, introducing data-enhancement means to process the images. Step two: construction of a basic network model; a VGGNet-16 network is adopted as the feature extraction network, and, to realize positioning of the logistics vehicles, an RPN target detection and positioning model is added behind the feature extraction module at the third convolution sublayer of the fifth convolutional block of VGGNet-16. Step three: screening of the logistics vehicle targets with a non-maximum suppression algorithm. Step four: unified normalization of the logistics vehicle target features; the resulting fixed-dimension feature map is fed into the seventh stage of the basic network model to obtain the accurate logistics vehicle positioning bounding box and the probability of the corresponding vehicle type. The invention positions logistics vehicle features well across different environments and scenes.
Description
Technical Field
The invention relates to a logistics vehicle feature positioning method based on improved Faster R-CNN.
Background Art
In recent years, with the development of transport logistics, more and more logistics vehicles serve people's work and life, but this also raises a problem: the excess of logistics and engineering vehicles increases the difficulty of parking management in industrial parks. Although operations such as tractor-trailer and drop-and-pull transport can improve cargo-loading efficiency, problems remain, such as logistics vehicles occupying parking spaces unreasonably and drop-and-pull operations that cannot be charged accurately; more seriously, some vehicle owners evade monitoring and detection through extremely dangerous behaviors such as using fake license plates.
To solve the management problems of logistics and engineering vehicles, there are already many examples of identifying logistics vehicles of different types by technical means such as computer vision. Most identification methods obtain vehicle images from a traffic-intersection camera or an image acquisition card; because the images captured by traffic video show vehicles passing a certain position in a natural environment, the accurate position of the vehicle in the image must first be found before the vehicle features are extracted and the vehicle type identified. However, current recognition methods face the following main difficulties: (1) vehicle-type recognition is strongly affected by illumination conditions, and the different visual appearance of the same vehicle in sunny, rainy or snowy environments causes recognition errors; (2) vehicle scenes are complex and changeable; for example, in scenes with complex backgrounds such as rural areas, foreground and background cannot be separated quickly and accurately; (3) vehicle appearances are diverse, and the appearance of different vehicle types involves many parameters, such as color, shape, brand and size, all of which affect the recognition of vehicle features. In short, identification of logistics vehicle features by computer vision is still affected by uncertain factors such as environment, scene and appearance, which makes such features difficult to identify.
Disclosure of Invention
To overcome the defects of the prior art, and aiming at the management problems of logistics and engineering vehicles and the difficulty that traditional identification methods face under uncertain factors such as environment, scene and appearance, the invention provides a logistics vehicle feature positioning method based on improved Faster R-CNN.
According to the method, the logistics vehicle image is first subjected to data enhancement to increase the scene diversity of the sample images; then a basic network model is constructed with the improved Faster R-CNN; next, a non-maximum suppression algorithm is introduced to screen the logistics vehicle target bounding boxes; finally, the logistics vehicle target features are uniformly normalized to realize accurate positioning.
In order to achieve the purpose, the invention adopts the following technical scheme:
A logistics vehicle feature positioning method based on improved Faster R-CNN comprises the following steps:
step one, enhancement processing of logistics vehicle images;
To address problems such as the fixed shooting angle, single background and low detection rate of logistics vehicle images, the invention introduces data-enhancement means and processes the images through operations such as multi-scale proportional scaling, image rotation and saturation enhancement, so as to increase the scene diversity of the logistics vehicle images and thus support their identification and positioning.
1.1) carrying out multi-scale scaling operation on the logistics vehicles;
On the principle that the proportions (length, width and height) of the logistics vehicle in the original image are not distorted, the logistics vehicle image is scaled at multiple scales, so that the positioning network can learn target features at fixed proportions.
Suppose the pixel coordinate of a point on a logistics vehicle before scaling is denoted A_0(x_0, y_0) and the scaled coordinate is denoted A_1(x_1, y_1); then A_0 and A_1 satisfy the relation:
(x1,y1)=(μx0,μy0) (1)
where μ denotes the scaling factor. In homogeneous coordinates the relation corresponds to the image scaling matrix:

$$\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} \mu & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \tag{2}$$

When μ > 1 the operation magnifies the image; when μ < 1 it reduces it.
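The multi-scale scaling step can be sketched in plain Python; the function names are illustrative, with the scaling factor μ = 0.84 and the 100-pixel stopping rule taken from the preferred embodiment described later.

```python
def scale_point(x0, y0, mu):
    """Apply the scaling relation (x1, y1) = (mu*x0, mu*y0) of equation (1)."""
    return mu * x0, mu * y0

def scale_pyramid(width, height, mu=0.84, min_short_edge=100):
    """Build the multi-scale image-size pyramid: shrink by mu each level and
    stop once the short edge would fall below min_short_edge pixels."""
    sizes = [(width, height)]
    w, h = float(width), float(height)
    while True:
        w, h = w * mu, h * mu
        if min(w, h) < min_short_edge:
            break
        sizes.append((round(w), round(h)))
    return sizes

print(scale_point(200, 100, 0.84))
print(scale_pyramid(640, 480))
```

Because each level keeps the same μ for both axes, every pyramid level preserves the original aspect ratio, which is the stated principle of step 1.1).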
1.2) carrying out rotation operation on the logistics vehicle image;
When the camera captures logistics vehicles travelling at speed, the captured images show large differences in angle. To adapt to recognition and positioning at different angles, the captured logistics vehicle images are subjected to rotation transformation, generating vehicle feature information at various angles.
Set the centre of the logistics vehicle image as the rotation centre O(0, 0) and denote the counter-clockwise rotation angle by θ. When any pixel point P(x, y) in the image undergoes the rotation transformation it becomes P_1(x_1, y_1), and the rotation is expressed as:

$$\begin{cases} x_1 = x\cos\theta - y\sin\theta \\ y_1 = x\sin\theta + y\cos\theta \end{cases} \tag{3}$$

This polar-coordinate transformation corresponds to the image rotation matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4}$$
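A minimal sketch of the rotation transformation of equations (3)–(4) in plain Python (the function name is illustrative):

```python
import math

def rotate_point(x, y, theta_deg):
    """Rotate pixel P(x, y) counter-clockwise about the rotation centre O(0, 0)
    by theta degrees, following equations (3)-(4)."""
    t = math.radians(theta_deg)
    x1 = x * math.cos(t) - y * math.sin(t)
    y1 = x * math.sin(t) + y * math.cos(t)
    return x1, y1

x1, y1 = rotate_point(1.0, 0.0, 90)
print(round(x1, 6), round(y1, 6))  # 0.0 1.0
```

Applying the same transform to every pixel (with coordinates taken relative to the image centre) yields the rotated sample images used for data enhancement.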
1.3) carrying out saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the method adjusts the saturation of the logistics vehicle images.
The specific flow of adjusting the image saturation is as follows:
(S1) Calculate the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)   (5)

where rgbMax denotes the pixel maximum value and rgbMin the pixel minimum value.
(S2) saturation calculation;
The saturation S is calculated as follows:

delta = (rgbMax − rgbMin)/255   (6)
value = (rgbMax + rgbMin)/255   (7)
L = value/2   (8)
S = delta/value, if L < 0.5; S = delta/(2 − value), if L ≥ 0.5   (9)

where the case split in equation (9) follows the standard HSL saturation definition.
(S3) adjusting the logistics vehicle image saturation;
Set a saturation parameter β for adjusting the illumination intensity; the calculation proceeds as follows:

1. If the parameter β ≥ 0, first compute the value of the intermediate variable α:

α = S, if β + S ≥ 1; otherwise α = 1 − β   (10)
α = 1/α − 1   (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α   (12)

2. If the parameter β < 0, then α = β and:

RGB' = L·255 + (RGB − L·255)·(1 + α)   (13)
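The per-pixel flow of (S1)–(S3) can be sketched as follows; the β ≥ 0 branch assumes the standard HSL saturation-increment scheme for the intermediate variable α, since the original equations (9)–(11) are reconstructed, and the function name is illustrative.

```python
def adjust_saturation(r, g, b, beta):
    """Adjust one RGB pixel's saturation by parameter beta in [-1, 1],
    following steps (S1)-(S3) and equations (5)-(13)."""
    rgb_max, rgb_min = max(r, g, b), min(r, g, b)          # (S1), eq. (5)
    delta = (rgb_max - rgb_min) / 255.0                    # eq. (6)
    value = (rgb_max + rgb_min) / 255.0                    # eq. (7)
    L = value / 2.0                                        # eq. (8)
    if delta == 0:                                         # grey pixel: saturation is zero
        return r, g, b
    S = delta / value if L < 0.5 else delta / (2 - value)  # eq. (9), HSL saturation
    if beta >= 0:
        alpha = S if beta + S >= 1 else 1 - beta           # eq. (10)
        alpha = 1 / alpha - 1                              # eq. (11)
        new = [c + (c - L * 255) * alpha for c in (r, g, b)]              # eq. (12)
    else:
        alpha = beta
        new = [L * 255 + (c - L * 255) * (1 + alpha) for c in (r, g, b)]  # eq. (13)
    return tuple(min(255, max(0, round(c))) for c in new)

print(adjust_saturation(150, 100, 50, 0.5))   # (200, 100, 0)
print(adjust_saturation(150, 100, 50, -1.0))  # (100, 100, 100)
```

With β = −1 every pixel collapses to its lightness value (a grey image), while β > 0 pushes channels away from the lightness, increasing saturation.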
the logistics vehicle image subjected to the scaling processing, the rotation operation and the saturation enhancement is applied to the following steps so as to accurately locate the logistics vehicle characteristics.
Step two, constructing a basic network model;
Although Faster R-CNN provides three basic networks for feature extraction, to obtain a better feature-extraction effect the invention adopts the VGGNet-16 network as the feature extraction network for classifying logistics vehicles of different types. Meanwhile, to realize positioning of the logistics vehicles, an RPN target detection and positioning model is added behind the feature extraction module at the third convolution sublayer of the fifth convolutional block (conv5_3) of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) First stage: first, the W × H × 3 image processed by step one is input; then a convolution operation is performed on the logistics vehicle image through two consecutive 64-channel convolution layers, each with kernel size 3 × 3 and stride 1; the convolved image is then reduced in dimension through a 64-channel max pooling layer with pooling kernel size 2 × 2 and stride 2. This stage outputs a feature map of size (W/2) × (H/2) × 64.
(T2) second stage: the flow is the same as the first stage, namely, the image obtained in the first stage is input into a second stage network, and then a new feature map is obtained through convolution and pooling operations. But unlike the first stage, the second stage convolution and pooling channels both become 128, with the other parameters being the same as in the first stage.
(T3) Third stage: first, the feature map output by the second stage is input to the third-stage network; then a convolution operation is performed through three consecutive 256-channel convolution layers, each with kernel size 3 × 3 and stride 1; the convolved feature map is then reduced in dimension through a 256-channel max pooling layer with pooling kernel size 2 × 2 and stride 2.
(T4) fourth stage: the process is the same as the third stage, namely, the image obtained in the third stage is input into a fourth stage network, and then a new characteristic diagram is obtained through convolution and pooling operations. But unlike the third stage, the convolution and pooling channels of the fourth stage both become 512, and the other parameters are the same as those of the third stage.
(T5) Fifth stage: this stage consists of three convolution layers, each with 512 channels, kernel size 3 × 3 and stride 1. The feature map output at this stage has size (W/16) × (H/16) × 512.
(T6) Sixth stage: first a convolution layer with kernel size 3 × 3, stride 2 and 512 channels is connected; then a two-class classification loss function and a bounding-box regression loss function are connected, which regress the bounding-box information and judge whether each box belongs to a logistics vehicle or to the background (together with the probability that it is most likely a given vehicle type);
(T7) seventh stage: firstly, connecting two 4096-channel full-connection layers; then connecting an overall loss function; and finally, outputting the accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type.
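Assuming the standard VGGNet-16 convention (3 × 3 convolutions that preserve spatial size, 2 × 2 stride-2 max pooling closing stages one to four, no pooling in stage five), the feature-map sizes through the five backbone stages can be traced with a short sketch; the function name is illustrative.

```python
def trace_backbone(width, height):
    """Trace (stage, W, H, C) through the five VGGNet-16 stages described
    above: each 3x3 convolution preserves spatial size, each 2x2/stride-2
    max pooling halves it; stage five has no pooling layer."""
    channels = [64, 128, 256, 512, 512]
    w, h = width, height
    shapes = []
    for stage, c in enumerate(channels, start=1):
        if stage <= 4:            # stages 1-4 end with a pooling layer
            w, h = w // 2, h // 2
        shapes.append((stage, w, h, c))
    return shapes

for s in trace_backbone(800, 608):
    print(s)
```

The trace confirms the sizes stated in the text: (W/2) × (H/2) × 64 after stage one and (W/16) × (H/16) × 512 after stage five.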
In the above basic network, the design of parameters such as the activation function and the loss functions is detailed below:
(P1) In the VGGNet-16 basic network, the activation function connected after every convolution is the ReLU function:

ReLU(x) = max(0, x)   (14)
(P2) In the sixth stage, the classification loss function and bounding-box regression loss function are designed as follows:

The classification loss L_rpn_cls is expressed as:

$$L_{rpn\_cls} = \frac{1}{N_{cls}} \sum_i \left[ -p_i^* \log p_i - (1 - p_i^*) \log(1 - p_i) \right] \tag{15}$$

where p_i denotes the predicted probability that box i is a logistics vehicle rather than background, and p_i^* denotes the label of the corresponding ground-truth box: 1 if the box is foreground, 0 otherwise.
The bounding-box regression loss L_rpn_box is expressed as:

$$L_{rpn\_box} = \frac{1}{N_{reg}} \sum_i p_i^* \, \text{smooth}_{L1}(t_i - t_i^*) \tag{16}$$

where t_i denotes the four-dimensional position information of predicted box i, written t_i = (x_i, y_i, w_i, h_i), and t_i^* denotes the four-dimensional position information of the ground-truth box, written t_i^* = (x_i^*, y_i^*, w_i^*, h_i^*). The smooth_L1 function is represented as follows:

$$\text{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{17}$$
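Equations (16)–(17) can be sketched directly; `rpn_box_loss` is an illustrative helper (not named in the original) showing how only foreground anchors (p_i* = 1) contribute to the regression loss, applied per coordinate.

```python
def smooth_l1(x):
    """Smooth-L1 function of equation (17): quadratic near zero, linear
    elsewhere, applied to each coordinate of (t_i - t_i*)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_box_loss(pred, gt, is_foreground):
    """Regression term of equation (16) for one anchor: background anchors
    (p_i* = 0) contribute nothing."""
    if not is_foreground:
        return 0.0
    return sum(smooth_l1(p - g) for p, g in zip(pred, gt))

print(smooth_l1(0.5))   # 0.125
print(smooth_l1(2.0))   # 1.5
print(rpn_box_loss((10, 10, 50, 40), (10.5, 10, 52, 40), True))  # 1.625
```

The quadratic region keeps gradients small for nearly correct boxes while the linear region limits the influence of outliers, which is why Faster R-CNN uses smooth L1 rather than plain L2.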
(P3) In the seventh stage, the overall loss function is designed as follows: the bounding-box regression loss adopts a logistic-regression mapping method and the classification loss adopts gradient descent; during loss reduction, the Adam optimizer is used with parameters α = 0.001, β₁ = 0.9, β₂ = 0.999 and ε = 10e−8.
(P4) during the training process, the learning rate adjustment strategy of the present invention employs a multi-stage decay approach.
Thirdly, screening the logistics vehicle target by using a non-maximum suppression algorithm;
and (4) the logistics vehicle images are processed by the basic network model in the second step to obtain more boundary frames on the same logistics vehicle, so that a method needs to be introduced to screen out redundant boundary frames. The specific operation flow is as follows:
(Q1) From the four-dimensional position information (x_i, y_i, w_i, h_i) of the predicted bounding boxes, the area S_i of every predicted box of each vehicle in the logistics vehicle image is obtained:

S_i = w_i · h_i   (18)
(Q2) In the sixth stage of the basic network model, the bounding-box information and the classification (logistics vehicle or background) are determined by regression. Each real logistics vehicle has several corresponding bounding boxes; these are sorted by probability from large to small, and the bounding box with the maximum probability is selected;
(Q3) The area intersection-over-union I between the selected bounding box and each remaining bounding box is computed in a loop; if I is larger than a preset threshold (0.7 by default in the invention), the box is considered heavily overlapped with the box selected in step (Q2) and is deleted. This continues until all bounding boxes of step (Q2) have been processed.
Denote the selected maximum-probability bounding box by the subscript max, with area S_max = w_max · h_max, and any other remaining bounding box by the subscript oti, with area S_oti = w_oti · h_oti, where (x_max, y_max, w_max, h_max) and (x_oti, y_oti, w_oti, h_oti) are the four-dimensional position information (centre coordinates, box width and box height). The overlap of the two boxes is

w_ovp = min(x_max + w_max/2, x_oti + w_oti/2) − max(x_max − w_max/2, x_oti − w_oti/2)   (19)
h_ovp = min(y_max + h_max/2, y_oti + h_oti/2) − max(y_max − h_max/2, y_oti − h_oti/2)   (20)

If and only if w_ovp > 0 and h_ovp > 0 do the two boxes intersect, with intersection area

S_ovp = w_ovp · h_ovp   (21)

and the intersection-over-union I is calculated as

I = S_ovp / (S_max + S_oti − S_ovp)   (22)

Otherwise I = 0, meaning the two bounding boxes do not intersect, and both are retained.
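Steps (Q1)–(Q3) with the overlap test of equations (19)–(22) can be sketched as a greedy NMS loop; boxes are given as (x, y, w, h) with centre coordinates, the 0.7 threshold is the invention's default, and the function names are illustrative.

```python
def iou_center(box_a, box_b):
    """Intersection-over-union for boxes given as (x, y, w, h) with centre
    coordinates, following the overlap test of equations (19)-(22)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    w_ovp = min(xa + wa / 2, xb + wb / 2) - max(xa - wa / 2, xb - wb / 2)
    h_ovp = min(ya + ha / 2, yb + hb / 2) - max(ya - ha / 2, yb - hb / 2)
    if w_ovp <= 0 or h_ovp <= 0:
        return 0.0                     # boxes do not intersect: both are kept
    s_ovp = w_ovp * h_ovp
    return s_ovp / (wa * ha + wb * hb - s_ovp)

def nms(boxes, scores, threshold=0.7):
    """Greedy non-maximum suppression (steps Q1-Q3): keep the highest-probability
    box, delete any remaining box whose IoU with it exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou_center(boxes[best], boxes[i]) <= threshold]
    return keep

boxes = [(50, 50, 40, 40), (52, 51, 40, 40), (150, 150, 30, 30)]
scores = [0.95, 0.90, 0.80]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU ≈ 0.86 > 0.7 and is deleted, while the distant third box survives, matching the behaviour illustrated by FIG. 4.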
Step four, uniformly normalizing the target characteristics of the logistics vehicles;
Because the bounding boxes retained after non-maximum suppression have different sizes, their features do not match the fixed input dimension of the subsequent fully connected layers. To solve this, a region-of-interest pooling layer is connected after the loss functions of the sixth stage, and the bounding boxes of different sizes are uniformly normalized. The specific operation flow is as follows:
(M1) quantizing the four-dimensional position information of the bounding box on the logistics vehicle image obtained in the third step into integer array coordinates;
(M2) Each quantized bounding box is evenly divided into 4 × 4, 2 × 2 and 1 × 1 grids that are max-pooled, forming a feature vector of fixed length.
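Step (M2) can be sketched in plain Python: one quantized region is pooled over 4 × 4, 2 × 2 and 1 × 1 grids into a fixed 16 + 4 + 1 = 21-value vector per channel, regardless of the region's size (the function name and the 8 × 8 toy region are illustrative).

```python
def roi_multilevel_pool(roi):
    """Step-four normalization sketch: divide one quantized ROI (a 2-D list of
    feature values) into 4x4, 2x2 and 1x1 grids and max-pool each cell,
    yielding a fixed 21-value vector whatever the ROI size."""
    h, w = len(roi), len(roi[0])
    out = []
    for grid in (4, 2, 1):
        for gy in range(grid):
            for gx in range(grid):
                y0, y1 = gy * h // grid, max((gy + 1) * h // grid, gy * h // grid + 1)
                x0, x1 = gx * w // grid, max((gx + 1) * w // grid, gx * w // grid + 1)
                out.append(max(roi[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return out

roi = [[y * 8 + x for x in range(8)] for y in range(8)]  # toy 8x8 feature map
vec = roi_multilevel_pool(roi)
print(len(vec))   # 21
print(vec[-1])    # 63 (global max from the 1x1 level)
```

Because the output length never depends on the box size, the pooled vectors can be fed directly into the fixed-width fully connected layers of the seventh stage.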
The obtained fixed-dimension feature map is introduced into the seventh stage of the basic network model to obtain the accurate logistics vehicle positioning bounding box and the probability of the corresponding vehicle type.
Preferably, in step 1.1), to reduce the running cost of the model, the invention sets the scaling factor μ to 0.84 and stops scaling when the short-edge pixel count is less than 100 px.
Preferably, the area intersection ratio I of step (Q3) has a threshold value of 0.7.
The invention has the advantages that:
the invention provides a method for positioning logistics vehicle characteristics based on improved false R-CNN, aiming at the management problem of logistics engineering vehicles and the problem that the traditional identification method is difficult to identify due to uncertain factors such as environment, scene and appearance. Firstly, data enhancement is carried out on a logistics vehicle image, so that scene diversity of a sample image is increased; then, constructing a basic network model by using the improved master R-CNN; then, introducing a non-maximum suppression algorithm to screen a logistics vehicle target boundary box; and finally, uniformly normalizing the target characteristics of the logistics vehicles to realize accurate positioning. Therefore, the characteristic positioning performance of the logistics vehicle under the conditions of different environments, scenes and the like is superior to that of the traditional vehicle detection method, the problems in the aspect of logistics engineering vehicle management in a park can be well solved, and the method has certain practical value and application prospect.
Drawings
FIG. 1 is a schematic diagram of the image scaling of a logistics vehicle according to the present invention;
FIG. 2 is a schematic diagram of the image rotation of a logistics vehicle according to the present invention;
FIG. 3 is a diagram of the basic network architecture of the present invention;
FIG. 4 is a comparison of the logistics vehicle image before and after processing by the non-maximum suppression algorithm designed by the invention; FIG. 4a is the image before non-maximum suppression and FIG. 4b the image after non-maximum suppression;
FIG. 5 is a flow chart of unified normalization of target features of the present invention;
FIG. 6 is a technical roadmap for the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
In order to overcome the defects in the prior art, the invention provides a method for positioning logistics vehicle characteristics based on improved false R-CNN, aiming at the management problem in the aspect of logistics engineering vehicles and the problems that the traditional identification method is difficult to identify due to uncertain factors such as environment, scene, appearance and the like. Firstly, data enhancement is carried out on a logistics vehicle image, so that scene diversity of a sample image is increased; then, constructing a basic network model by using the improved faster-CNN; then, introducing a non-maximum suppression algorithm to screen a logistics vehicle target boundary box; and finally, uniformly normalizing the target characteristics of the logistics vehicles to realize accurate positioning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a logistics vehicle feature positioning method based on improved faster R-CNN comprises the following steps:
step one, enhancement processing of logistics vehicle images;
for the problems of fixed shooting angle, single background, low detection rate and the like of the logistics vehicles, the invention introduces a data enhancement means, and processes the logistics vehicle images through operations such as multi-scale equal-scale scaling, image rotation, saturation enhancement and the like, so as to increase the scene diversity of the logistics vehicle images, and further identify and position the logistics vehicle images.
1.1) carrying out multi-scale scaling operation on the logistics vehicles;
according to the principle that the high proportion of the specific length, the width and the height of the logistics vehicles in the original images is not damaged, the logistics vehicle images are scaled in multiple scales, and therefore the positioning network can learn the target features in the specific proportion.
Suppose that the pixel coordinate of a certain logistics vehicle before zooming is marked as A0(x0,y0) And the scaled coordinates are denoted as A1(x1,y1) Then A is0And A1Satisfy the relation:
(x1,y1)=(μx0,μy0) (1)
where μ denotes a scaling factor. The above equation corresponds to the image scaling matrix, expressed as the following matrix:
wherein, when mu is more than 1, the image magnification operation is represented; when μ < 1, an image reduction operation is indicated. To reduce the cost of model operation, the present invention sets the scaling factor μ to 0.84 and stops scaling when the short edge pixels are less than 100 pix.
1.2) carrying out rotation operation on the logistics vehicle image;
when the camera shoots the logistics vehicles in fast running, the captured images have great angle difference, and in order to adapt to recognition and positioning of different angles, the logistics vehicle images obtained by capturing need to be subjected to rotation transformation, so that vehicle characteristic information of various angles is generated.
The center of the logistics vehicle image is set as a rotation center O (0,0), the anticlockwise rotation angle of the logistics vehicle image is marked as theta, and when any pixel point P (x, y) in the image is subjected to rotation transformation, the pixel point P is changed into P1(x1,y1) The rotation process is then represented by:
the above formula is a polar coordinate transformation formula, and corresponds to the image rotation matrix, and is expressed as the following matrix:
1.3) carrying out saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the method adjusts the saturation of the logistics vehicle images.
The specific flow of adjusting the image saturation is as follows:
(S1) calculating a pixel extremum on the logistics vehicle image;
where rgbMax denotes the pixel maximum value and rgbMin denotes the pixel minimum value.
(S2) saturation calculation;
the saturation S is calculated as follows:
delta=(rgbMax-rgbMin)/255 (6)
value=(rgbMax+rgbMin)/255 (7)
L=value/2 (8)
(S3) adjusting the logistics vehicle image saturation;
setting a saturation parameter β for adjusting the illumination intensity, the calculation process is as follows:
1. if the parameter beta is not less than 0, the intermediate variable is first calculatedThe value of (c):
and (3) adjusting the saturation:
RGB'=RGB+(RGB-L*255)*α (12)
2. if the parameter β <0, then:
RGB'=L*255+(RGB-L*255)*(1+α) (13)
the logistics vehicle image subjected to the scaling processing, the rotation operation and the saturation enhancement is applied to the following steps so as to accurately locate the logistics vehicle characteristics.
Step two, constructing a basic network model;
although the faster-CNN provides three basic networks for feature extraction, in order to obtain better feature extraction effect, the invention adopts the VGGNet-16 basic network as a feature extraction network for classifying logistics vehicles of different vehicle types. Meanwhile, in order to realize the positioning of the logistics vehicles, an RPN network target detection positioning model is added behind a feature extraction module in the third convolution layering of the fifth convolution layer of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) first stage: firstly, the W × H × 3 image processed in step one is input; then, a convolution operation is carried out on the logistics vehicle image through two consecutive 64-channel convolutional layers, wherein the convolution kernel size is 3 × 3 and the convolution step size is 2; the convolved image is then reduced in dimension through a 64-channel maximum pooling layer with a pooling kernel size of 2 × 2 and a step size of 2. This stage outputs a feature map of correspondingly reduced size.
(T2) second stage: the flow is the same as the first stage, namely, the image obtained in the first stage is input into a second stage network, and then a new feature map is obtained through convolution and pooling operations. But unlike the first stage, the second stage convolution and pooling channels both become 128, with the other parameters being the same as in the first stage.
(T3) third stage: firstly, inputting the image output in the second stage into the network in the third stage; then, carrying out convolution operation on the image through three continuous convolution layers with 256 channels, wherein the convolution kernel size is 3 x 3, and the convolution step size is 2; the convolved image is then reduced in dimension through a 256-pass maximum pooling layer with a pooling kernel size of 2 x 2 and a step size of 2.
(T4) fourth stage: the process is the same as the third stage, namely, the image obtained in the third stage is input into a fourth stage network, and then a new characteristic diagram is obtained through convolution and pooling operations. But unlike the third stage, the convolution and pooling channels of the fourth stage both become 512, and the other parameters are the same as those of the third stage.
(T5) fifth stage: this stage consists of three convolutional layers, each with 512 channels, a convolution kernel size of 3 × 3, and a convolution step size of 2. The feature map output at this stage is passed to the sixth stage.
(T6) sixth stage: firstly, a convolutional layer with a convolution kernel size of 3 × 3, a convolution step size of 2, and 512 convolution channels is connected; then a two-class classification loss function and a bounding box regression loss function are connected, which regress and judge the box information and the classification information (i.e., the probability that the box most likely belongs to a certain vehicle type) of each box belonging to a logistics vehicle or the background;
(T7) seventh stage: firstly, connecting two 4096-channel full-connection layers; then connecting an overall loss function; and finally, outputting the accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type.
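Under the standard VGGNet-16 configuration (stride-1 3 × 3 convolutions with padding, and a 2 × 2 stride-2 max pooling closing each of the first four stages — an assumption here, since the stage output sizes above are illegible), the spatial size of the feature map entering the fifth stage can be sketched as:

```python
def vgg16_feature_size(w, h):
    # Each of stages one to four ends in a 2x2, stride-2 max pooling,
    # so the spatial resolution is halved four times in total.
    for _ in range(4):
        w, h = w // 2, h // 2
    return w, h
```

For a 224 × 224 input this gives the familiar 14 × 14 conv5 feature map of VGGNet-16.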
In the above basic network structure, parameters such as the activation function and the loss functions are designed as follows:
(P1) in the VGGNet-16 basic network, the activation function connected after every convolutional layer is the ReLU function used in the present invention:
ReLU(x) = max(0, x) (14)
(P2) in the sixth stage, the classification loss function and bounding box regression loss function used are designed as follows:
The classification loss L_rpn_cls is expressed as:
L_rpn_cls = −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)] (15)
where p_i represents the probability that box i is a logistics vehicle or the background; p_i* represents the label of the real box corresponding to box i: it is recorded as 1 if the real box is foreground, and 0 otherwise.
The bounding box regression loss L_rpn_box is expressed as:
L_rpn_box = Σ_i p_i* · smoothL1(t_i − t_i*) (16)
where t_i represents the four-dimensional position information of predicted box i, denoted t_i = (x_i, y_i, w_i, h_i); t_i* represents the four-dimensional position information of the corresponding real box, denoted t_i* = (x_i*, y_i*, w_i*, h_i*). The smoothL1 function is represented as follows:
smoothL1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise (17)
(P3) in the seventh stage, the overall loss function design rule is: the bounding box regression loss adopts a logistic regression mapping method, and the classification loss adopts a gradient descent method. In the process of loss reduction, the Adam gradient descent method is used to optimize the loss function, with the corresponding parameters set to α = 0.001, β1 = 0.9, β2 = 0.999, and ε = 10E−8.
(P4) during the training process, the learning rate adjustment strategy of the present invention employs a multi-stage decay approach.
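The two stage-six losses can be sketched per anchor; the function names below are illustrative, and the binary cross-entropy and smooth L1 forms are assumptions consistent with the standard Faster R-CNN formulation:

```python
import math

def smooth_l1(x):
    # smoothL1 of Eq. (17): quadratic near zero, linear elsewhere.
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_cls_loss(p, p_star):
    # Binary cross-entropy of Eq. (15): p is the predicted foreground
    # probability, p_star is the 0/1 ground-truth label.
    # (p must lie strictly inside (0, 1) to avoid log(0).)
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1 - p))
```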
Thirdly, screening the logistics vehicle target by using a non-maximum suppression algorithm;
After the logistics vehicle images are processed by the basic network model of step two, multiple bounding boxes are obtained on the same logistics vehicle, so a method needs to be introduced to screen out the redundant bounding boxes. The specific operation flow is as follows:
(Q1) based on the four-dimensional position information (x_i, y_i, w_i, h_i) of the predicted bounding boxes, the area S_i of all predicted boxes of each vehicle in the logistics vehicle image can be obtained:
S_i = w_i * h_i (18)
(Q2) in the sixth stage of the basic network model, the box information and classification information belonging to a logistics vehicle or the background are determined through regression. Each real logistics vehicle has multiple corresponding bounding boxes; these are sorted by probability from large to small, and the bounding box with the maximum probability is screened out;
(Q3) circularly calculating the area intersection ratio I of the screened boundary frame and the rest boundary frames, if I is larger than a preset threshold (the default of the invention is that the threshold is 0.7), the boundary frame is considered to be heavily overlapped with the boundary frame screened in the step (Q2), and then the boundary frame is deleted until all the boundary frames in the step (Q2) are processed.
The two bounding boxes overlap in the horizontal direction if and only if the condition
|x_max − x_oti| < (w_max + w_oti)/2 (19)
is satisfied, and overlap in the vertical direction if and only if the condition
|y_max − y_oti| < (h_max + h_oti)/2 (20)
is satisfied.
The constraints in the above formulas (19) to (20) may equivalently be written without the absolute value; taking the constraint in formula (19) as an example, it may be varied as follows:
−(w_max + w_oti)/2 < x_max − x_oti < (w_max + w_oti)/2 (21)
When the constraint conditions in formulas (19) and (20) are both satisfied, the intersection area is S_ovp = ((w_max + w_oti)/2 − |x_max − x_oti|) · ((h_max + h_oti)/2 − |y_max − y_oti|), and the calculation formula of the intersection-over-union ratio I is:
I = S_ovp/(S_max + S_oti − S_ovp) (22)
Otherwise, I = 0, meaning that the two bounding boxes do not intersect, and both are then retained.
In formulas (19) to (22), max denotes the screened maximum bounding box, whose area is denoted S_max; oti denotes any other bounding box, whose area is denoted S_oti; the intersection area between the two bounding boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) denotes the four-dimensional position information of the screened maximum bounding box, namely the center coordinates, box width, and box height; (x_oti, y_oti, w_oti, h_oti) denotes the four-dimensional position information of any remaining bounding box.
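The suppression flow (Q1)–(Q3) can be sketched as follows, using center-format boxes (x, y, w, h) and the overlap test underlying formulas (19)–(22); the function name and box layout are illustrative:

```python
def nms(boxes, probs, thresh=0.7):
    # Greedy non-maximum suppression.  boxes are (x, y, w, h) with
    # (x, y) the box center; probs are the per-box confidence scores.
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)            # box with the current maximum probability
        keep.append(m)
        xm, ym, wm, hm = boxes[m]
        survivors = []
        for i in order:
            xi, yi, wi, hi = boxes[i]
            # Overlap extents; positive exactly when conditions (19)/(20) hold.
            ow = min(xm + wm / 2, xi + wi / 2) - max(xm - wm / 2, xi - wi / 2)
            oh = min(ym + hm / 2, yi + hi / 2) - max(ym - hm / 2, yi - hi / 2)
            if ow > 0 and oh > 0:
                s_ovp = ow * oh
                iou = s_ovp / (wm * hm + wi * hi - s_ovp)   # Eq. (22)
            else:
                iou = 0.0           # disjoint boxes: both are retained
            if iou <= thresh:
                survivors.append(i)
        order = survivors
    return keep
```

A duplicate detection with intersection-over-union above the 0.7 threshold is deleted, while distant boxes survive.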
Step four, uniformly normalizing the target characteristics of the logistics vehicles;
in order to solve the problem that the dimensions of the subsequent connection layers do not match because the bounding boxes retained after non-maximum suppression have different sizes, a region-of-interest pooling layer is connected after the loss functions of the sixth stage, and the bounding boxes of different sizes are uniformly normalized. The specific operation flow is as follows:
(M1) quantizing the four-dimensional position information of the bounding box on the logistics vehicle image obtained in the third step into integer array coordinates;
(M2) the quantized bounding box region is evenly divided and max-pooled over 4 × 4, 2 × 2, and 1 × 1 grids, forming a fixed-length data dimension.
The obtained feature map of fixed data dimension is introduced into the seventh stage of the basic network model to obtain the accurate logistics vehicle positioning bounding box and the probability of the corresponding vehicle type.
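The unified normalization of step four can be sketched as a fixed-length multi-scale max pooling over the quantized box region; the helper below is illustrative (a real region-of-interest pooling layer operates on multi-channel feature maps):

```python
def roi_multi_pool(feat, box):
    # Max-pool the integer box region (x0, y0, x1, y1) of a 2-D feature
    # map over 4x4, 2x2 and 1x1 grids, yielding a fixed 21-element vector
    # regardless of the box size, as in step (M2).
    x0, y0, x1, y1 = box
    region = [row[x0:x1] for row in feat[y0:y1]]
    h, w = len(region), len(region[0])
    out = []
    for g in (4, 2, 1):
        for gy in range(g):
            for gx in range(g):
                # Each grid cell covers at least one feature-map element.
                ys = range(gy * h // g, max(gy * h // g + 1, (gy + 1) * h // g))
                xs = range(gx * w // g, max(gx * w // g + 1, (gx + 1) * w // g))
                out.append(max(region[y][x] for y in ys for x in xs))
    return out
```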
The embodiments described in this specification merely illustrate implementations of the inventive concept; the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also covers equivalents that may occur to those skilled in the art upon consideration of the inventive concept.
Claims (3)
1. A logistics vehicle feature positioning method based on improved Faster R-CNN comprises the following steps:
step one, enhancement processing of logistics vehicle images;
introducing a data enhancement means, and processing the logistics vehicle images through multi-scale proportional scaling, image rotation, saturation enhancement and similar operations so as to increase the scene diversity of the logistics vehicle images for subsequent identification and positioning;
1.1) carrying out multi-scale scaling operation on the logistics vehicles;
scaling the logistics vehicle images at multiple scales without destroying the specific length, width, and height proportions of the logistics vehicles in the original images, so that the positioning network can learn the target features at their specific proportions;
suppose the pixel coordinate of a certain logistics vehicle before scaling is denoted A_0(x_0, y_0) and the scaled coordinate is denoted A_1(x_1, y_1); then A_0 and A_1 satisfy the relation:
(x_1, y_1) = (μx_0, μy_0) (1)
where μ represents a scaling factor; the above equation corresponds to the image scaling matrix, expressed as the following matrix:
[x_1, y_1]^T = [[μ, 0], [0, μ]] · [x_0, y_0]^T (2)
wherein, when μ > 1, an image magnification operation is represented; when μ < 1, an image reduction operation is represented, which serves to reduce the cost of model operation;
1.2) carrying out rotation operation on the logistics vehicle image;
when the camera shoots the logistics vehicles in rapid driving, the captured images have great angle difference, and in order to adapt to the identification and positioning of different angles, the captured logistics vehicle images need to be subjected to rotation transformation, so that vehicle characteristic information of various angles is generated;
setting the center of the logistics vehicle image as the rotation center O(0, 0) and denoting the counterclockwise rotation angle of the image as θ, any pixel point P(x, y) in the image becomes P_1(x_1, y_1) after the rotation transformation; the rotation process is then represented by:
(x_1, y_1) = (x cos θ − y sin θ, x sin θ + y cos θ) (3)
the above formula is a polar coordinate transformation formula; it corresponds to the image rotation matrix, expressed as the following matrix:
[x_1, y_1]^T = [[cos θ, −sin θ], [sin θ, cos θ]] · [x, y]^T (4)
1.3) carrying out saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the saturation of the logistics vehicle images is adjusted;
the specific flow of adjusting the image saturation is as follows:
(S1) calculating a pixel extremum on the logistics vehicle image;
rgbMax = max(R, G, B), rgbMin = min(R, G, B) (5)
wherein rgbMax denotes the pixel maximum value and rgbMin denotes the pixel minimum value;
(S2) saturation calculation;
the saturation S is calculated as follows:
delta=(rgbMax-rgbMin)/255 (6)
value=(rgbMax+rgbMin)/255 (7)
L=value/2 (8)
S = delta/value, if L < 0.5; S = delta/(2 − value), if L ≥ 0.5 (9)
(S3) adjusting the logistics vehicle image saturation;
setting a saturation parameter β for adjusting the illumination intensity, the calculation process is as follows:
1. if the parameter β ≥ 0, the intermediate variable α is first calculated:
α = S, if S + β ≥ 1; α = 1 − β, if S + β < 1 (10)
α = 1/α − 1 (11)
and the saturation is then adjusted:
RGB'=RGB+(RGB-L*255)*α (12)
2. if the parameter β <0, then:
RGB'=L*255+(RGB-L*255)*(1+α) (13)
the logistics vehicle image which is subjected to scaling processing, rotation operation and saturation enhancement is applied to the following steps so as to accurately position the characteristics of the logistics vehicle;
step two, constructing a basic network model;
the VGGNet-16 basic network is adopted as the feature extraction network for classifying logistics vehicles of different vehicle types; meanwhile, in order to realize the positioning of the logistics vehicles, an RPN target detection and positioning model is added behind the feature extraction module at the third convolutional sublayer of the fifth convolutional stage of VGGNet-16;
the steps of constructing the basic network model are as follows:
(T1) first stage: firstly, the W × H × 3 image processed in step one is input; then, a convolution operation is carried out on the logistics vehicle image through two consecutive 64-channel convolutional layers, wherein the convolution kernel size is 3 × 3 and the convolution step size is 2; subsequently, the convolved image is reduced in dimension through a 64-channel maximum pooling layer, wherein the pooling kernel size is 2 × 2 and the step size is 2; this stage outputs a feature map of correspondingly reduced size;
(T2) second stage: the flow is the same as the first stage, namely, the image obtained in the first stage is input into a second stage network, and then a new characteristic diagram is obtained through convolution and pooling operations; but the convolution and pooling channels of the second stage are changed to 128 unlike the first stage, and other parameters are the same as those of the first stage;
(T3) third stage: firstly, inputting the image output in the second stage into the network in the third stage; then, carrying out convolution operation on the image through three continuous convolution layers with 256 channels, wherein the convolution kernel size is 3 x 3, and the convolution step size is 2; then, reducing the dimension of the convolved image through a maximum pooling layer of 256 channels, wherein the size of a pooling kernel is 2 x 2, and the step size is 2;
(T4) fourth stage: the process is the same as the third stage, namely, the image obtained in the third stage is input into a fourth stage network, and then a new characteristic diagram is obtained through convolution and pooling operations; but the difference from the third stage is that the convolution and pooling channels of the fourth stage are both changed to 512, and other parameters are the same as those of the third stage;
(T5) fifth stage: this stage consists of three convolutional layers, each with 512 channels, a convolution kernel size of 3 × 3, and a convolution step size of 2; the feature map output at this stage is passed to the sixth stage;
(T6) sixth stage: firstly, a convolutional layer with a convolution kernel size of 3 × 3, a convolution step size of 2, and 512 convolution channels is connected; then a two-class classification loss function and a bounding box regression loss function are connected, which regress and judge the box information and the classification information (i.e., the probability that the box most likely belongs to a certain vehicle type) of each box belonging to a logistics vehicle or the background;
(T7) seventh stage: firstly, connecting two 4096-channel full-connection layers; then connecting an overall loss function; finally, outputting an accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type;
in the above infrastructure network structure, the design of parameters related to the activation function and the loss function specifically includes:
(P1) in the VGGNet-16 basic network, the ReLU activation function is used for all activation functions connected after the convolutional layers:
ReLU(x) = max(0, x) (14)
(P2) in the sixth stage, using the classification loss function and bounding box regression loss function:
the classification loss L_rpn_cls is expressed as:
L_rpn_cls = −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)] (15)
wherein p_i represents the probability that box i is a logistics vehicle or the background; p_i* represents the label of the real box corresponding to box i: it is recorded as 1 if the real box is foreground, and 0 otherwise;
the bounding box regression loss L_rpn_box is expressed as:
L_rpn_box = Σ_i p_i* · smoothL1(t_i − t_i*) (16)
wherein t_i represents the four-dimensional position information of predicted box i, denoted t_i = (x_i, y_i, w_i, h_i); t_i* represents the four-dimensional position information of the corresponding real box, denoted t_i* = (x_i*, y_i*, w_i*, h_i*); the smoothL1 function is represented as follows:
smoothL1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise (17)
(P3) in the seventh stage, the overall loss function design rule is: the bounding box regression loss adopts a logistic regression mapping method, and the classification loss adopts a gradient descent method; in the process of loss reduction, the Adam gradient descent method is used to optimize the loss function, with the corresponding parameters set to α = 0.001, β1 = 0.9, β2 = 0.999, and ε = 10E−8;
(P4) during the training process, the learning rate adjustment strategy adopts a multi-stage attenuation method;
thirdly, screening the logistics vehicle target by using a non-maximum suppression algorithm;
the logistics vehicle image is processed by the basic network model in the second step to obtain more boundary frames on the same logistics vehicle, so that a method needs to be introduced to screen out the redundant boundary frames; the specific operation flow is as follows:
(Q1) based on the four-dimensional position information (x_i, y_i, w_i, h_i) of the predicted bounding boxes, the area S_i of all predicted boxes of each vehicle in the logistics vehicle image can be obtained:
S_i = w_i * h_i (18)
(Q2) in the sixth stage of the basic network model, the box information and classification information belonging to a logistics vehicle or the background are determined through regression; each real logistics vehicle has multiple corresponding bounding boxes; these are sorted by probability from large to small, and the bounding box with the maximum probability is screened out;
(Q3) circularly calculating the area intersection ratio I of the screened boundary frame and the rest boundary frames, if I is larger than a preset threshold value, determining that the boundary frame is heavily overlapped with the boundary frame screened in the step (Q2), and then deleting the boundary frame until all the boundary frames in the step (Q2) are processed;
the two bounding boxes overlap in the horizontal direction if and only if the condition
|x_max − x_oti| < (w_max + w_oti)/2 (19)
is satisfied, and overlap in the vertical direction if and only if the condition
|y_max − y_oti| < (h_max + h_oti)/2 (20)
is satisfied; the constraints in formulas (19) to (20) may equivalently be written without the absolute value, and taking the constraint in formula (19) as an example, it may be varied as follows:
−(w_max + w_oti)/2 < x_max − x_oti < (w_max + w_oti)/2 (21)
when the constraint conditions in formulas (19) and (20) are both satisfied, the intersection area is S_ovp = ((w_max + w_oti)/2 − |x_max − x_oti|) · ((h_max + h_oti)/2 − |y_max − y_oti|), and the calculation formula of the intersection-over-union ratio I is:
I = S_ovp/(S_max + S_oti − S_ovp) (22)
otherwise I = 0, meaning that the two bounding boxes do not intersect, and both are retained;
in formulas (19) to (22), max denotes the screened maximum bounding box, whose area is denoted S_max; oti denotes any other bounding box, whose area is denoted S_oti; the intersection area between the two bounding boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) denotes the four-dimensional position information of the screened maximum bounding box, namely the center coordinates, box width, and box height; and (x_oti, y_oti, w_oti, h_oti) denotes the four-dimensional position information of any remaining bounding box;
step four, uniformly normalizing the target characteristics of the logistics vehicles;
in order to solve the problem that the dimensions of the subsequent connection layers do not match because the bounding boxes retained after non-maximum suppression have different sizes, a region-of-interest pooling layer is connected after the loss functions of the sixth stage, and the bounding boxes of different sizes are uniformly normalized; the specific operation flow is as follows:
(M1) quantizing the four-dimensional position information of the bounding box on the logistics vehicle image obtained in the third step into integer array coordinates;
(M2) the quantized bounding box region is evenly divided and max-pooled over 4 × 4, 2 × 2, and 1 × 1 grids, forming a fixed-length data dimension;
the obtained feature map of fixed data dimension is introduced into the seventh stage of the basic network model to obtain the accurate logistics vehicle positioning bounding box and the probability of the corresponding vehicle type.
2. The logistics vehicle feature positioning method based on improved Faster R-CNN as claimed in claim 1, wherein: the threshold value of the area intersection ratio I in step (Q3) is 0.7.
3. The logistics vehicle feature positioning method based on improved Faster R-CNN as claimed in claim 1, wherein: step 1.1) sets the scaling factor μ to 0.84, and scaling stops when the short-edge pixel count is less than 100 pix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010690178.9A CN111986080B (en) | 2020-07-17 | 2020-07-17 | Logistics vehicle feature positioning method based on improved master R-CNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111986080A true CN111986080A (en) | 2020-11-24 |
CN111986080B CN111986080B (en) | 2024-01-16 |
Family
ID=73438739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010690178.9A Active CN111986080B (en) | 2020-07-17 | 2020-07-17 | Logistics vehicle feature positioning method based on improved master R-CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111986080B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832794A (en) * | 2017-11-09 | 2018-03-23 | 车智互联(北京)科技有限公司 | A kind of convolutional neural networks generation method, the recognition methods of car system and computing device |
CN109034024A (en) * | 2018-07-16 | 2018-12-18 | 浙江工业大学 | Logistics vehicles vehicle classification recognition methods based on image object detection |
CN110175524A (en) * | 2019-04-26 | 2019-08-27 | 南京航空航天大学 | A kind of quick vehicle checking method of accurately taking photo by plane based on lightweight depth convolutional network |
Also Published As
Publication number | Publication date |
---|---|
CN111986080B (en) | 2024-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599773B (en) | Deep learning image identification method and system for intelligent driving and terminal equipment | |
CN108615226B (en) | Image defogging method based on generation type countermeasure network | |
CN109684922B (en) | Multi-model finished dish identification method based on convolutional neural network | |
CN113052210B (en) | Rapid low-light target detection method based on convolutional neural network | |
CN112101175A (en) | Expressway vehicle detection and multi-attribute feature extraction method based on local images | |
CN109345547B (en) | Traffic lane line detection method and device based on deep learning multitask network | |
CN111639564B (en) | Video pedestrian re-identification method based on multi-attention heterogeneous network | |
CN110796009A (en) | Method and system for detecting marine vessel based on multi-scale convolution neural network model | |
CN109886161B (en) | Road traffic identification recognition method based on likelihood clustering and convolutional neural network | |
CN109753878B (en) | Imaging identification method and system under severe weather | |
CN111914698B (en) | Human body segmentation method, segmentation system, electronic equipment and storage medium in image | |
CN110929593A (en) | Real-time significance pedestrian detection method based on detail distinguishing and distinguishing | |
CN111915525A (en) | Low-illumination image enhancement method based on improved depth separable generation countermeasure network | |
CN111368830A (en) | License plate detection and identification method based on multi-video frame information and nuclear phase light filtering algorithm | |
CN110706239A (en) | Scene segmentation method fusing full convolution neural network and improved ASPP module | |
CN112132145B (en) | Image classification method and system based on model extended convolutional neural network | |
CN111738056A (en) | Heavy truck blind area target detection method based on improved YOLO v3 | |
CN111768415A (en) | Image instance segmentation method without quantization pooling | |
CN112200746A (en) | Defogging method and device for traffic scene image in foggy day | |
CN108345835B (en) | Target identification method based on compound eye imitation perception | |
CN113159043A (en) | Feature point matching method and system based on semantic information | |
CN110889360A (en) | Crowd counting method and system based on switching convolutional network | |
CN114627269A (en) | Virtual reality security protection monitoring platform based on degree of depth learning target detection | |
CN116129291A (en) | Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device | |
CN115019340A (en) | Night pedestrian detection algorithm based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||