CN111986080B - Logistics vehicle feature positioning method based on improved faster R-CNN - Google Patents
Logistics vehicle feature positioning method based on improved faster R-CNN
- Publication number: CN111986080B (application CN202010690178.9A)
- Authority: CN (China)
- Prior art keywords: stage, image, logistics, logistics vehicle, convolution
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T3/4084 — Scaling of whole images or parts thereof in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
- G06T3/60 — Rotation of whole images or parts thereof
- G06T7/62 — Analysis of geometric attributes of area, perimeter, diameter or volume
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06N3/045 — Combinations of networks
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V2201/08 — Detecting or categorising vehicles
- Y02T10/10 — Internal combustion engine [ICE] based vehicles
- Y02T10/40 — Engine management systems
Abstract
A logistics vehicle feature localization method based on an improved faster R-CNN, comprising: step one, performing image enhancement on the logistics vehicle images by introducing data augmentation; step two, constructing a basic network model, adopting the VGGNet-16 network as the feature extraction network and, to localize the logistics vehicles, adding an RPN-based target detection and localization model after the feature extraction module at the third convolutional sublayer of the fifth convolutional block of VGGNet-16; step three, screening the logistics vehicle targets with a non-maximum suppression algorithm; step four, performing unified normalization on the logistics vehicle target features, and passing the resulting fixed-dimension feature map to the seventh stage of the basic network model to obtain the precise logistics vehicle positioning bounding box and the probability of the corresponding vehicle type. The invention achieves good feature localization performance on logistics vehicles across different environments and scenes.
Description
Technical Field
The invention relates to a logistics vehicle feature localization method based on an improved faster R-CNN.
Background Art
In recent years, with the development of transport logistics, more and more logistics vehicles serve people's work and life, but the sheer number of logistics and engineering vehicles makes parking management within a park considerably more difficult. Although drop-and-pull operations can improve cargo-loading efficiency, at present logistics vehicles still occupy parking spaces unreasonably and drop-and-pull trips cannot be charged accurately; more seriously, some vehicle owners resort to extremely dangerous behavior such as using fake license plates to evade monitoring and detection.
To manage logistics and engineering vehicles effectively, there are today many examples of identifying logistics vehicles of different types with technical means such as computer vision. Most such methods obtain vehicle images from a traffic-intersection camera or an image acquisition card; because the vehicle appears somewhere within a natural-environment image, the precise position of the vehicle in the image must first be found, after which vehicle feature extraction is performed to achieve vehicle-type recognition. However, current recognition methods face the following main difficulties: (1) recognition is strongly affected by illumination: the same vehicle looks different in sunny, rainy, snowy and other conditions, which can lead to misrecognition; (2) the scene in which the vehicle is located is complex and changeable: in cluttered backgrounds such as rural roads, foreground and background cannot be separated quickly and accurately; (3) vehicle appearance is highly variable: different vehicle types differ in color, shape, brand, size and other parameters that affect feature recognition. In short, feature recognition of logistics vehicles by computer vision is still affected by uncertain factors such as environment, scene and appearance, which makes recognition difficult.
Disclosure of Invention
To overcome the deficiencies of the prior art, and aiming at the management problem of logistics and engineering vehicles and the difficulty that traditional recognition methods have with uncertain factors such as environment, scene and appearance, the invention provides a logistics vehicle feature localization method based on an improved faster R-CNN.
First, data augmentation is applied to the logistics vehicle images to increase the scene diversity of the sample images; then a basic network model is constructed with the improved faster R-CNN; next, a non-maximum suppression algorithm is introduced to screen the logistics vehicle target bounding boxes; finally, the logistics vehicle target features are uniformly normalized to achieve precise localization.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A logistics vehicle feature localization method based on an improved faster R-CNN, comprising the following steps:
step one, carrying out image enhancement processing on logistics vehicles;
To address the problems of fixed shooting angles, monotonous backgrounds and low detection rates for logistics vehicles, the invention introduces data augmentation: the logistics vehicle images are processed by operations such as multi-scale proportional scaling, image rotation and saturation enhancement, increasing scene diversity for further recognition and localization.
1.1) A multi-scale scaling operation is carried out on the logistics vehicle image;
The logistics vehicle image is scaled at multiple scales without altering the aspect ratio of the logistics vehicle in the original image, so that the localization network can learn target features at specific proportions.
Suppose a pixel of the logistics vehicle image before scaling is denoted A0(x0, y0) and its coordinate after scaling is A1(x1, y1); then A0 and A1 satisfy:

(x1, y1) = (μx0, μy0)   (1)

where μ denotes the scaling factor. In homogeneous coordinates, the above relation corresponds to the image scaling matrix:

[x1]   [μ 0 0][x0]
[y1] = [0 μ 0][y0]   (2)
[1 ]   [0 0 1][1 ]

When μ > 1 the operation enlarges the image; when μ < 1 it reduces it.
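Equations (1)-(2) can be checked with a small sketch in homogeneous coordinates (an illustrative reconstruction; the helper name `scale_point` is not part of the patent):

```python
import numpy as np

def scale_point(x0, y0, mu):
    """Apply the scaling of Eqs. (1)-(2) to one pixel coordinate."""
    M = np.array([[mu, 0, 0],
                  [0, mu, 0],
                  [0,  0, 1]], dtype=float)  # scaling matrix of Eq. (2)
    x1, y1, _ = M @ np.array([x0, y0, 1.0])
    return x1, y1
```

For example, reducing with μ = 0.5 maps the pixel (100, 50) to (50, 25), matching Eq. (1) directly.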
1.2) Rotating the logistics vehicle image;
When the camera captures a fast-moving logistics vehicle, the captured images can differ greatly in angle. To adapt recognition and localization to different angles, the captured logistics vehicle images are rotated, generating vehicle feature information at various angles.
The invention takes the center of the logistics vehicle image as the rotation center O(0, 0) and denotes the counterclockwise rotation angle by θ. When an arbitrary pixel P(x, y) in the image becomes P1(x1, y1) after the rotation transform, the rotation is represented by the following equations:

x1 = x·cosθ − y·sinθ
y1 = x·sinθ + y·cosθ   (3)

This polar-coordinate transformation corresponds to the image rotation matrix:

[x1]   [cosθ  −sinθ][x]
[y1] = [sinθ   cosθ][y]   (4)
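A minimal sketch of Eqs. (3)-(4), assuming the mathematical counterclockwise convention (in image coordinates, where y points down, the visual direction of rotation is reversed; `rotate_point` is an illustrative name):

```python
import numpy as np

def rotate_point(x, y, theta):
    """Rotate P(x, y) counterclockwise by theta (radians) about the
    rotation center O(0, 0), as in Eqs. (3)-(4)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])  # rotation matrix of Eq. (4)
    x1, y1 = R @ np.array([x, y], dtype=float)
    return x1, y1
```

For instance, a 90-degree rotation takes the point (1, 0) to (0, 1), up to floating-point error.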
1.3) Performing a saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the saturation of the logistics vehicle image is adjusted.
The specific flow of adjusting the image saturation is as follows:
(S1) Calculate the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)   (5)

where rgbMax represents the pixel maximum and rgbMin the pixel minimum over the three color channels.
(S2) Saturation calculation;
the saturation S is calculated as follows:

delta = (rgbMax − rgbMin)/255   (6)
value = (rgbMax + rgbMin)/255   (7)
L = value/2   (8)
S = delta/value,       if L < 0.5
S = delta/(2 − value), otherwise   (9)
(S3) adjusting the logistics vehicle image saturation;
A saturation parameter β is set to adjust the illumination intensity; the calculation flow is as follows:

1. If the parameter β ≥ 0, first obtain the value of the intermediate variable α:

α = S,      if β + S ≥ 1
α = 1 − β,  otherwise   (10)

update the value of α:

α = 1/α − 1   (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α   (12)

2. If the parameter β < 0, set α = β; then:

RGB' = L·255 + (RGB − L·255)·(1 + α)   (13)
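The flow (S1)-(S3) can be sketched per pixel as follows. This is a reconstruction: the intermediate-variable rule for α follows the common HSL-based saturation adjustment, an assumption where the source formulas are garbled, and the grey-pixel early return is added for safety:

```python
def adjust_saturation(r, g, b, beta):
    """Adjust the saturation of one RGB pixel (0-255 channels) by the
    parameter beta, following Eqs. (5)-(13)."""
    rgb_max, rgb_min = max(r, g, b), min(r, g, b)        # Eq. (5)
    delta = (rgb_max - rgb_min) / 255.0                  # Eq. (6)
    value = (rgb_max + rgb_min) / 255.0                  # Eq. (7)
    L = value / 2.0                                      # Eq. (8)
    if delta == 0:                  # grey pixel: saturation is undefined
        return (r, g, b)
    S = delta / value if L < 0.5 else delta / (2 - value)   # Eq. (9)
    if beta >= 0:
        alpha = S if beta + S >= 1 else 1 - beta         # Eq. (10)
        alpha = 1.0 / alpha - 1.0                        # Eq. (11)
        return tuple(c + (c - L * 255) * alpha for c in (r, g, b))        # Eq. (12)
    alpha = beta
    return tuple(L * 255 + (c - L * 255) * (1 + alpha) for c in (r, g, b))  # Eq. (13)
```

With β = −1 every channel collapses to the lightness L·255, i.e. full desaturation, which is a quick way to verify the β < 0 branch.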
the logistics vehicle image subjected to scaling, rotation operation and saturation enhancement is applied to the following steps so as to accurately position logistics vehicle characteristics.
Step two, constructing a basic network model;
Although faster R-CNN provides three candidate backbone networks for feature extraction, to obtain a better feature extraction effect the VGGNet-16 network is adopted as the feature extraction network for classifying logistics vehicles of different types. Meanwhile, to localize the logistics vehicles, the invention adds an RPN-based target detection and localization model after the feature extraction module at the third convolutional sublayer of the fifth convolutional block of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) First stage: first, the W × H × 3 image processed in step one is input; the logistics vehicle image is then convolved by two consecutive 64-channel convolutional layers, each with a 3×3 kernel and stride 1; the convolved image is then reduced in dimension by a 64-channel max pooling layer with a 2×2 pooling kernel and stride 2. This stage outputs a feature map of size (W/2) × (H/2) × 64.
(T2) second stage: the flow is the same as the first stage, namely the image obtained in the first stage is input into a second stage network, and then a new characteristic diagram is obtained through convolution and pooling operation. But unlike the first stage, the convolution and pooling channels of the second stage are each changed to 128, and the other parameters are the same as the first stage.
(T3) Third stage: first, the feature map output by the second stage is input into the third-stage network; the image is then convolved by three consecutive 256-channel convolutional layers, each with a 3×3 kernel and stride 1; the convolved image is then reduced in dimension by a 256-channel max pooling layer with a 2×2 pooling kernel and stride 2.
(T4) fourth stage: the flow is the same as the third stage, namely the image obtained in the third stage is input into a fourth stage network, and then a new characteristic diagram is obtained through convolution and pooling operation. But unlike the third stage, the convolution and pooling channels of the fourth stage are each changed to 512, and the other parameters are the same as the third stage.
(T5) Fifth stage: this stage consists of three convolutional layers, each with 512 channels, a 3×3 kernel and stride 1. The feature map output by this stage has size (W/16) × (H/16) × 512.
(T6) Sixth stage: first, a convolutional layer with a 3×3 kernel, stride 1 and 512 channels is attached; then a classification loss function and a bounding-box regression loss function are attached, which regress the frame information and judge the classification information (the probability of the most likely vehicle type) of each frame as logistics vehicle or background;
(T7) Seventh stage: first, two 4096-channel fully connected layers are attached; then the total loss function; finally the precise logistics vehicle positioning bounding box and the probability of the corresponding vehicle type are output.
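As a sanity check on the feature-map sizes stated in stages (T1)-(T5), the spatial dimensions can be traced with a short sketch. This is an illustrative reconstruction assuming size-preserving (stride-1, padding-1) 3×3 convolutions and stride-2 poolings, which is what the stated W/2 and W/16 sizes imply; `feature_map_sizes` is a hypothetical helper, not part of the patent:

```python
def feature_map_sizes(W, H):
    """Trace the spatial size of the feature map through the five backbone
    stages: 3x3 convolutions preserve size, each 2x2 stride-2 max pooling
    halves it, and stage five has no pooling."""
    sizes = []
    w, h = W, H
    channels = [64, 128, 256, 512, 512]
    for stage, c in enumerate(channels, start=1):
        # convolutions leave (w, h) unchanged
        if stage < 5:            # stages 1-4 end with a 2x2, stride-2 pooling
            w, h = w // 2, h // 2
        sizes.append((w, h, c))
    return sizes
```

For a 640×480 input this gives (320, 240, 64) after stage one and (40, 30, 512) after stage five, i.e. (W/2)×(H/2)×64 and (W/16)×(H/16)×512 as stated above.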
The design of the activation functions, loss functions and related parameters in the above basic network structure is described in detail below:
(P1) In the VGGNet-16 backbone, the ReLU activation function is used after every convolutional layer:

ReLU(x) = max(0, x)   (14)
(P2) In the sixth stage, the classification loss function and the bounding-box regression loss function are designed as follows:
The classification loss L_rpn_cls is expressed as:

L_rpn_cls = (1/N_cls) · Σ_i −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)]   (15)

where p_i denotes the predicted probability that frame i is a logistics vehicle or background; p_i* denotes the label of the real frame corresponding to frame i, recorded as 1 if the real frame is foreground and 0 otherwise; and N_cls is the number of frames considered.
The bounding-box regression loss L_rpn_box is expressed as:

L_rpn_box = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)   (16)

where t_i denotes the four-dimensional position of predicted frame i, written t_i = (x_i, y_i, w_i, h_i); t_i* denotes the four-dimensional position of the real frame, written t_i* = (x_i*, y_i*, w_i*, h_i*); and N_reg is the number of regressed frames. The smoothL1 function is expressed as follows:

smoothL1(x) = 0.5·x²,    if |x| < 1
smoothL1(x) = |x| − 0.5, otherwise   (17)
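The regression loss of Eqs. (16)-(17) can be sketched for a single anchor as follows (a minimal reconstruction; `smooth_l1` and `rpn_box_loss` are illustrative names, and the 1/N_reg normalization is omitted since only one anchor is shown):

```python
def smooth_l1(x):
    """Smooth-L1 function of Eq. (17): quadratic near zero, linear in the tails."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

def rpn_box_loss(pred, target, p_star):
    """Bounding-box regression term of Eq. (16) for one anchor: only anchors
    labelled foreground (p_star == 1) contribute to the loss."""
    return p_star * sum(smooth_l1(t - ts) for t, ts in zip(pred, target))
```

Note the continuity at |x| = 1: both branches of Eq. (17) give 0.5 there, which is the point of the smooth-L1 design.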
(P3) In the seventh stage, the total loss function is designed as follows: the bounding-box regression loss uses a logistic-regression mapping; the classification loss uses gradient descent; during loss minimization the Adam gradient descent method is used to optimize the loss function, with the corresponding parameters set to α = 0.001, β1 = 0.9, β2 = 0.999 and ε = 10^−8.
(P4) During training, the learning rate of the invention is adjusted with a multi-stage decay strategy.
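A sketch of one Adam update with the hyperparameters stated in (P3), together with a multi-stage decay schedule for (P4). The decay boundaries and factor are illustrative assumptions, as the patent does not state them:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, with alpha=0.001, beta1=0.9,
    beta2=0.999 and eps=1e-8 as stated above."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def multistage_lr(epoch, base_lr=0.001, boundaries=(30, 60), factor=0.1):
    """Multi-stage decay: the rate drops by `factor` at each boundary epoch.
    The boundary epochs here are illustrative, not from the patent."""
    return base_lr * factor ** sum(epoch >= b for b in boundaries)
```

On the very first step (t = 1) with a unit gradient, the bias-corrected update moves the parameter by almost exactly the learning rate, a standard property of Adam.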
Step three, screening the logistics vehicle targets by using a non-maximum suppression algorithm;
After the logistics vehicle image is processed by the basic network model of step two, a large number of bounding boxes are obtained on the same logistics vehicle, so a method must be introduced to screen out the redundant ones. The specific operation flow is as follows:
(Q1) From the four-dimensional position information (x_i, y_i, w_i, h_i) of each predicted bounding box, the area S_i of all predicted frames of each vehicle in the logistics vehicle image can be obtained:

S_i = w_i · h_i   (18)
(Q2) In the sixth stage of the basic network model, the frame information and the classification information (logistics vehicle or background) are determined by regression. Each real logistics vehicle has many corresponding bounding boxes; these are sorted by probability in descending order and the single bounding box with the highest probability is screened out;
(Q3) The area intersection-over-union I of the screened bounding box with each remaining bounding box is computed in a loop; if I is larger than a preset threshold (0.7 by default), the box is considered to overlap the box screened in step (Q2) and is deleted, until all the bounding boxes of step (Q2) have been processed.
Write (x_max, y_max, w_max, h_max) for the screened (maximum-probability) bounding box and (x_oti, y_oti, w_oti, h_oti) for any remaining bounding box. The two boxes intersect if and only if:

|x_max − x_oti| < (w_max + w_oti)/2  and  |y_max − y_oti| < (h_max + h_oti)/2   (19)

in which case the width and height of the overlap region are:

w_ovp = (w_max + w_oti)/2 − |x_max − x_oti|
h_ovp = (h_max + h_oti)/2 − |y_max − y_oti|   (20)

and the overlap area is:

S_ovp = w_ovp · h_ovp   (21)

When condition (19) is satisfied, the intersection-over-union I is calculated as:

I = S_ovp / (S_max + S_oti − S_ovp)   (22)

Otherwise I = 0, meaning the two bounding boxes do not intersect, and both are retained.
In formulas (19)-(22), max denotes the screened maximum bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the intersection area between two bounding boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) is the four-dimensional position information of the maximum screened bounding box, i.e. its center coordinates, frame width and frame height; (x_oti, y_oti, w_oti, h_oti) is the four-dimensional position information of any remaining bounding box.
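Steps (Q1)-(Q3) together with Eqs. (18)-(22) amount to greedy non-maximum suppression over center-format boxes. A minimal sketch (function names are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of Eq. (22) for boxes given as (x, y, w, h)
    with (x, y) the box center."""
    # overlap extent along each axis, Eqs. (19)-(20) in absolute-value form
    w_ovp = (box_a[2] + box_b[2]) / 2 - abs(box_a[0] - box_b[0])
    h_ovp = (box_a[3] + box_b[3]) / 2 - abs(box_a[1] - box_b[1])
    if w_ovp <= 0 or h_ovp <= 0:
        return 0.0                       # disjoint boxes: I = 0
    s_ovp = w_ovp * h_ovp                # Eq. (21)
    s_a, s_b = box_a[2] * box_a[3], box_b[2] * box_b[3]   # areas, Eq. (18)
    return s_ovp / (s_a + s_b - s_ovp)

def nms(boxes, scores, threshold=0.7):
    """Greedy non-maximum suppression (steps Q1-Q3): keep the highest-scoring
    box, delete every remaining box whose IoU with it exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

For example, two heavily overlapping boxes on the same vehicle collapse to the higher-probability one, while a distant box on another vehicle survives untouched.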
Step four, carrying out unified normalization on the logistics vehicle target features;
To avoid the mismatch in the dimensions of the subsequent connection layers caused by bounding boxes of different sizes remaining after non-maximum suppression, the invention connects a region-of-interest pooling layer after the loss functions of the sixth stage and uniformly normalizes the differently sized bounding boxes. The specific operation flow is as follows:
(M1) The four-dimensional position information of the bounding boxes on the logistics vehicle image obtained in step three is quantized to integer grid coordinates;
(M2) each quantized bounding box is evenly divided and max-pooled at the 4×4, 2×2 and 1×1 scales, forming a fixed-length data dimension (4×4 + 2×2 + 1×1 = 21 values per channel).
The resulting fixed-dimension feature map is passed to the seventh stage of the basic network model to obtain the precise logistics vehicle positioning bounding box and the probability of the corresponding vehicle type.
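The fixed-length pooling of steps (M1)-(M2) can be sketched for a single channel as follows. This is an SPP-style reconstruction under the assumption that each scale evenly partitions the quantized box; `fixed_length_pool` is a hypothetical name:

```python
import numpy as np

def fixed_length_pool(feat):
    """Pool one channel of an ROI feature map into 4x4 + 2x2 + 1x1 = 21
    max-pooled bins (step M2), giving a fixed-length vector regardless of
    the ROI's spatial size. Bin edges come from the integer quantization
    of an even partition (step M1)."""
    h, w = feat.shape
    out = []
    for n in (4, 2, 1):
        ys = np.linspace(0, h, n + 1).astype(int)   # integer bin boundaries
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feat[ys[i]:ys[i+1], xs[j]:xs[j+1]]
                out.append(cell.max() if cell.size else 0.0)
    return np.array(out)
```

Whatever the ROI size, the output always has 21 entries per channel, which is what makes the subsequent fully connected layers' dimensions match.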
Preferably, in step 1.1), to reduce the computational cost of the model, the invention sets the scaling factor μ = 0.84 and stops scaling when the short-side pixels are fewer than 100 pix.
Preferably, the area overlap ratio I in step (Q3) has a threshold value of 0.7.
The invention has the advantages that:
the invention provides a method for positioning characteristics of a logistics vehicle based on an improved master R-CNN (factory R-CNN) aiming at the management problem of the logistics engineering vehicle and the problem that the traditional recognition method is difficult to recognize due to uncertainty factors such as environment, scene and appearance. Firstly, carrying out data enhancement on a logistics vehicle image to enable a sample image to increase scene diversity; then, constructing a basic network model by using the improved master R-CNN; then, a non-maximum suppression algorithm is introduced to screen a logistics vehicle target boundary box; and finally, unified normalization is carried out on the object characteristics of the logistics vehicle, so that accurate positioning is realized. Therefore, the characteristic positioning performance of the logistics vehicle is superior to that of the traditional vehicle detection method under the conditions of different environments, scenes and the like, the problem of logistics engineering vehicle management in a park can be well solved, and the logistics vehicle detection method has certain practical value and application prospect.
Drawings
FIG. 1 is a schematic view of logistics vehicle image scaling according to the present invention;
FIG. 2 is a schematic diagram of the image rotation of a logistics vehicle in accordance with the present invention;
FIG. 3 is a diagram of the basic network architecture of the present invention;
FIG. 4 is a comparison of a logistics vehicle image before and after processing with the non-maximum suppression algorithm designed by the invention; FIG. 4a shows the image before non-maximum suppression and FIG. 4b the image after it;
FIG. 5 is a unified normalization flow chart for target features of the present invention;
fig. 6 is a technical roadmap of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
To overcome the deficiencies of the prior art, and aiming at the management problem of logistics and engineering vehicles and the difficulty that traditional recognition methods have with uncertain factors such as environment, scene and appearance, the invention provides a logistics vehicle feature localization method based on an improved faster R-CNN. First, data augmentation is applied to the logistics vehicle images to increase the scene diversity of the sample images; then a basic network model is constructed with the improved faster R-CNN; next, a non-maximum suppression algorithm is introduced to screen the logistics vehicle target bounding boxes; finally, the logistics vehicle target features are uniformly normalized to achieve precise localization.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A logistics vehicle feature localization method based on an improved faster R-CNN, comprising the following steps:
step one, carrying out image enhancement processing on logistics vehicles;
To address the problems of fixed shooting angles, monotonous backgrounds and low detection rates for logistics vehicles, the invention introduces data augmentation: the logistics vehicle images are processed by operations such as multi-scale proportional scaling, image rotation and saturation enhancement, increasing scene diversity for further recognition and localization.
1.1) A multi-scale scaling operation is carried out on the logistics vehicle image;
The logistics vehicle image is scaled at multiple scales without altering the aspect ratio of the logistics vehicle in the original image, so that the localization network can learn target features at specific proportions.
Suppose a pixel of the logistics vehicle image before scaling is denoted A0(x0, y0) and its coordinate after scaling is A1(x1, y1); then A0 and A1 satisfy:

(x1, y1) = (μx0, μy0)   (1)

where μ denotes the scaling factor. In homogeneous coordinates, the above relation corresponds to the image scaling matrix:

[x1]   [μ 0 0][x0]
[y1] = [0 μ 0][y0]   (2)
[1 ]   [0 0 1][1 ]

When μ > 1 the operation enlarges the image; when μ < 1 it reduces it. To reduce the computational cost of the model, the invention sets the scaling factor μ = 0.84 and stops scaling when the short-side pixels are fewer than 100 pix.
1.2) Rotating the logistics vehicle image;
When the camera captures a fast-moving logistics vehicle, the captured images can differ greatly in angle. To adapt recognition and localization to different angles, the captured logistics vehicle images are rotated, generating vehicle feature information at various angles.
The invention takes the center of the logistics vehicle image as the rotation center O(0, 0) and denotes the counterclockwise rotation angle by θ. When an arbitrary pixel P(x, y) in the image becomes P1(x1, y1) after the rotation transform, the rotation is represented by the following equations:

x1 = x·cosθ − y·sinθ
y1 = x·sinθ + y·cosθ   (3)

This polar-coordinate transformation corresponds to the image rotation matrix:

[x1]   [cosθ  −sinθ][x]
[y1] = [sinθ   cosθ][y]   (4)
1.3) Performing a saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the saturation of the logistics vehicle image is adjusted.
The specific flow of adjusting the image saturation is as follows:
(S1) Calculate the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)   (5)

where rgbMax represents the pixel maximum and rgbMin the pixel minimum over the three color channels.
(S2) Saturation calculation;
the saturation S is calculated as follows:

delta = (rgbMax − rgbMin)/255   (6)
value = (rgbMax + rgbMin)/255   (7)
L = value/2   (8)
S = delta/value,       if L < 0.5
S = delta/(2 − value), otherwise   (9)
(S3) adjusting the logistics vehicle image saturation;
A saturation parameter β is set to adjust the illumination intensity; the calculation flow is as follows:

1. If the parameter β ≥ 0, first obtain the value of the intermediate variable α:

α = S,      if β + S ≥ 1
α = 1 − β,  otherwise   (10)

update the value of α:

α = 1/α − 1   (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α   (12)

2. If the parameter β < 0, set α = β; then:

RGB' = L·255 + (RGB − L·255)·(1 + α)   (13)
the logistics vehicle image subjected to scaling, rotation operation and saturation enhancement is applied to the following steps so as to accurately position logistics vehicle characteristics.
Step two, constructing a basic network model;
Although faster R-CNN provides three candidate backbone networks for feature extraction, to obtain a better feature extraction effect the VGGNet-16 network is adopted as the feature extraction network for classifying logistics vehicles of different types. Meanwhile, to localize the logistics vehicles, the invention adds an RPN-based target detection and localization model after the feature extraction module at the third convolutional sublayer of the fifth convolutional block of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) First stage: first, the W × H × 3 image processed in step one is input; the logistics vehicle image is then convolved by two consecutive 64-channel convolutional layers, each with a 3×3 kernel and stride 1; the convolved image is then reduced in dimension by a 64-channel max pooling layer with a 2×2 pooling kernel and stride 2. This stage outputs a feature map of size (W/2) × (H/2) × 64.
(T2) second stage: the flow is the same as the first stage, namely the image obtained in the first stage is input into a second stage network, and then a new characteristic diagram is obtained through convolution and pooling operation. But unlike the first stage, the convolution and pooling channels of the second stage are each changed to 128, and the other parameters are the same as the first stage.
(T3) third stage: firstly, inputting the image output by the second stage into a network of a third stage; then, carrying out convolution operation on the image through three continuous 256-channel convolution layers, wherein the convolution kernel size is 3*3, and the convolution step length is 2; the convolved image is then reduced in dimension by a 256-channel max pooling layer with a pooling kernel size of 2 x 2 and a step size of 2.
(T4) fourth stage: the flow is the same as the third stage, namely the image obtained in the third stage is input into a fourth stage network, and then a new characteristic diagram is obtained through convolution and pooling operation. But unlike the third stage, the convolution and pooling channels of the fourth stage are each changed to 512, and the other parameters are the same as the third stage.
(T5) fifth stage: this stage consists of three convolutional layers, each with 512 channels, a convolution kernel size of 3*3 and a convolution step size of 2. The feature map output at this stage has 512 channels and a correspondingly reduced spatial size.
(T6) sixth stage: firstly, a convolution layer with a convolution kernel size of 3*3, a convolution step length of 2 and 512 convolution channels is connected; then a classification loss function and a border regression loss function are connected, and regression judgment is carried out on the border information and the classification information (the probability of the most likely vehicle type) belonging to logistics vehicles or background;
(T7) seventh stage: firstly, connecting two full connection layers of 4096 channels; then connecting a total loss function; and finally outputting the accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type.
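As an illustration of stages (T1)-(T5), the following sketch traces the spatial size of the feature map through the five convolution stages. Note it assumes the standard VGGNet-16 layer parameters (3*3 convolutions with stride 1 and padding 1, followed by 2 x 2 max pooling with stride 2 after stages one to four), whereas the text above states a convolution step length of 2; the channel counts (64, 128, 256, 512, 512) follow the text, and the function names are illustrative.

```python
# Sketch: trace feature-map sizes through a VGGNet-16-style backbone.
# Stride/padding values are assumptions (standard VGG-16), not the patent's.
def out_size(n, kernel, stride, pad):
    """Output length of a convolution or pooling along one axis."""
    return (n + 2 * pad - kernel) // stride + 1

def vgg16_trace(w, h):
    """Return (width, height, channels) after each of the five conv stages."""
    sizes = []
    stages = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (num convs, channels)
    for i, (n_conv, ch) in enumerate(stages):
        for _ in range(n_conv):
            # 3x3 convolution, stride 1, padding 1: spatial size unchanged
            w, h = out_size(w, 3, 1, 1), out_size(h, 3, 1, 1)
        if i < 4:
            # stages one to four end with 2x2 max pooling, stride 2
            w, h = out_size(w, 2, 2, 0), out_size(h, 2, 2, 0)
        sizes.append((w, h, ch))
    return sizes
```

For a 224 x 224 input this yields 112 x 112 x 64 after the first stage and 14 x 14 x 512 after the fifth, which is the familiar VGG-16 progression.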
In the above basic network structure, the design of the activation functions, the loss functions and the related parameters is described in detail below:
(P1) in VGGNet-16 basic network, regarding the activation function of the connection after all convolution layers, reLu activation functions are used in the invention:
ReLu(x)=max(0,x) (14)
(P2) in the sixth stage, the classification loss function and the rim regression loss function used are designed as follows:
the classification loss L_rpn_cls is expressed as:
L_rpn_cls = (1/N_cls) Σ_i [ -p_i^* log(p_i) - (1-p_i^*) log(1-p_i) ] (15)
wherein p_i represents the probability that border i is a logistics vehicle or background; p_i^* represents the label of the real border corresponding to border i, marked as 1 if the real border is foreground, and 0 otherwise.
the border regression loss L_rpn_box is expressed as:
L_rpn_box = (1/N_box) Σ_i p_i^* · smoothL1(t_i - t_i^*) (16)
wherein t_i represents the four-dimensional position information of predicted border i, denoted t_i = (x_i, y_i, w_i, h_i); t_i^* represents the four-dimensional position information of the corresponding real border, denoted t_i^* = (x_i^*, y_i^*, w_i^*, h_i^*); the smoothL1 function is expressed as follows:
smoothL1(x) = 0.5x^2, if |x| < 1; |x| - 0.5, otherwise (17)
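The border regression loss can be illustrated numerically. The smooth-L1 form below is the standard Faster R-CNN choice (quadratic near zero, linear elsewhere); the function names and the per-box summation over the four coordinates are illustrative assumptions.

```python
# Sketch of the smooth-L1 border regression loss used by the RPN.
def smooth_l1(x):
    """Quadratic for |x| < 1, linear otherwise (standard Faster R-CNN form)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_box_loss(t_pred, t_true, p_star):
    """Sum of smooth-L1 over the 4 box coordinates (x, y, w, h),
    gated by the foreground label p_star (1 = logistics vehicle, 0 = background)."""
    return p_star * sum(smooth_l1(p - t) for p, t in zip(t_pred, t_true))
```

Background anchors (p_star = 0) contribute nothing to the regression loss, which matches the gating by p_i^* in equation (16).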
(P3) in the seventh stage, the overall loss function design principle is: the border regression loss adopts a logistic regression mapping method; the classification loss adopts a gradient descent method; in the process of minimizing the loss function, the Adam gradient descent method is used to optimize the loss function, and the corresponding parameters are set to α=0.001, β1=0.9, β2=0.999 and ε=10^-8.
(P4) during training, the learning rate adjustment strategy of the present invention employs a multi-stage decay method.
Thirdly, screening a logistics vehicle target by using a non-maximum suppression algorithm;
After the logistics vehicle image is processed by the basic network model in step two, many bounding boxes are obtained on the same logistics vehicle, so a method must be introduced to screen out the redundant bounding boxes. The specific operation flow is as follows:
(Q1) according to the four-dimensional position information (x_i, y_i, w_i, h_i) of the predicted bounding boxes, the area S_i of all predicted borders of each vehicle in the logistics vehicle image can be obtained:
S_i = w_i * h_i (18)
(Q2) in the sixth stage of the basic network model, the border information and the classification information belonging to the logistics vehicle or the background are determined through regression. Each real logistics vehicle has multiple corresponding bounding boxes; these are sorted by probability in descending order, and the single bounding box with the highest probability is screened out;
(Q3) the area intersection ratio I between the screened bounding box and each remaining bounding box is calculated in a loop; if I is larger than a preset threshold value (the default threshold is 0.7), the bounding box is considered severely overlapped with the bounding box screened in step (Q2) and is deleted; this continues until all bounding boxes in step (Q2) have been processed.
If and only if the condition 0 ≤ x_oti - x_max < (w_max + w_oti)/2 and 0 ≤ y_oti - y_max < (h_max + h_oti)/2 is satisfied, the overlap area is calculated as follows:
S_ovp = [(w_max + w_oti)/2 - (x_oti - x_max)] * [(h_max + h_oti)/2 - (y_oti - y_max)] (19)
If and only if the condition 0 ≤ x_max - x_oti < (w_max + w_oti)/2 and 0 ≤ y_max - y_oti < (h_max + h_oti)/2 is satisfied, the overlap area is calculated as follows:
S_ovp = [(w_max + w_oti)/2 - (x_max - x_oti)] * [(h_max + h_oti)/2 - (y_max - y_oti)] (20)
The subscripts max and oti in the above formulae (19)-(20) and their constraints can be interchanged. Taking the constraint in formula (19) as an example, it can be changed into the following unified form:
|x_max - x_oti| < (w_max + w_oti)/2 and |y_max - y_oti| < (h_max + h_oti)/2 (21)
If the constraint in formula (19) is changed to the form of formula (21), the subtractions in formula (19) must likewise be changed to absolute values.
When the constraint conditions in formula (19) or formula (20) are satisfied, the calculation formula of the intersection ratio I is:
I = S_ovp / (S_max + S_oti - S_ovp) (22)
Otherwise, I = 0, meaning that the two bounding boxes do not intersect, and both are kept.
In formulae (19)-(22), max denotes the largest screened bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the area of the intersection between the two bounding boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) represents the four-dimensional position information of the largest screened bounding box, namely the center coordinates, border width and border height; (x_oti, y_oti, w_oti, h_oti) represents the four-dimensional position information of any of the remaining bounding boxes.
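The screening flow (Q1)-(Q3) can be sketched as follows, for center-format boxes (x, y, w, h) with per-box probabilities. The absolute-value overlap test corresponds to the unified constraint form, and 0.7 is the default threshold mentioned in step (Q3); the function names are illustrative.

```python
# Sketch of non-maximum suppression over center-format boxes (x, y, w, h).
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) center-format boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # the boxes overlap iff the center distance along each axis is smaller
    # than half the sum of the corresponding extents (unified constraint form)
    ox = (aw + bw) / 2 - abs(ax - bx)
    oy = (ah + bh) / 2 - abs(ay - by)
    if ox <= 0 or oy <= 0:
        return 0.0                    # no intersection: both boxes are kept
    s_ovp = ox * oy                   # overlap area
    return s_ovp / (aw * ah + bw * bh - s_ovp)   # I = S_ovp / (S_a + S_b - S_ovp)

def nms(boxes, probs, thresh=0.7):
    """Return indices of kept boxes, highest probability first."""
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)           # (Q2) highest-probability box for this vehicle
        keep.append(best)
        # (Q3) delete remaining boxes that overlap the kept box too heavily
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Two boxes centered far apart give I = 0 and both survive; a duplicate of the kept box gives I = 1 and is deleted.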
Step four, unified normalization is carried out on the object characteristics of the logistics vehicles;
In order to solve the problem of mismatched dimensions in subsequent connection layers caused by the differing edge features of the bounding boxes after non-maximum suppression, the invention connects a region-of-interest pooling layer after the loss functions of the sixth stage and performs unified normalization on the bounding boxes with different edge features. The specific operation flow is as follows:
(M1) quantizing four-dimensional position information of a boundary box on the logistics vehicle image obtained in the step three into integer array coordinates;
(M2) dividing the quantized bounding box evenly into 4*4, 2 x 2 and 1*1 grids and applying max pooling to each cell, forming a fixed-length data dimension.
And (3) transmitting the obtained feature map of the fixed dimension data to a seventh stage of the basic network model to obtain the accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type.
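Steps (M1)-(M2) can be sketched as follows. The integer cell-boundary scheme is an illustrative assumption (the patent does not specify how the cells are split); the fixed grids 4*4, 2 x 2 and 1*1 yield 16 + 4 + 1 = 21 pooled values per channel regardless of box size.

```python
# Sketch of step four: quantize an RoI to integer coordinates (M1), then
# max-pool it on fixed 4x4, 2x2 and 1x1 grids (M2) to a fixed-length vector.
def roi_fixed_pool(feat, x0, y0, x1, y1):
    """feat: 2D list (H x W); (x0, y0, x1, y1): RoI corners, possibly float.
    Returns 21 max-pooled values (16 + 4 + 1) for this channel."""
    x0, y0, x1, y1 = int(x0), int(y0), int(x1), int(y1)   # (M1) quantization
    out = []
    for g in (4, 2, 1):                                   # (M2) fixed pooling grids
        for gy in range(g):
            for gx in range(g):
                # split the RoI into g x g near-equal cells (integer boundaries)
                ys = y0 + (y1 - y0) * gy // g, y0 + (y1 - y0) * (gy + 1) // g
                xs = x0 + (x1 - x0) * gx // g, x0 + (x1 - x0) * (gx + 1) // g
                cell = [feat[y][x]
                        for y in range(ys[0], max(ys[1], ys[0] + 1))
                        for x in range(xs[0], max(xs[1], xs[0] + 1))]
                out.append(max(cell))                     # max pooling per cell
    return out
```

Because the output length is always 21 values per channel, boxes of any size feed a fully connected layer of fixed dimension, which is exactly the mismatch problem step four addresses.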
The embodiments described in the present specification are merely examples of implementation forms of the inventive concept; the scope of protection of the present invention should not be construed as being limited to the specific forms set forth in the embodiments, and also extends to equivalent technical means that can be conceived by those skilled in the art based on the inventive concept.
Claims (3)
1. A method for logistics vehicle feature localization based on improved Faster R-CNN, comprising the steps of:
step one, carrying out image enhancement processing on logistics vehicles;
introducing a data enhancement means, and processing the logistics vehicle image through operations of multi-scale equal-scale scaling, image rotation and saturation enhancement, so as to increase scene diversity of the logistics vehicle image for further identification and positioning;
1.1 Multi-scale scaling operation is carried out on the logistics vehicle;
the method has the advantages that the principle that the specific length-width ratio of the logistics vehicle in the original image is not destroyed is adopted, the logistics vehicle image is scaled in multiple scales, and the positioning network can learn target features in specific proportions;
suppose the coordinate of a pixel of a certain logistics vehicle before scaling is denoted A_0(x_0, y_0), and the scaled coordinate is denoted A_1(x_1, y_1); then A_0 and A_1 satisfy the relation:
(x_1, y_1) = (μx_0, μy_0) (1)
wherein μ represents a scaling factor; the above equation corresponds to the image scaling matrix, expressed as the following matrix:
[x_1; y_1] = [μ, 0; 0, μ] [x_0; y_0] (2)
wherein when μ > 1, an image enlarging operation is represented; when μ < 1, an image reduction operation is represented;
1.2 Rotating the logistics vehicle image;
when a camera shoots a logistics vehicle in quick running, the phenomenon that the angle difference of the captured images is extremely large is caused, and in order to adapt to the identification and positioning of different angles, the captured logistics vehicle images are required to be subjected to rotary transformation so as to generate vehicle characteristic information of various angles;
the center of the logistics vehicle image is set as the rotation center O(0, 0), the anticlockwise rotation angle is recorded as θ, and any pixel point P(x, y) in the image becomes P_1(x_1, y_1) after the rotation transformation; the rotation process is represented by the following equation:
x_1 = x cosθ - y sinθ, y_1 = x sinθ + y cosθ (3)
the above formula is obtained through the polar coordinate transformation; it corresponds to the image rotation matrix, expressed as the following matrix:
[x_1; y_1] = [cosθ, -sinθ; sinθ, cosθ] [x; y] (4)
1.3 Performing saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples, the characteristic positioning network can be suitable for a complex illumination environment, and the saturation of the logistics vehicle image is adjusted;
the specific flow of adjusting the image saturation is as follows:
(S1) calculating the pixel extrema on the logistics vehicle image:
rgbMax = max(R, G, B), rgbMin = min(R, G, B) (5)
wherein rgbMax represents the maximum value of the pixel, and rgbMin represents the minimum value of the pixel;
(S2) saturation calculation;
the saturation S is calculated as follows:
delta=(rgbMax-rgbMin)/255 (6)
value=(rgbMax+rgbMin)/255 (7)
L=value/2 (8)
S=delta/value, if L<0.5; S=delta/(2-value), otherwise (9)
(S3) adjusting the logistics vehicle image saturation;
a saturation parameter beta is set for adjusting the illumination intensity, and the calculation flow is as follows:
1. if the parameter β is more than or equal to 0, firstly, the value of an intermediate variable α is obtained:
α=S, if β+S≥1; α=1-β, otherwise (10)
then the value of α is updated:
α=1/α-1 (11)
and the saturation is adjusted:
RGB'=RGB+(RGB-L*255)*α (12)
2. if the parameter β<0, then:
RGB'=L*255+(RGB-L*255)*(1+β) (13)
the logistics vehicle image subjected to scaling treatment, rotation operation and saturation enhancement is applied to the following steps so as to accurately position the logistics vehicle characteristics;
step two, constructing a basic network model;
the VGGNet-16 basic network is used as the feature extraction network for classifying logistics vehicles of different vehicle types; meanwhile, in order to realize the positioning of the logistics vehicles, a target detection and positioning model of the RPN network is added after the feature extraction module at the third convolution sublayer of the fifth convolution layer of VGGNet-16;
the steps of constructing the basic network model are as follows:
(T1) first stage: firstly, an image of size W*H*3 processed in step one is input; then a convolution operation is carried out on the logistics vehicle image through two continuous 64-channel convolution layers, with a convolution kernel size of 3*3 and a convolution step length of 2; then, the convolved image is reduced in dimension through a 64-channel max pooling layer, with a pooling kernel size of 2 x 2 and a step length of 2; at this stage, a 64-channel feature map of reduced spatial size is output;
(T2) second stage: the flow is the same as the first stage, namely, the image obtained in the first stage is input into a second stage network, and then a new characteristic diagram is obtained through convolution and pooling operation; but the difference from the first stage is that the convolution and pooling channels of the second stage are both 128, and other parameters are the same as those of the first stage;
(T3) third stage: firstly, inputting the image output by the second stage into a network of a third stage; then, carrying out convolution operation on the image through three continuous 256-channel convolution layers, wherein the convolution kernel size is 3*3, and the convolution step length is 2; then, the convolved image is subjected to dimension reduction through a maximum pooling layer of 256 channels, wherein the pooling kernel size is 2 x 2, and the step length is 2;
(T4) fourth stage: the flow is the same as the third stage, namely, the image obtained in the third stage is input into a fourth stage network, and then a new feature map is obtained through convolution and pooling operation; but the difference from the third stage is that the convolution and pooling channels of the fourth stage are changed to 512, and other parameters are the same as those of the third stage;
(T5) fifth stage: this stage consists of three convolution layers, each with 512 channels, a convolution kernel size of 3*3 and a convolution step length of 2; the feature map output at this stage has 512 channels and a correspondingly reduced spatial size;
(T6) sixth stage: firstly, a convolution layer with a convolution kernel size of 3*3, a convolution step length of 2 and 512 convolution channels is connected; then a classification loss function and a border regression loss function are connected, and regression judgment is carried out on the border information and the classification information belonging to the logistics vehicle or the background, wherein the classification information is the probability that the logistics vehicle shown in the image is most likely of a certain vehicle type;
(T7) seventh stage: firstly, connecting two full connection layers of 4096 channels; then connecting a total loss function; finally, outputting an accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type;
in the above basic network structure, the design of parameters related to the activation function and the loss function specifically includes:
(P1) in VGGNet-16 base network, the ReLu activation function is used for all the activation functions of the convolutional layer post-connection:
ReLu(x)=max(0,x) (14)
(P2) in the sixth stage, using the classification loss function and the bounding box regression loss function:
the classification loss L_rpn_cls is expressed as:
L_rpn_cls = (1/N_cls) Σ_i [ -p_i^* log(p_i) - (1-p_i^*) log(1-p_i) ] (15)
wherein p_i represents the probability that border i is a logistics vehicle or background; p_i^* represents the label of the real border corresponding to border i, marked as 1 if the real border is foreground, and 0 otherwise;
the border regression loss L_rpn_box is expressed as:
L_rpn_box = (1/N_box) Σ_i p_i^* · smoothL1(t_i - t_i^*) (16)
wherein t_i represents the four-dimensional position information of predicted border i, denoted t_i = (x_i, y_i, w_i, h_i); t_i^* represents the four-dimensional position information of the corresponding real border, denoted t_i^* = (x_i^*, y_i^*, w_i^*, h_i^*); the smoothL1 function is expressed as follows:
smoothL1(x) = 0.5x^2, if |x| < 1; |x| - 0.5, otherwise (17)
(P3) in the seventh stage, the overall loss function design principle is: the border regression loss adopts a logistic regression mapping method; the classification loss adopts a gradient descent method; in the process of minimizing the loss function, the Adam gradient descent method is used to optimize the loss function, and the corresponding parameters are set to α=0.001, β1=0.9, β2=0.999 and ε=10^-8;
(P4) in the training process, the learning rate adjustment strategy adopts a multi-stage attenuation method;
thirdly, screening a logistics vehicle target by using a non-maximum suppression algorithm;
after the logistics vehicle image is processed by the basic network model in step two, many bounding boxes are obtained on the same logistics vehicle, so a method must be introduced to screen out the redundant bounding boxes; the specific operation flow is as follows:
(Q1) according to the four-dimensional position information (x_i, y_i, w_i, h_i) of the predicted bounding boxes, the area S_i of all predicted borders of each vehicle in the logistics vehicle image can be obtained:
S_i = w_i * h_i (18)
(Q2) in the sixth stage of the basic network model, the border information and the classification information belonging to the logistics vehicle or the background are determined through regression; each real logistics vehicle has a plurality of corresponding bounding boxes, which are sorted by probability in descending order, and the single bounding box with the highest probability is screened out;
(Q3) the area intersection ratio I between the screened bounding box and each remaining bounding box is calculated in a loop; if I is larger than a preset threshold value, the bounding box is considered severely overlapped with the bounding box screened in step (Q2) and is deleted; this continues until all bounding boxes in step (Q2) have been processed;
if and only if the condition 0 ≤ x_oti - x_max < (w_max + w_oti)/2 and 0 ≤ y_oti - y_max < (h_max + h_oti)/2 is satisfied, the overlap area is calculated as follows:
S_ovp = [(w_max + w_oti)/2 - (x_oti - x_max)] * [(h_max + h_oti)/2 - (y_oti - y_max)] (19)
if and only if the condition 0 ≤ x_max - x_oti < (w_max + w_oti)/2 and 0 ≤ y_max - y_oti < (h_max + h_oti)/2 is satisfied, the overlap area is calculated as follows:
S_ovp = [(w_max + w_oti)/2 - (x_max - x_oti)] * [(h_max + h_oti)/2 - (y_max - y_oti)] (20)
the subscripts max and oti in the above formulae (19)-(20) and their constraints can be interchanged; taking the constraint in formula (19) as an example, it can be changed into the following unified form:
|x_max - x_oti| < (w_max + w_oti)/2 and |y_max - y_oti| < (h_max + h_oti)/2 (21)
if the constraint in formula (19) is changed to the form of formula (21), the subtractions in formula (19) must likewise be changed to absolute values;
when the constraint conditions in formula (19) or formula (20) are satisfied, the calculation formula of the intersection ratio I is:
I = S_ovp / (S_max + S_oti - S_ovp) (22)
otherwise, I = 0, meaning that the two bounding boxes do not intersect, and both are kept;
in formulae (19)-(22), max denotes the largest screened bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the area of the intersection between the two bounding boxes is denoted S_ovp;
(x_max, y_max, w_max, h_max) represents the four-dimensional position information of the largest screened bounding box, namely the center coordinates, border width and border height; (x_oti, y_oti, w_oti, h_oti) represents the four-dimensional position information of any of the remaining bounding boxes;
step four, unified normalization is carried out on the object characteristics of the logistics vehicles;
in order to solve the problem of unmatched dimensions of subsequent connecting layers caused by different edge characteristics of the boundary frames after non-maximum suppression, after a loss function in a sixth stage, a region-of-interest pooling layer is connected, and unified normalization is carried out on the boundary frames with different edge characteristics; the specific operation flow is as follows:
(M1) quantizing four-dimensional position information of a boundary box on the logistics vehicle image obtained in the step three into integer array coordinates;
(M2) dividing the quantized bounding box evenly into 4*4, 2 x 2 and 1*1 grids and applying max pooling to each cell, forming a fixed-length data dimension;
and (3) transmitting the obtained feature map of the fixed dimension data to a seventh stage of the basic network model to obtain the accurate logistics vehicle positioning boundary frame and the probability of the corresponding vehicle type.
2. The method for logistics vehicle feature localization based on improved Faster R-CNN according to claim 1, wherein: the threshold of the area intersection ratio I in step (Q3) is 0.7.
3. The method for logistics vehicle feature localization based on improved Faster R-CNN according to claim 1, wherein: the scaling factor is set to μ=0.84 in step 1.1), and scaling stops when the short-side pixel count is less than 100 pix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010690178.9A CN111986080B (en) | 2020-07-17 | 2020-07-17 | Logistics vehicle feature positioning method based on improved Faster R-CNN
Publications (2)
Publication Number | Publication Date |
---|---|
CN111986080A CN111986080A (en) | 2020-11-24 |
CN111986080B true CN111986080B (en) | 2024-01-16 |
Family
ID=73438739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010690178.9A Active CN111986080B (en) | 2020-07-17 | Logistics vehicle feature positioning method based on improved Faster R-CNN | 2020-07-17
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111986080B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832794A (en) * | 2017-11-09 | 2018-03-23 | 车智互联(北京)科技有限公司 | A kind of convolutional neural networks generation method, the recognition methods of car system and computing device |
CN109034024A (en) * | 2018-07-16 | 2018-12-18 | 浙江工业大学 | Logistics vehicles vehicle classification recognition methods based on image object detection |
CN110175524A (en) * | 2019-04-26 | 2019-08-27 | 南京航空航天大学 | A kind of quick vehicle checking method of accurately taking photo by plane based on lightweight depth convolutional network |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||