CN111986080B - Logistics vehicle feature positioning method based on improved faster R-CNN - Google Patents

Logistics vehicle feature positioning method based on improved faster R-CNN

Info

Publication number
CN111986080B
CN111986080B · Application CN202010690178.9A
Authority
CN
China
Prior art keywords
stage
image
logistics
logistics vehicle
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010690178.9A
Other languages
Chinese (zh)
Other versions
CN111986080A (en)
Inventor
张烨
樊一超
陈威慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202010690178.9A
Publication of CN111986080A
Application granted
Publication of CN111986080B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4084 Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method for logistics vehicle feature positioning based on improved faster R-CNN, comprising: step one, performing image enhancement processing on logistics vehicle images by introducing data enhancement means; step two, constructing a basic network model, adopting the VGGNet-16 network as the feature extraction network and, to realize logistics vehicle positioning, adding an RPN target detection and positioning model after the feature extraction module at the third convolutional sublayer of the fifth convolutional stage of VGGNet-16; step three, screening logistics vehicle targets with a non-maximum suppression algorithm; step four, performing unified normalization on the logistics vehicle target features, and passing the resulting fixed-dimension feature map to the seventh stage of the basic network model to obtain the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type. The invention achieves good feature positioning performance for logistics vehicles across different environments and scenes.

Description

Logistics vehicle feature positioning method based on improved faster R-CNN
Technical Field
The invention relates to a logistics vehicle feature positioning method based on improved faster R-CNN.
Background Art
In recent years, with the development of transport logistics, more and more logistics vehicles serve people's work and daily life, but the sheer number of logistics and engineering vehicles raises the difficulty of parking management in industrial parks. Although drop-and-pull operations can improve the efficiency of cargo loading, problems remain: logistics vehicles occupy parking spaces unreasonably, drop-and-pull operations cannot be charged accurately, and, more seriously, some owners resort to extremely dangerous behaviors such as using fake license plates to evade monitoring and detection.
To address the management of logistics and engineering vehicles, many existing systems identify logistics vehicles of different types using computer vision. Most of these methods obtain vehicle images from traffic-intersection cameras or image acquisition cards; since a traffic video captures a vehicle at some position in a natural environment, the vehicle's exact position in the image must first be found before feature extraction can be performed to recognize the vehicle type. Current recognition methods face three main difficulties: (1) illumination strongly affects recognition, and the differing visual appearance of the same vehicle in sunny, rainy, or snowy conditions can cause misrecognition; (2) vehicle scenes are complex and changeable, and in cluttered backgrounds such as rural roads, foreground and background cannot be separated quickly and accurately; (3) vehicle appearance is highly variable, with parameters such as color, shape, brand, and size all affecting feature recognition. In short, feature recognition of logistics vehicles by computer vision is still affected by uncertainty in environment, scene, and appearance, which makes recognition difficult.
Disclosure of Invention
To overcome the defects of the prior art, and aiming at the management problem of logistics and engineering vehicles and the difficulty traditional recognition methods have with uncertain factors such as environment, scene, and appearance, the invention provides a logistics vehicle feature positioning method based on improved faster R-CNN.
The method first applies data enhancement to logistics vehicle images to increase the scene diversity of the sample images; then constructs a basic network model with the improved faster R-CNN; then introduces a non-maximum suppression algorithm to screen logistics vehicle target bounding boxes; and finally performs unified normalization on the logistics vehicle target features to achieve accurate positioning.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method for logistic vehicle feature localization based on improved faster R-CNN, comprising the steps of:
step one, carrying out image enhancement processing on logistics vehicles;
To address the problems of fixed shooting angles, uniform backgrounds, and low detection rates for logistics vehicles, the invention introduces data enhancement: the logistics vehicle images are processed by multi-scale proportional scaling, image rotation, saturation enhancement, and similar operations, increasing scene diversity for further identification and positioning.
1.1 Multi-scale scaling operation is carried out on the logistics vehicle;
the object flow vehicle image is scaled in multiple scales according to the principle of not damaging the specific aspect ratio of the object flow vehicle in the original image, so that the positioning network can learn the object features with specific proportions.
Suppose the coordinates of a pixel of a logistics vehicle image before scaling are denoted A₀(x₀, y₀) and the scaled coordinates are denoted A₁(x₁, y₁). Then A₀ and A₁ satisfy the relation:

(x₁, y₁) = (μx₀, μy₀)  (1)

where μ is the scaling factor. The relation corresponds to the image scaling matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \mu & 0 \\ 0 & \mu \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \quad (2)$$

where μ > 1 denotes an image enlarging operation and μ < 1 an image reduction operation.
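As an illustration of this operation, here is a minimal Python sketch, assuming OpenCV is available; the factor μ = 0.84 and the 100-pixel short-side floor are taken from the preferred embodiment described later in this document.

```python
# Minimal sketch of multi-scale proportional scaling, assuming OpenCV (cv2).
# mu = 0.84 and the 100-pixel short-side floor follow the preferred embodiment.
import cv2

def multi_scale_pyramid(image, mu=0.84, min_short_side=100):
    """Return progressively smaller copies of `image`, preserving the aspect
    ratio (the original scale is included as the first entry)."""
    scales = []
    h, w = image.shape[:2]
    while min(h, w) >= min_short_side:
        scales.append(cv2.resize(image, (w, h), interpolation=cv2.INTER_LINEAR))
        # Apply (x1, y1) = (mu * x0, mu * y0) from formula (1) to the image size.
        w, h = int(w * mu), int(h * mu)
    return scales

# Usage (hypothetical file name): pyramid = multi_scale_pyramid(cv2.imread("truck.jpg"))
```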
1.2 Rotating the logistics vehicle image;
when the camera shoots a logistics vehicle in quick running, the phenomenon that the angle difference of the captured images is extremely large is caused, and in order to adapt to the recognition and positioning of different angles, the captured logistics vehicle images are required to be subjected to rotary transformation, so that vehicle characteristic information of various angles is generated.
The invention takes the center of the logistics vehicle image as the rotation center O(0, 0) and denotes the counterclockwise rotation angle θ. When an arbitrary pixel P(x, y) in the image becomes P₁(x₁, y₁) after the rotation transform, the rotation is expressed as:

$$x_1 = x\cos\theta - y\sin\theta, \qquad y_1 = x\sin\theta + y\cos\theta \quad (3)$$

This polar-coordinate transformation corresponds to the image rotation matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad (4)$$
1.3 Performing saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the saturation of the logistics vehicle image is adjusted.
The specific flow of adjusting the image saturation is as follows:
(S1) Calculate the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)  (5)

where rgbMax is the pixel maximum and rgbMin the pixel minimum over the three color channels.
(S2) Saturation calculation;
The saturation S is calculated through the intermediate quantities delta, value, and L:

delta = (rgbMax − rgbMin)/255  (6)
value = (rgbMax + rgbMin)/255  (7)
L = value/2  (8)
(S3) Adjust the logistics vehicle image saturation;
A saturation parameter β is set to adjust the illumination intensity. The calculation flow is:
1. If the parameter β ≥ 0, first obtain the value of the intermediate variable S:

S = delta/value if L < 0.5, S = delta/(2 − value) otherwise  (9)

then update the value of the intermediate variable α:

α = S if β + S ≥ 1, α = 1 − β otherwise  (10)
α = 1/α − 1  (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α  (12)

2. If the parameter β < 0, set α = β; then:

RGB' = L·255 + (RGB − L·255)·(1 + α)  (13)
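A NumPy sketch of the whole adjustment flow (5)–(13); note that formulas (9)–(11) follow the standard HSL-based saturation algorithm that the surrounding text appears to describe, so the exact per-pixel expressions are an assumption.

```python
# Sketch of the saturation adjustment in formulas (5)-(13), assuming the
# standard HSL-based algorithm; beta in [-1, 1] controls the adjustment.
import numpy as np

def adjust_saturation(img_rgb, beta):
    """img_rgb: uint8 HxWx3 array; returns the saturation-adjusted image."""
    rgb = img_rgb.astype(np.float64)
    rgb_max = rgb.max(axis=2, keepdims=True)            # formula (5)
    rgb_min = rgb.min(axis=2, keepdims=True)
    delta = (rgb_max - rgb_min) / 255.0                 # formula (6)
    value = (rgb_max + rgb_min) / 255.0                 # formula (7)
    L = value / 2.0                                     # formula (8)
    S = np.where(L < 0.5, delta / (value + 1e-12),
                 delta / (2.0 - value + 1e-12))         # formula (9)
    if beta >= 0:
        alpha = np.where(beta + S >= 1.0, S, 1.0 - beta)    # formula (10)
        alpha = 1.0 / (alpha + 1e-12) - 1.0                 # formula (11)
        out = rgb + (rgb - L * 255.0) * alpha               # formula (12)
    else:
        out = L * 255.0 + (rgb - L * 255.0) * (1.0 + beta)  # formula (13)
    return np.clip(out, 0, 255).astype(np.uint8)
```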
the logistics vehicle image subjected to scaling, rotation operation and saturation enhancement is applied to the following steps so as to accurately position logistics vehicle characteristics.
Step two, constructing a basic network model;
although three basic networks are provided by the faster-CNN for feature extraction, in order to obtain a better feature extraction effect, the VGGNet-16 basic network is adopted as the feature extraction network for classifying logistics vehicles of different vehicle types. Meanwhile, in order to realize the positioning of the logistics vehicles, the invention adds an object detection positioning model of the RPN network after the feature extraction module in the third convolution layering of the fifth convolution layer of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) First stage: an image of size W × H × 3 processed in step one is input; the logistics vehicle image is convolved by two consecutive 64-channel convolution layers with 3×3 kernels and stride 1; the convolved image is then reduced in dimension by a 64-channel max-pooling layer with a 2×2 pooling kernel and stride 2. This stage outputs a feature map of size (W/2) × (H/2) × 64.
(T2) Second stage: the flow is the same as the first stage; the feature map from the first stage is input to the second-stage network, and a new feature map is obtained by convolution and pooling. Unlike the first stage, the convolution and pooling channels are both 128; the other parameters are unchanged.
(T3) Third stage: the output of the second stage is input to the third-stage network, convolved by three consecutive 256-channel convolution layers with 3×3 kernels and stride 1, and then reduced in dimension by a 256-channel max-pooling layer with a 2×2 pooling kernel and stride 2.
(T4) Fourth stage: the flow is the same as the third stage; the feature map from the third stage is input to the fourth-stage network, and a new feature map is obtained by convolution and pooling. Unlike the third stage, the convolution and pooling channels are both 512; the other parameters are unchanged.
(T5) Fifth stage: this stage consists of three convolution layers, each with 512 channels, 3×3 kernels, and stride 1. The feature map output at this stage has size (W/16) × (H/16) × 512.
(T6) Sixth stage: a convolution layer with a 3×3 kernel, stride 1, and 512 channels is connected first; a classification loss function and a bounding-box regression loss function are then connected to perform regression judgment on box information and classification information (the probability of most likely being a certain vehicle type) for logistics vehicle versus background;
(T7) Seventh stage: two 4096-channel fully connected layers are connected first, followed by a total loss function; the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type are finally output.
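To make the seven-stage structure concrete, here is a compressed PyTorch sketch under standard VGGNet-16 assumptions (3×3 convolutions with stride 1, 2×2 max pooling); the anchor count of 9 and the two RPN output branches of stage six are assumptions based on the usual faster R-CNN design, not values stated in this document.

```python
# Compressed sketch of stages (T1)-(T6), assuming PyTorch and VGGNet-16 conventions.
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, n_convs, pool=True):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2, stride=2))  # 2x2 pooling, stride 2
    return nn.Sequential(*layers)

class Backbone(nn.Module):
    def __init__(self, num_anchors=9):  # 9 anchors is an assumption
        super().__init__()
        self.stage1 = vgg_stage(3, 64, 2)                 # (T1): W/2 x H/2 x 64
        self.stage2 = vgg_stage(64, 128, 2)               # (T2)
        self.stage3 = vgg_stage(128, 256, 3)              # (T3)
        self.stage4 = vgg_stage(256, 512, 3)              # (T4)
        self.stage5 = vgg_stage(512, 512, 3, pool=False)  # (T5): W/16 x H/16 x 512
        # (T6): 3x3 conv, then classification and box-regression branches.
        self.rpn_conv = nn.Conv2d(512, 512, 3, padding=1)
        self.rpn_cls = nn.Conv2d(512, 2 * num_anchors, 1)  # vehicle / background
        self.rpn_box = nn.Conv2d(512, 4 * num_anchors, 1)  # (x, y, w, h)

    def forward(self, x):
        for stage in (self.stage1, self.stage2, self.stage3,
                      self.stage4, self.stage5):
            x = stage(x)
        r = torch.relu(self.rpn_conv(x))
        return self.rpn_cls(r), self.rpn_box(r)
```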
The design of the activation functions, loss functions, and related parameters in the above basic network structure is described in detail below:
(P1) In the VGGNet-16 basic network, the ReLU activation function is used after every convolution layer:
ReLU(x) = max(0, x)  (14)
(P2) In the sixth stage, the classification loss function and the bounding-box regression loss function are designed as follows:
The classification loss L_rpn_cls is expressed as:

$$L_{rpn\_cls} = \frac{1}{N_{cls}} \sum_i \left[ -p_i^{*} \log p_i - (1 - p_i^{*}) \log(1 - p_i) \right] \quad (15)$$

where p_i denotes the probability that box i is a logistics vehicle or background, and p_i^{*} denotes the label of the real box corresponding to box i: 1 if that box is foreground, 0 otherwise.
The bounding-box regression loss L_rpn_box is expressed as:

$$L_{rpn\_box} = \frac{\lambda}{N_{box}} \sum_i p_i^{*} \, \mathrm{smooth}_{L_1}(t_i - t_i^{*}) \quad (16)$$

where t_i denotes the four-dimensional position information of predicted box i, written t_i(x_i, y_i, w_i, h_i), and t_i^{*} denotes the four-dimensional position information of the real box, written t_i^{*}(x_i^{*}, y_i^{*}, w_i^{*}, h_i^{*}). The smooth-L1 function is expressed as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (17)$$
(P3) In the seventh stage, the overall loss function is designed as follows: the bounding-box regression loss uses a logistic regression mapping method; the classification loss uses gradient descent; during loss minimization, the Adam gradient descent method is used to optimize the loss function, with parameters α = 0.001, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸.
(P4) during training, the learning rate adjustment strategy of the present invention employs a multi-stage decay method.
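A short PyTorch sketch of the optimization settings in (P3) and (P4): Adam with the stated hyperparameters and a multi-stage (step) learning-rate decay. The milestone epochs and decay factor are illustrative assumptions, since the text does not give them.

```python
# Sketch of (P3)/(P4): Adam with the stated hyperparameters plus multi-stage decay.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, 3)  # placeholder; in practice the backbone sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999), eps=1e-8)
# Multi-stage decay: milestones [30, 60, 90] and gamma 0.1 are assumptions.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[30, 60, 90],
                                                 gamma=0.1)
for epoch in range(100):
    # ... forward pass, compute losses (15)-(17), loss.backward() ...
    optimizer.step()
    scheduler.step()
```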
Thirdly, screening a logistics vehicle target by using a non-maximum suppression algorithm;
the number of boundary boxes on the same logistics vehicle obtained after the logistics vehicle image is processed by the basic network model in the second step is large, so that a method is required to be introduced to screen out redundant boundary boxes. The specific operation flow is as follows:
(Q1) From the four-dimensional position information (x_i, y_i, w_i, h_i) of each predicted bounding box, the area S_i of every predicted box of each vehicle in the logistics vehicle image is obtained:

S_i = w_i · h_i  (18)
(Q2) In the sixth stage of the basic network model, the box information and the classification information (logistics vehicle or background) are determined by regression. Each real logistics vehicle has many corresponding bounding boxes; these are sorted by probability in descending order, and the single bounding box with the highest probability is screened out;
(Q3) The area intersection ratio I between the screened bounding box and each remaining bounding box is computed in a loop; if I exceeds a preset threshold (default 0.7), the box is judged to overlap heavily with the box screened in step (Q2) and is deleted, until all boxes from step (Q2) have been processed.
Formulas (19) and (20) give the intersection ratio I for the two symmetric arrangements of the box centers (x_max ≤ x_oti and x_max > x_oti, respectively, together with the corresponding condition on y); each holds if and only if its constraint is satisfied. Taking the constraint in formula (19) as an example, it can be rewritten in the equivalent absolute-value form:

$$|x_{max} - x_{oti}| \le \frac{w_{max} + w_{oti}}{2}, \qquad |y_{max} - y_{oti}| \le \frac{h_{max} + h_{oti}}{2} \quad (21)$$

in which case the intersection area is

$$S_{ovp} = \left(\frac{w_{max} + w_{oti}}{2} - |x_{max} - x_{oti}|\right)\left(\frac{h_{max} + h_{oti}}{2} - |y_{max} - y_{oti}|\right)$$

If the constraint in formula (19) is changed to the form of formula (21), the subscripts in formula (19) must be changed accordingly.
When the constraint conditions of formulas (19) and (20) are satisfied, the intersection ratio I is computed as:

$$I = \frac{S_{ovp}}{S_{max} + S_{oti} - S_{ovp}} \quad (22)$$

Otherwise I = 0, meaning the two bounding boxes do not intersect, and both are kept.
In formulas (19)–(22), max denotes the screened maximum-probability bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the intersection area between the two boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) is the four-dimensional position information of the screened box, i.e., center coordinates, box width, and box height; (x_oti, y_oti, w_oti, h_oti) is the four-dimensional position information of any remaining box.
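The screening flow (Q1)–(Q3) can be sketched in NumPy as follows, with boxes in the document's center format (x, y, w, h) and the stated default threshold of 0.7; the per-axis overlap expression matches the absolute-value form of formula (21).

```python
# Sketch of non-maximum suppression (Q1)-(Q3) for center-format boxes.
import numpy as np

def nms_center_format(boxes, scores, thresh=0.7):
    """boxes: Nx4 array of (x, y, w, h); scores: N probabilities.
    Returns the indices of the kept boxes."""
    x, y, w, h = boxes.T
    areas = w * h                                   # formula (18)
    order = scores.argsort()[::-1]                  # (Q2): sort by probability
    keep = []
    while order.size > 0:
        m = order[0]                                # highest-probability box
        keep.append(m)
        rest = order[1:]
        # Per-axis overlap (w_max + w_oti)/2 - |x_max - x_oti|, clipped at 0,
        # matching the absolute-value constraint of formula (21).
        ow = np.maximum(0.0, (w[m] + w[rest]) / 2 - np.abs(x[m] - x[rest]))
        oh = np.maximum(0.0, (h[m] + h[rest]) / 2 - np.abs(y[m] - y[rest]))
        s_ovp = ow * oh
        iou = s_ovp / (areas[m] + areas[rest] - s_ovp)   # formula (22)
        order = rest[iou <= thresh]                 # (Q3): delete heavy overlaps
    return keep
```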
Step four, unified normalization is carried out on the object characteristics of the logistics vehicles;
To avoid the dimension mismatch in subsequent connection layers caused by bounding boxes with differently sized edge features after non-maximum suppression, the invention connects a region-of-interest pooling layer after the loss function of the sixth stage and performs unified normalization on the bounding boxes. The specific flow is as follows:
(M1) Quantize the four-dimensional position information of the bounding boxes on the logistics vehicle image obtained in step three into integer array coordinates;
(M2) Evenly divide each quantized bounding box region and max-pool it into 4×4, 2×2, and 1×1 grids, forming a fixed-length data dimension.
The resulting fixed-dimension feature map is passed to the seventh stage of the basic network model to obtain the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type.
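A PyTorch sketch of the unified normalization (M1)–(M2): each quantized box region is max-pooled onto 4×4, 2×2, and 1×1 grids; concatenating the three grids into a single 21-bin-per-channel vector is an assumption about how they are combined.

```python
# Sketch of (M1)/(M2): fixed-length normalization of one region of interest.
import torch
import torch.nn.functional as F

def normalize_roi(feature_map, box):
    """feature_map: 1xCxHxW tensor; box: (x0, y0, x1, y1) with x1 > x0, y1 > y0.
    Returns a 1 x (21*C) fixed-length vector."""
    x0, y0, x1, y1 = [int(v) for v in box]        # (M1): quantize to integers
    region = feature_map[:, :, y0:y1, x0:x1]
    pooled = [F.adaptive_max_pool2d(region, size).flatten(1)
              for size in (4, 2, 1)]              # (M2): 4x4 + 2x2 + 1x1 grids
    return torch.cat(pooled, dim=1)               # 16 + 4 + 1 = 21 bins/channel
```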
Preferably, in step 1.1), to reduce the cost of model operation, the invention sets the scaling factor μ = 0.84 and stops scaling when the short side falls below 100 pixels.
Preferably, the area overlap ratio I in step (Q3) has a threshold value of 0.7.
The invention has the advantages that:
the invention provides a method for positioning characteristics of a logistics vehicle based on an improved master R-CNN (factory R-CNN) aiming at the management problem of the logistics engineering vehicle and the problem that the traditional recognition method is difficult to recognize due to uncertainty factors such as environment, scene and appearance. Firstly, carrying out data enhancement on a logistics vehicle image to enable a sample image to increase scene diversity; then, constructing a basic network model by using the improved master R-CNN; then, a non-maximum suppression algorithm is introduced to screen a logistics vehicle target boundary box; and finally, unified normalization is carried out on the object characteristics of the logistics vehicle, so that accurate positioning is realized. Therefore, the characteristic positioning performance of the logistics vehicle is superior to that of the traditional vehicle detection method under the conditions of different environments, scenes and the like, the problem of logistics engineering vehicle management in a park can be well solved, and the logistics vehicle detection method has certain practical value and application prospect.
Drawings
FIG. 1 is a schematic view of logistics vehicle image scaling according to the present invention;
FIG. 2 is a schematic diagram of the image rotation of a logistics vehicle in accordance with the present invention;
FIG. 3 is a diagram of the basic network architecture of the present invention;
FIG. 4 compares a logistics vehicle image before and after processing with the non-maximum suppression algorithm designed by the invention; FIG. 4a shows the image before non-maximum suppression and FIG. 4b the image after non-maximum suppression;
FIG. 5 is a unified normalization flow chart for target features of the present invention;
fig. 6 is a technical roadmap of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
To overcome the defects of the prior art, and aiming at the management problem of logistics and engineering vehicles and the difficulty traditional recognition methods have with uncertain factors such as environment, scene, and appearance, the invention provides a logistics vehicle feature positioning method based on improved faster R-CNN. Data enhancement is first applied to the logistics vehicle images to increase the scene diversity of the sample images; a basic network model is then constructed with the improved faster R-CNN; a non-maximum suppression algorithm is then introduced to screen logistics vehicle target bounding boxes; finally, unified normalization of the logistics vehicle target features achieves accurate positioning.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method for logistic vehicle feature localization based on improved faster-CNN, comprising the steps of:
step one, carrying out image enhancement processing on logistics vehicles;
To address the problems of fixed shooting angles, uniform backgrounds, and low detection rates for logistics vehicles, the invention introduces data enhancement: the logistics vehicle images are processed by multi-scale proportional scaling, image rotation, saturation enhancement, and similar operations, increasing scene diversity for further identification and positioning.
1.1 Multi-scale scaling operation is carried out on the logistics vehicle;
the object flow vehicle image is scaled in multiple scales according to the principle of not damaging the specific aspect ratio of the object flow vehicle in the original image, so that the positioning network can learn the object features with specific proportions.
Suppose the coordinates of a pixel of a logistics vehicle image before scaling are denoted A₀(x₀, y₀) and the scaled coordinates are denoted A₁(x₁, y₁). Then A₀ and A₁ satisfy the relation:

(x₁, y₁) = (μx₀, μy₀)  (1)

where μ is the scaling factor. The relation corresponds to the image scaling matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \mu & 0 \\ 0 & \mu \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \quad (2)$$

where μ > 1 denotes an image enlarging operation and μ < 1 an image reduction operation. To reduce the cost of model operation, the invention sets the scaling factor μ = 0.84 and stops scaling when the short side falls below 100 pixels.
1.2 Rotating the logistics vehicle image;
when the camera shoots a logistics vehicle in quick running, the phenomenon that the angle difference of the captured images is extremely large is caused, and in order to adapt to the recognition and positioning of different angles, the captured logistics vehicle images are required to be subjected to rotary transformation, so that vehicle characteristic information of various angles is generated.
The invention takes the center of the logistics vehicle image as the rotation center O(0, 0) and denotes the counterclockwise rotation angle θ. When an arbitrary pixel P(x, y) in the image becomes P₁(x₁, y₁) after the rotation transform, the rotation is expressed as:

$$x_1 = x\cos\theta - y\sin\theta, \qquad y_1 = x\sin\theta + y\cos\theta \quad (3)$$

This polar-coordinate transformation corresponds to the image rotation matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad (4)$$
1.3 Performing saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of data samples and enable the feature positioning network to be suitable for complex illumination environments, the saturation of the logistics vehicle image is adjusted.
The specific flow of adjusting the image saturation is as follows:
(S1) Calculate the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)  (5)

where rgbMax is the pixel maximum and rgbMin the pixel minimum over the three color channels.
(S2) Saturation calculation;
The saturation S is calculated through the intermediate quantities delta, value, and L:

delta = (rgbMax − rgbMin)/255  (6)
value = (rgbMax + rgbMin)/255  (7)
L = value/2  (8)
(S3) Adjust the logistics vehicle image saturation;
A saturation parameter β is set to adjust the illumination intensity. The calculation flow is:
1. If the parameter β ≥ 0, first obtain the value of the intermediate variable S:

S = delta/value if L < 0.5, S = delta/(2 − value) otherwise  (9)

then update the value of the intermediate variable α:

α = S if β + S ≥ 1, α = 1 − β otherwise  (10)
α = 1/α − 1  (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α  (12)

2. If the parameter β < 0, set α = β; then:

RGB' = L·255 + (RGB − L·255)·(1 + α)  (13)
the logistics vehicle image subjected to scaling, rotation operation and saturation enhancement is applied to the following steps so as to accurately position logistics vehicle characteristics.
Step two, constructing a basic network model;
although three basic networks are provided by the faster-CNN for feature extraction, in order to obtain a better feature extraction effect, the VGGNet-16 basic network is adopted as the feature extraction network for classifying logistics vehicles of different vehicle types. Meanwhile, in order to realize the positioning of the logistics vehicles, the invention adds an object detection positioning model of the RPN network after the feature extraction module in the third convolution layering of the fifth convolution layer of VGGNet-16.
The detailed design flow of the basic network model constructed by the invention is as follows:
(T1) First stage: an image of size W × H × 3 processed in step one is input; the logistics vehicle image is convolved by two consecutive 64-channel convolution layers with 3×3 kernels and stride 1; the convolved image is then reduced in dimension by a 64-channel max-pooling layer with a 2×2 pooling kernel and stride 2. This stage outputs a feature map of size (W/2) × (H/2) × 64.
(T2) Second stage: the flow is the same as the first stage; the feature map from the first stage is input to the second-stage network, and a new feature map is obtained by convolution and pooling. Unlike the first stage, the convolution and pooling channels are both 128; the other parameters are unchanged.
(T3) Third stage: the output of the second stage is input to the third-stage network, convolved by three consecutive 256-channel convolution layers with 3×3 kernels and stride 1, and then reduced in dimension by a 256-channel max-pooling layer with a 2×2 pooling kernel and stride 2.
(T4) Fourth stage: the flow is the same as the third stage; the feature map from the third stage is input to the fourth-stage network, and a new feature map is obtained by convolution and pooling. Unlike the third stage, the convolution and pooling channels are both 512; the other parameters are unchanged.
(T5) Fifth stage: this stage consists of three convolution layers, each with 512 channels, 3×3 kernels, and stride 1. The feature map output at this stage has size (W/16) × (H/16) × 512.
(T6) Sixth stage: a convolution layer with a 3×3 kernel, stride 1, and 512 channels is connected first; a classification loss function and a bounding-box regression loss function are then connected to perform regression judgment on box information and classification information (the probability of most likely being a certain vehicle type) for logistics vehicle versus background;
(T7) Seventh stage: two 4096-channel fully connected layers are connected first, followed by a total loss function; the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type are finally output.
The design of the activation functions, loss functions, and related parameters in the above basic network structure is described in detail below:
(P1) In the VGGNet-16 basic network, the ReLU activation function is used after every convolution layer:
ReLU(x) = max(0, x)  (14)
(P2) In the sixth stage, the classification loss function and the bounding-box regression loss function are designed as follows:
The classification loss L_rpn_cls is expressed as:

$$L_{rpn\_cls} = \frac{1}{N_{cls}} \sum_i \left[ -p_i^{*} \log p_i - (1 - p_i^{*}) \log(1 - p_i) \right] \quad (15)$$

where p_i denotes the probability that box i is a logistics vehicle or background, and p_i^{*} denotes the label of the real box corresponding to box i: 1 if that box is foreground, 0 otherwise.
The bounding-box regression loss L_rpn_box is expressed as:

$$L_{rpn\_box} = \frac{\lambda}{N_{box}} \sum_i p_i^{*} \, \mathrm{smooth}_{L_1}(t_i - t_i^{*}) \quad (16)$$

where t_i denotes the four-dimensional position information of predicted box i, written t_i(x_i, y_i, w_i, h_i), and t_i^{*} denotes the four-dimensional position information of the real box, written t_i^{*}(x_i^{*}, y_i^{*}, w_i^{*}, h_i^{*}). The smooth-L1 function is expressed as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (17)$$
(P3) In the seventh stage, the overall loss function is designed as follows: the bounding-box regression loss uses a logistic regression mapping method; the classification loss uses gradient descent; during loss minimization, the Adam gradient descent method is used to optimize the loss function, with parameters α = 0.001, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸.
(P4) during training, the learning rate adjustment strategy of the present invention employs a multi-stage decay method.
Thirdly, screening a logistics vehicle target by using a non-maximum suppression algorithm;
the number of boundary boxes on the same logistics vehicle obtained after the logistics vehicle image is processed by the basic network model in the second step is large, so that a method is required to be introduced to screen out redundant boundary boxes. The specific operation flow is as follows:
(Q1) From the four-dimensional position information (x_i, y_i, w_i, h_i) of each predicted bounding box, the area S_i of every predicted box of each vehicle in the logistics vehicle image is obtained:

S_i = w_i · h_i  (18)
(Q2) In the sixth stage of the basic network model, the box information and the classification information (logistics vehicle or background) are determined by regression. Each real logistics vehicle has many corresponding bounding boxes; these are sorted by probability in descending order, and the single bounding box with the highest probability is screened out;
(Q3) The area intersection ratio I between the screened bounding box and each remaining bounding box is computed in a loop; if I exceeds a preset threshold (default 0.7), the box is judged to overlap heavily with the box screened in step (Q2) and is deleted, until all boxes from step (Q2) have been processed.
Formulas (19) and (20) give the intersection ratio I for the two symmetric arrangements of the box centers (x_max ≤ x_oti and x_max > x_oti, respectively, together with the corresponding condition on y); each holds if and only if its constraint is satisfied. Taking the constraint in formula (19) as an example, it can be rewritten in the equivalent absolute-value form:

$$|x_{max} - x_{oti}| \le \frac{w_{max} + w_{oti}}{2}, \qquad |y_{max} - y_{oti}| \le \frac{h_{max} + h_{oti}}{2} \quad (21)$$

in which case the intersection area is

$$S_{ovp} = \left(\frac{w_{max} + w_{oti}}{2} - |x_{max} - x_{oti}|\right)\left(\frac{h_{max} + h_{oti}}{2} - |y_{max} - y_{oti}|\right)$$

If the constraint in formula (19) is changed to the form of formula (21), the subscripts in formula (19) must be changed accordingly.
When the constraint conditions of formulas (19) and (20) are satisfied, the intersection ratio I is computed as:

$$I = \frac{S_{ovp}}{S_{max} + S_{oti} - S_{ovp}} \quad (22)$$

Otherwise I = 0, meaning the two bounding boxes do not intersect, and both are kept.
In formulas (19)–(22), max denotes the screened maximum-probability bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the intersection area between the two boxes is denoted S_ovp; (x_max, y_max, w_max, h_max) is the four-dimensional position information of the screened box, i.e., center coordinates, box width, and box height; (x_oti, y_oti, w_oti, h_oti) is the four-dimensional position information of any remaining box.
Step four, unified normalization is carried out on the object characteristics of the logistics vehicles;
To avoid the dimension mismatch in subsequent connection layers caused by bounding boxes with differently sized edge features after non-maximum suppression, the invention connects a region-of-interest pooling layer after the loss function of the sixth stage and performs unified normalization on the bounding boxes. The specific flow is as follows:
(M1) Quantize the four-dimensional position information of the bounding boxes on the logistics vehicle image obtained in step three into integer array coordinates;
(M2) Evenly divide each quantized bounding box region and max-pool it into 4×4, 2×2, and 1×1 grids, forming a fixed-length data dimension.
The resulting fixed-dimension feature map is passed to the seventh stage of the basic network model to obtain the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type.
The embodiments described in this specification are merely examples of implementation forms of the inventive concept; the scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments, and it also covers equivalent technical means that those skilled in the art can conceive based on the inventive concept.

Claims (3)

1. A method for logistics vehicle feature positioning based on improved faster R-CNN, comprising the steps of:
step one, carrying out image enhancement processing on logistics vehicles;
introducing data enhancement means and processing the logistics vehicle images by multi-scale proportional scaling, image rotation, and saturation enhancement, thereby increasing the scene diversity of the logistics vehicle images for further identification and positioning;
1.1 Multi-scale scaling operation is carried out on the logistics vehicle;
scaling the logistics vehicle image at multiple scales without destroying the aspect ratio of the logistics vehicle in the original image, so that the positioning network can learn target features at specific proportions;
suppose the coordinates of a pixel of a logistics vehicle image before scaling are denoted A₀(x₀, y₀) and the scaled coordinates are denoted A₁(x₁, y₁); then A₀ and A₁ satisfy the relation:

(x₁, y₁) = (μx₀, μy₀)  (1)

where μ is the scaling factor; the relation corresponds to the image scaling matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \mu & 0 \\ 0 & \mu \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \quad (2)$$

where μ > 1 denotes an image enlarging operation and μ < 1 an image reduction operation;
1.2 Rotating the logistics vehicle image;
when a camera photographs a logistics vehicle moving quickly, the captured images differ greatly in angle; to adapt identification and positioning to different angles, the captured logistics vehicle images are rotated to generate vehicle feature information at various angles;
setting the center of the logistics vehicle image as the rotation center O(0, 0) and denoting the counterclockwise rotation angle θ; when an arbitrary pixel P(x, y) in the image becomes P₁(x₁, y₁) after the rotation transform, the rotation is expressed as:

$$x_1 = x\cos\theta - y\sin\theta, \qquad y_1 = x\sin\theta + y\cos\theta \quad (3)$$

this polar-coordinate transformation corresponds to the image rotation matrix:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad (4)$$
1.3 Performing saturation enhancement operation on the logistics vehicle image;
in order to increase the diversity of the data samples so that the feature positioning network can adapt to complex illumination environments, the saturation of the logistics vehicle image is adjusted;
the specific flow of adjusting the image saturation is as follows:
(S1) calculating the pixel extrema of the logistics vehicle image:

rgbMax = max(R, G, B), rgbMin = min(R, G, B)  (5)

where rgbMax represents the pixel maximum and rgbMin the pixel minimum over the three color channels;
(S2) saturation calculation;
the saturation S is calculated through the intermediate quantities delta, value, and L:

delta = (rgbMax − rgbMin)/255  (6)
value = (rgbMax + rgbMin)/255  (7)
L = value/2  (8)
(S3) adjusting the logistics vehicle image saturation;
a saturation parameter β is set for adjusting the illumination intensity, and the calculation flow is as follows:
1. if the parameter β ≥ 0, first obtain the value of the intermediate variable S:

S = delta/value if L < 0.5, S = delta/(2 − value) otherwise  (9)

then update the value of the intermediate variable α:

α = S if β + S ≥ 1, α = 1 − β otherwise  (10)
α = 1/α − 1  (11)

and adjust the saturation:

RGB' = RGB + (RGB − L·255)·α  (12)

2. if the parameter β < 0, set α = β; then:

RGB' = L·255 + (RGB − L·255)·(1 + α)  (13)
the scaled, rotated, and saturation-enhanced logistics vehicle images are used in the following steps for accurate positioning of the logistics vehicle features;
step two, constructing a basic network model;
adopting the VGGNet-16 basic network as the feature extraction network for classifying logistics vehicles of different vehicle types; meanwhile, in order to realize logistics vehicle positioning, adding an RPN target detection and positioning model after the feature extraction module at the third convolutional sublayer of the fifth convolutional stage of VGGNet-16;
the steps of constructing the basic network model are as follows:
(T1) first stage: inputting an image of size W × H × 3 processed in step one; convolving the logistics vehicle image by two consecutive 64-channel convolution layers with 3×3 kernels and stride 1; then reducing the dimension of the convolved image by a 64-channel max-pooling layer with a 2×2 pooling kernel and stride 2; this stage outputs a feature map of size (W/2) × (H/2) × 64;
(T2) second stage: the flow is the same as the first stage, namely the feature map obtained in the first stage is input into the second-stage network and a new feature map is obtained through convolution and pooling; unlike the first stage, the convolution and pooling channels of the second stage are both 128, and the other parameters are the same as the first stage;
(T3) third stage: inputting the output of the second stage into the third-stage network; convolving the image by three consecutive 256-channel convolution layers with 3×3 kernels and stride 1; then reducing the dimension of the convolved image by a 256-channel max-pooling layer with a 2×2 pooling kernel and stride 2;
(T4) fourth stage: the flow is the same as the third stage, namely the feature map obtained in the third stage is input into the fourth-stage network and a new feature map is obtained through convolution and pooling; unlike the third stage, the convolution and pooling channels of the fourth stage are both 512, and the other parameters are the same as the third stage;
(T5) fifth stage: this stage consists of three convolution layers, each with 512 channels, 3×3 kernels, and stride 1; the feature map output at this stage has size (W/16) × (H/16) × 512;
(T6) sixth stage: connecting a convolution layer with a 3×3 kernel, stride 1, and 512 channels; then connecting a classification loss function and a bounding-box regression loss function, and performing regression judgment on box information and classification information for logistics vehicle versus background, the classification information being the probability that the vehicle shown in the image is most likely of a certain vehicle type;
(T7) seventh stage: connecting two 4096-channel fully connected layers, then a total loss function; finally outputting the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type;
in the above basic network structure, the design of the activation function and loss function parameters specifically includes:
(P1) in the VGGNet-16 basic network, the ReLU activation function is used after every convolution layer:
ReLU(x) = max(0, x)  (14)
(P2) in the sixth stage, using the classification loss function and the bounding-box regression loss function:
the classification loss L_rpn_cls, expressed as:

$$L_{rpn\_cls} = \frac{1}{N_{cls}} \sum_i \left[ -p_i^{*} \log p_i - (1 - p_i^{*}) \log(1 - p_i) \right] \quad (15)$$

where p_i represents the probability that box i is a logistics vehicle or background, and p_i^{*} represents the label of the real box corresponding to box i, marked 1 if the real box is foreground and 0 otherwise;
the bounding-box regression loss L_rpn_box, expressed as:

$$L_{rpn\_box} = \frac{\lambda}{N_{box}} \sum_i p_i^{*} \, \mathrm{smooth}_{L_1}(t_i - t_i^{*}) \quad (16)$$

where t_i represents the four-dimensional position information of predicted box i, denoted t_i(x_i, y_i, w_i, h_i), and t_i^{*} represents the four-dimensional position information of the real box, denoted t_i^{*}(x_i^{*}, y_i^{*}, w_i^{*}, h_i^{*}); the smooth-L1 function is expressed as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (17)$$
(P3) in the seventh stage, the overall loss function design principle is: the bounding-box regression loss adopts a logistic regression mapping method; the classification loss adopts a gradient descent method; during the decrease of the loss function, the Adam gradient descent method is used to optimize it, with the corresponding parameters set to α = 0.001, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸;
(P4) in the training process, the learning-rate adjustment strategy adopts a multi-stage decay method;
thirdly, screening a logistics vehicle target by using a non-maximum suppression algorithm;
the logistics vehicle image processed by the basic network model of step two yields multiple bounding boxes on the same logistics vehicle, so a method is introduced to screen out the redundant bounding boxes; the specific flow is as follows:
(Q1) from the four-dimensional position information (x_i, y_i, w_i, h_i) of each predicted bounding box, the area S_i of every predicted box of each vehicle in the logistics vehicle image is obtained:

S_i = w_i · h_i  (18)
(Q2) in the sixth stage of the basic network model, determining the box information and the classification information (logistics vehicle or background) through regression; each real logistics vehicle has a plurality of corresponding bounding boxes, which are sorted by probability in descending order, and the single bounding box with the highest probability is screened out;
(Q3) cyclically calculating the area intersection ratio I between the screened bounding box and each remaining bounding box; if I is larger than a preset threshold, the bounding box is judged to overlap heavily with the box screened in step (Q2) and is deleted, until all boxes from step (Q2) have been processed;
formulas (19) and (20) give the intersection ratio I for the two symmetric arrangements of the box centers (x_max ≤ x_oti and x_max > x_oti, respectively, together with the corresponding condition on y), each holding if and only if its constraint is satisfied; taking the constraint in formula (19) as an example, it becomes the equivalent absolute-value form:

$$|x_{max} - x_{oti}| \le \frac{w_{max} + w_{oti}}{2}, \qquad |y_{max} - y_{oti}| \le \frac{h_{max} + h_{oti}}{2} \quad (21)$$

in which case the intersection area is

$$S_{ovp} = \left(\frac{w_{max} + w_{oti}}{2} - |x_{max} - x_{oti}|\right)\left(\frac{h_{max} + h_{oti}}{2} - |y_{max} - y_{oti}|\right)$$

if the constraint in formula (19) changes to the form of formula (21), then the subscripts in formula (19) must also be changed accordingly;
when the constraint conditions in formulas (19) and (20) are satisfied, the intersection ratio I is computed as:

$$I = \frac{S_{ovp}}{S_{max} + S_{oti} - S_{ovp}} \quad (22)$$

otherwise I = 0, meaning the two bounding boxes do not intersect, and both are kept;
in formulas (19)–(22), max denotes the screened maximum-probability bounding box, whose area is denoted S_max; oti denotes any of the remaining bounding boxes, whose area is denoted S_oti; the intersection area between the two boxes is denoted S_ovp;
(x_max, y_max, w_max, h_max) represents the four-dimensional position information of the screened maximum bounding box, namely center coordinates, box width, and box height; (x_oti, y_oti, w_oti, h_oti) represents the four-dimensional position information of any of the remaining bounding boxes;
step four, unified normalization is carried out on the object characteristics of the logistics vehicles;
in order to solve the problem of mismatched dimensions in subsequent connection layers caused by bounding boxes with differently sized edge features after non-maximum suppression, connecting a region-of-interest pooling layer after the loss function of the sixth stage, and performing unified normalization on the bounding boxes with differing edge features; the specific flow is as follows:
(M1) quantizing the four-dimensional position information of the bounding boxes on the logistics vehicle image obtained in step three into integer array coordinates;
(M2) evenly dividing each quantized bounding box region and max-pooling it into 4×4, 2×2, and 1×1 grids, forming a fixed-length data dimension;
and passing the resulting fixed-dimension feature map to the seventh stage of the basic network model to obtain the accurate logistics vehicle bounding box and the probability of the corresponding vehicle type.
2. The method for logistics vehicle feature positioning based on improved faster R-CNN according to claim 1, wherein the threshold of the area intersection ratio I in step (Q3) is 0.7.
3. The method for logistics vehicle feature positioning based on improved faster R-CNN according to claim 1, wherein the scaling factor μ = 0.84 is set in step 1.1), and scaling stops when the short side is less than 100 pixels.
CN202010690178.9A 2020-07-17 2020-07-17 Logistics vehicle feature positioning method based on improved faster R-CNN Active CN111986080B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010690178.9A CN111986080B (en) 2020-07-17 2020-07-17 Logistics vehicle feature positioning method based on improved faster R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010690178.9A CN111986080B (en) 2020-07-17 2020-07-17 Logistics vehicle feature positioning method based on improved faster R-CNN

Publications (2)

Publication Number Publication Date
CN111986080A CN111986080A (en) 2020-11-24
CN111986080B (granted) 2024-01-16

Family

ID=73438739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010690178.9A Active CN111986080B (en) 2020-07-17 2020-07-17 Logistics vehicle feature positioning method based on improved master R-CNN

Country Status (1)

Country Link
CN (1) CN111986080B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832794A * 2017-11-09 2018-03-23 车智互联(北京)科技有限公司 Convolutional neural network generation method, vehicle-series recognition method, and computing device
CN109034024A * 2018-07-16 2018-12-18 浙江工业大学 Logistics vehicle type classification and recognition method based on image object detection
CN110175524A * 2019-04-26 2019-08-27 南京航空航天大学 Fast and accurate aerial-photography vehicle detection method based on a lightweight deep convolutional network

Also Published As

Publication number Publication date
CN111986080A (en) 2020-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant