CN111062384A - Vehicle window accurate positioning method based on deep learning - Google Patents

Vehicle window accurate positioning method based on deep learning

Info

Publication number
CN111062384A
Authority
CN
China
Prior art keywords
window
picture
stage
convolution
box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911089593.2A
Other languages
Chinese (zh)
Other versions
CN111062384B (en)
Inventor
韩梦江
楼燚航
白燕
张永祥
陈杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boyun Vision Beijing Technology Co ltd
Original Assignee
Boyun Vision Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boyun Vision Beijing Technology Co ltd filed Critical Boyun Vision Beijing Technology Co ltd
Priority to CN201911089593.2A priority Critical patent/CN111062384B/en
Publication of CN111062384A publication Critical patent/CN111062384A/en
Application granted granted Critical
Publication of CN111062384B publication Critical patent/CN111062384B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for accurately positioning a vehicle window, which comprises the following steps: S1, acquiring a rough positioning frame of the vehicle window in the first stage; S11, selecting a sample group and calibrating the corner point coordinates of the vehicle window in each picture; S12, storing the pictures and corner coordinates as a data set; S13, inputting the data set into the first-stage deep convolutional network to extract a feature map; S14, inputting the feature map into the BOX regression layer to obtain an approximate positioning frame of the window; S2, acquiring the four accurate corner point coordinates of the window in the second stage; S21, expanding the approximate window positioning frame; S22, cropping the picture inside the expanded candidate frame; S23, converting the corner coordinates into coordinates relative to the expanded candidate frame; S24, inputting the cropped picture into the second-stage deep convolutional network to extract a feature map, and converting the feature map into a feature vector; and S25, inputting the feature vector into the linear regression layer to obtain the accurate corner coordinates of the window.

Description

Vehicle window accurate positioning method based on deep learning
Technical Field
The invention relates to the field of image processing, in particular to a deep-learning-based method for accurately positioning a vehicle window.
Background
In recent years, intelligent traffic systems and intelligent monitoring have developed rapidly, and vehicle window identification plays a significant role in both fields. Electronic police and vehicle checkpoints acquire large numbers of high-definition vehicle pictures in real time; applying these pictures effectively to extract as much information as possible, and thereby relieve traffic management pressure, is a current focus of attention in intelligent traffic and intelligent monitoring. Vehicle window identification makes it possible to further analyze driver information, locate safety belts, and improve the accuracy of vehicle type identification. In addition, if the vehicle window can be accurately positioned, more interference can be eliminated, and more accurate information about the vehicle interior can be obtained.
The goal of window positioning is to automatically identify the vehicle windows for a given series of vehicle pictures from different cameras, with different colors, orientations, types, and sizes.
At present, vehicle window positioning is generally tackled by detecting the window from effective features such as vehicle color, texture, and spatial relationships. The traditional methods include the following. One method processes dark-colored and light-colored cars with complex backgrounds under different illumination conditions separately, segmenting and positioning the window with a genetic algorithm constructed from a chromaticity function curve; this method suffers from long positioning time, a complicated process, and large resource consumption. Another method uses the texture information of the vehicle in the picture: after color-space conversion, the window can be roughly positioned by texture detection; its disadvantage is that excessive dependence on the vehicle's color and texture information weakens the robustness of the algorithm, so detection performance degrades greatly under different illumination and vehicle colors. Yet another method uses a sliding window to position the vehicle window with reference to a previously positioned region; the accuracy and precision of this method are difficult to bring up to practical requirements.
Disclosure of Invention
The invention aims to solve the above problems by providing a deep-learning-based method for accurately positioning a vehicle window, which outputs the coordinates of the window's four corner points.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A deep-learning-based method for accurately positioning a vehicle window, comprising the following steps:
S1, acquiring a rough positioning frame of the front window of the vehicle in the first stage;
S11, selecting vehicle pictures as a sample group, and manually calibrating the coordinates of the four corner points (upper-left, upper-right, lower-left and lower-right) of the front window in each vehicle picture;
S12, storing each vehicle picture in correspondence with the corner coordinates of its front window to form a data set;
S13, inputting the data set into the first-stage deep convolutional network, a 23-layer neural network: five convolution operations are performed on each picture in the data set, the output feature map is batch-normalized after each convolution and then input into an activation function, a maximum pooling operation follows each of the first four convolutions, and after the fifth convolution the network splits into two branches; one branch continues with five further convolutions and one full convolution, while the other branch fuses, in the channel direction, the pre-branch feature map with the feature map obtained from the first branch's five convolutions; finally the two branches respectively perform one convolution and one full convolution to obtain the fused vehicle picture feature maps;
S14, inputting the vehicle picture feature map and the corresponding front window corner coordinates into a BOX regression layer, and regressing an approximate positioning frame of the front window after optimizing a loss function;
S2, acquiring the four accurate corner point coordinates of the front window of the vehicle in the second stage;
S21, enlarging the approximate front window positioning frame obtained in the first stage by a factor of 1.3 in both width and height to obtain an expanded candidate frame;
S22, cropping the picture region inside the expanded candidate frame from the vehicle picture to form a new picture;
S23, converting the coordinates of the four manually calibrated front window corner points into coordinates relative to the expanded candidate frame;
S24, inputting the cropped new picture into the second-stage deep convolutional network, extracting its feature map, and converting the feature map into a feature vector through the fully connected layer of the second-stage network;
and S25, inputting the feature vectors and the transformed relative coordinates into a linear regression layer, and optimizing a loss function and then performing regression to obtain four accurate corner point coordinates of the front window.
Further, the loss function of the BOX regression layer in step S14 adopts the smooth L1 loss, calculated as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}} x_{ij}^{m}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2} & \text{if } \lvert x\rvert<1\\ \lvert x\rvert-0.5 & \text{otherwise}\end{cases}$$

wherein $x_{ij}^{m}$ is an indicator parameter: when its value is 1, the i-th default box matches the j-th ground truth box; N is the number of candidate frames; m is a bounding-box position parameter, where cx and cy denote the x and y coordinates of the bounding-box center, w its width and h its height; $l_{i}^{m}$ is the predicted position of the bounding box corresponding to the default box, and $\hat{g}_{j}^{m}$ is the corresponding ground truth box position parameter value.
Further, in step S24, five convolution operations are performed on the cropped new picture; the activation function after each convolution is a parametric rectified linear unit; each of the first four convolutions is followed by a max pooling layer; and after the fifth convolution a fully connected layer integrates the extracted feature maps into a single feature vector.
Further, the loss function of the linear regression layer in step S25 adopts an L2 norm loss, calculated as follows:

$$L(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{x_{ij}-\hat{x}_{ij}}{w}\right)^{2}+\left(\frac{y_{ij}-\hat{y}_{ij}}{h}\right)^{2}\right]$$

wherein $\theta$ is the weight of the second-stage deep convolutional network, i indexes the samples of each batch, j indexes the 4 corner points of the front window in each vehicle picture, $x_{ij}$ and $y_{ij}$ are the coordinates of the corner point to be regressed, $\hat{x}_{ij}$ and $\hat{y}_{ij}$ are the ground truth corner coordinates, and w and h are the width and height of the cropped new picture.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a detection method for regression of a rough positioning frame of a vehicle window and further regression of precise corner coordinates of the vehicle window by utilizing a deep convolutional network, which is carried out in two stages, wherein in the first stage, a 23-layer neural network is used for extracting multi-level and multi-scale characteristics of a vehicle picture, the extracted characteristics are applied to a BOX regression algorithm to obtain the rough positioning frame of the vehicle window, and in the second stage, a 6-layer convolutional neural network is used for carrying out linear regression on four corner coordinates of the vehicle window in the rough positioning frame of the vehicle window, so that the vehicle window can be precisely positioned.
By using two neural networks in stages to obtain the accurate window corner coordinates, the invention greatly improves the precision and accuracy of window positioning; the design combining improved small deep convolutional networks with regression algorithms effectively increases the computation speed of the neural networks; and because the window is positioned through the coordinates of its four corner points, the position of the trapezoidal window is obtained accurately, eliminating the large amount of edge interference produced by a rectangular positioning frame and allowing the vehicle interior information to be obtained more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of a training model framework at a first stage;
FIG. 2 is a diagram of a first stage deep convolutional network architecture;
FIG. 3 is a characteristic diagram of a BOX regression layer;
FIG. 4 is a model framework diagram of a BOX regression layer;
FIG. 5 is a diagram of a training model framework for the second phase.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments of the present invention by a person skilled in the art without any creative effort, should be included in the protection scope of the present invention.
As shown in figs. 1 to 5, the present invention provides a detection method that uses deep convolutional networks to regress a rough window positioning frame and then the precise window corner coordinates. The method is performed in two stages. In the first stage, a 23-layer neural network extracts multi-level, multi-scale features from the vehicle picture, and the extracted features are applied to a BOX regression algorithm to obtain the rough window positioning frame. In the second stage, a 6-layer convolutional neural network performs linear regression on the four window corner coordinates inside the rough positioning frame, so that the window can be precisely positioned.
The training model framework of the invention at the stage of obtaining the approximate positioning frame of the car window is shown in figure 1.
In the first stage, two data sets are input in batches to train the network model: one is a picture set containing vehicle pictures of different colors, orientations, types, and sizes from real monitoring camera scenes; the other holds the coordinates of the ground truth window bounding box for each vehicle picture. The deep convolutional network then extracts multi-level, multi-scale features from the input pictures; these features are input into the BOX regression layer, where default boxes are matched against the labeled ground truth boxes and the matched default boxes are used to regress the approximate coordinates of the window positioning frame.
The deep convolutional network at this stage differs from a common classification network: it is a structure modified from YOLOv3-tiny, shown in fig. 2.
As shown in fig. 2, five convolution operations are first performed on the input picture; after each convolution the output feature map is batch-normalized and then input into an activation function, and each of the first four convolutions is followed by a maximum pooling operation, so the height and width of the feature map are halved after each of these convolutions. After the fifth convolution, the network splits into two branches, and each branch performs a different number of convolution operations; the purpose of the split is to let the network extract feature information at different scales. One branch continues with five convolutions and one full convolution; the other fuses, in the channel direction, the pre-branch feature map with the feature map obtained after the first branch's five convolutions, the first branch being upsampled beforehand so that the two feature maps match in height and width. Finally, each branch performs one convolution and one full convolution, and the results are input to the BOX regression layer.
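To make the structure concrete, the following is a minimal PyTorch sketch of such a two-branch backbone, written under stated assumptions: the kernel sizes and channel widths are illustrative choices, since the text fixes only the overall pattern (five convolution + batch norm + activation blocks with pooling after the first four, a split, upsampling, and channel-wise fusion) and the two output scales.

import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out):
    # One convolution followed by batch normalization and an activation,
    # as described for each convolution in the first-stage network.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class TwoBranchBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        # Five convolutions; each of the first four is followed by 2x max
        # pooling, halving the feature map height and width.
        chans = [3, 16, 32, 64, 128, 256]
        stem = []
        for i in range(5):
            stem.append(conv_bn_act(chans[i], chans[i + 1]))
            if i < 4:
                stem.append(nn.MaxPool2d(2))
        self.stem = nn.Sequential(*stem)
        # First branch: further convolutions producing a coarser-scale map.
        self.branch_a = nn.Sequential(
            nn.MaxPool2d(2),
            conv_bn_act(256, 512),
            conv_bn_act(512, 256),
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # Second branch: fuse the pre-branch map with the upsampled
        # branch-A map along the channel dimension.
        self.fuse = conv_bn_act(256 + 256, 256)

    def forward(self, x):
        pre = self.stem(x)              # e.g. 20 x 20 for a 320 x 320 input
        a = self.branch_a(pre)          # e.g. 10 x 10, coarser scale
        b = self.fuse(torch.cat([pre, self.up(a)], dim=1))
        return a, b                     # two scales for the BOX regression layer

With a 320 × 320 input, the two returned maps are 10 × 10 and 20 × 20, matching the two feature map sizes used for default box matching below.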
As shown in fig. 3, default boxes with different aspect ratios (shown by the blue and red dotted lines in the figure) are preset on feature maps of different scales in the BOX regression layer; the default boxes are then matched against the ground truth box according to their IOU, position regression is performed on the matched default boxes, and the best approximate window positioning frame is selected by non-maximum suppression. In practice, our network generates two feature maps, 10 × 10 and 20 × 20, and sliding windows over these two maps are used to match default boxes to the ground truth box.
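As one way of realizing this presetting step, the short sketch below lays center-form default boxes over both feature maps; the aspect ratios and scales are assumptions for illustration, as the text does not specify them.

import itertools

def default_boxes(fmap_size, scale, aspect_ratios=(0.5, 1.0, 2.0)):
    # Center-form (cx, cy, w, h) boxes, normalized to [0, 1]; one set of
    # aspect ratios is anchored at the center of every feature map cell.
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes

# Hypothetical scales for the 20 x 20 (fine) and 10 x 10 (coarse) maps.
all_boxes = default_boxes(20, scale=0.2) + default_boxes(10, scale=0.4)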
As shown in FIG. 4, default boxes of different aspect ratios are matched to the input ground truth box by computing the best Jaccard overlap between each default box and the ground truth box; if the result exceeds the preset threshold, the match is considered successful and the default box is added to the list of boxes to be regressed. The Jaccard coefficient is calculated as follows:

$$J(A,B)=\frac{A\cap B}{A\cup B}$$

wherein A denotes the area covered by the default box and B the area covered by the ground truth box.
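A minimal sketch of this matching step is given below, assuming corner-form boxes; the 0.5 threshold is an assumption, as the text only speaks of a preset threshold.

def jaccard(a, b):
    # a, b: (x1, y1, x2, y2) boxes; returns intersection / union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(default_boxes, gt_box, thresh=0.5):
    # Keep every default box whose overlap with the ground truth box
    # exceeds the threshold; these form the list to be regressed.
    return [d for d in default_boxes if jaccard(d, gt_box) > thresh]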
The matched candidate boxes are then subjected to position regression to bring them closer to the ground truth box, with the smooth L1 loss selected as the regression loss function. The specific formula of the loss function is as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}} x_{ij}^{m}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2} & \text{if } \lvert x\rvert<1\\ \lvert x\rvert-0.5 & \text{otherwise}\end{cases}$$

wherein $x_{ij}^{m}$ is an indicator parameter: when its value is 1, the i-th default box matches the j-th ground truth box; N represents the number of candidate frames; m represents a bounding-box position parameter, where cx and cy represent the x and y coordinates of the bounding-box center, w its width and h its height; $l_{i}^{m}$ represents the predicted position of the bounding box corresponding to the default box; and $\hat{g}_{j}^{m}$ is the corresponding ground truth box position parameter value.
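A short sketch of this loss under stated assumptions: center-form (cx, cy, w, h) tensors and a boolean mask marking the matched default boxes.

import torch

def smooth_l1(x):
    # 0.5 * x^2 where |x| < 1, |x| - 0.5 elsewhere, applied elementwise.
    absx = x.abs()
    return torch.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

def box_loss(pred, gt, matched):
    # pred, gt: (N, 4) predicted and ground-truth position parameters;
    # matched: (N,) bool mask of default boxes matched to the ground truth.
    return smooth_l1(pred[matched] - gt[matched]).sum()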
At this stage the loss function is computed during the network's forward pass, and during back-propagation the network weights are updated from the gradients of the samples; by continuously optimizing the loss function, the network learns to regress an approximate window positioning frame from the input picture.
The training model frame in the accurate positioning stage of the coordinates of the four corner points of the car window is shown in figure 5.
In the second stage, two data sets are likewise input in batches to train the network model: one consists of the window pictures produced by the previous stage, where the positioning frame is enlarged 1.3 times and the window region is cropped from the original picture as the input of this stage; the other holds the ground truth coordinates of the 4 window corner points for each picture. As before, a deep convolutional network extracts multi-level features from the input pictures; after the window-level features are extracted, they are input into a linear regression layer for the corner coordinates, and the model parameters are trained by continuously reducing the Euclidean distance between the regressed and real corner coordinates.
The deep convolutional network at this stage adopts a structure modified from ONet. The network first performs five convolution operations on the input picture, with a parametric rectified linear unit (PReLU) as the activation function after each convolution; each of the first four convolutions is followed by a max pooling layer, halving the height and width of the output feature map each time. Finally, a fully connected layer at the end of the convolutional layers integrates the feature map into a vector, which is input into the corner-coordinate linear regression layer, where the four accurate window corner coordinates are regressed.
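A minimal PyTorch sketch of such an ONet-style second-stage network follows. The five PReLU convolutions, pooling after the first four, the 256-dimensional feature vector, and the eight-value corner output follow the description; the channel widths, kernel sizes, and 48 × 48 input size are assumptions.

import torch.nn as nn

class CornerRegressor(nn.Module):
    def __init__(self, in_size=48):
        super().__init__()
        layers, c = [], 3
        for i, c_out in enumerate([32, 64, 64, 128, 128]):
            layers += [nn.Conv2d(c, c_out, 3, padding=1), nn.PReLU(c_out)]
            if i < 4:
                layers.append(nn.MaxPool2d(2))  # halves height and width
            c = c_out
        self.features = nn.Sequential(*layers)
        side = in_size // 16                          # four 2x poolings
        self.fc = nn.Linear(128 * side * side, 256)   # feature vector
        self.regress = nn.Linear(256, 8)              # 4 corners x (x, y)

    def forward(self, x):
        v = self.fc(self.features(x).flatten(1))
        return self.regress(v)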
The linear regression layer adopts an L2 norm loss function. In particular, to obtain more accurate coordinates, the x and y coordinates of both the to-be-regressed and the real corner points are divided by the width and the height of the input picture respectively, converting integer pixel coordinates into floating-point numbers, so that more accurate corner coordinates are obtained as the loss function is iteratively reduced. The loss function is specifically formulated as follows:

$$L(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{x_{ij}-\hat{x}_{ij}}{w}\right)^{2}+\left(\frac{y_{ij}-\hat{y}_{ij}}{h}\right)^{2}\right]$$

wherein $\theta$ is the weight of the deep convolutional network, i indexes the samples of each batch, j indexes the 4 window corner points of each picture, $x_{ij}$ and $y_{ij}$ are the coordinates of the window corner point to be regressed, $\hat{x}_{ij}$ and $\hat{y}_{ij}$ are the ground truth corner coordinates, and w and h are the width and height of the input picture.
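A short sketch of this normalized L2 loss; the (B, 4, 2) tensor layout is an assumption.

import torch

def corner_loss(pred, gt, w, h):
    # pred, gt: (B, 4, 2) tensors of (x, y) corner coordinates in pixels.
    # x is divided by the picture width and y by its height, per the text.
    scale = pred.new_tensor([float(w), float(h)])  # broadcasts over corners
    return (((pred - gt) / scale) ** 2).sum()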
During the forward pass the network estimates the window corner coordinates and computes the loss function; during back-propagation it computes the gradient of the loss and keeps updating the network weights, so the loss keeps shrinking and the estimated corner coordinates keep approaching the real corner coordinates, yielding accurate window corner coordinates.
In summary, the first stage provides a method for obtaining the rough regression frame of the vehicle window, specifically comprising the following steps:
(1) selecting a sample group, and manually calibrating a front vehicle window;
(2) storing each data picture in correspondence with its window corner coordinates to form a data set;
(3) dividing a data set into a training set and a test set;
(4) extracting multilevel and multiscale characteristics of the vehicle picture by using the deep convolutional network designed at the stage;
(5) inputting the feature map obtained from the deep convolutional network into the BOX regression layer to regress the approximate window positioning frame.
In the second stage, the invention provides a method for acquiring the accurate coordinates of the four window corner points, specifically comprising the following steps:
(6) enlarging the vehicle window approximate positioning frame obtained in the first stage by 1.3 times in the width and height directions to obtain a candidate frame;
(7) cropping the candidate frame region from the original picture to form a new picture as the input of this stage;
(8) converting the coordinates of the four manually calibrated front window corner points into coordinates relative to the candidate frame (see the sketch after this list);
(9) extracting features of the input picture with the deep convolutional network designed for this stage, and converting the feature map into a 256-dimensional feature vector through the fully connected layer Fc;
(10) inputting the feature vector obtained from the deep convolutional network into the window corner coordinate linear regression layer, with the designed L2 norm loss function as the optimization target; in the actual test stage, the eight coordinate values of the 4 window corner points are output simply by extracting the feature vector of the input picture according to the above steps and performing linear regression, thereby obtaining the accurate position of the window.
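As a sketch of steps (6) to (8), the helpers below expand the stage-one frame by 1.3 about its center and shift the labeled corners into the crop's coordinate frame; the (x1, y1, x2, y2) box format is an assumption.

def expand_box(box, factor=1.3):
    # Grow a (x1, y1, x2, y2) box by the given factor about its center.
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * factor, (y2 - y1) * factor
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def to_relative(corners, box):
    # Express each labeled (x, y) corner relative to the crop's origin.
    x1, y1 = box[0], box[1]
    return [(x - x1, y - y1) for x, y in corners]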

Claims (4)

1. A deep-learning-based method for accurately positioning a vehicle window, characterized in that the method comprises the following steps:
S1, acquiring a rough positioning frame of the front window of the vehicle in the first stage;
S11, selecting vehicle pictures as a sample group, and manually calibrating the coordinates of the four corner points (upper-left, upper-right, lower-left and lower-right) of the front window in each vehicle picture;
S12, storing each vehicle picture in correspondence with the corner coordinates of its front window to form a data set;
S13, inputting the data set into the first-stage deep convolutional network, a 23-layer neural network: five convolution (Conv) operations are performed on each picture in the image set, the output feature map is batch-normalized (Batch norm) after each convolution and then input into a ReLU activation function, a maximum pooling (Maxpool) operation follows each of the first four convolutions starting from Conv1, and after the fifth convolution Conv5 the network splits into two branches; one branch continues with five further convolution operations starting from Conv6 and one full convolution Conv12, while the other branch applies a convolution Conv13 that fuses, in the channel direction, the pre-branch feature map with the upsampled (upsample) feature map obtained after the first branch's Conv12 operation; finally the two branches respectively perform a convolution Conv11 and a full convolution Conv15 to obtain the fused vehicle picture feature maps;
S14, inputting the vehicle picture feature map and the corresponding front window corner coordinates into a frame regression layer (BOX regression layer), and regressing an approximate positioning frame of the front window after optimizing a loss function;
S2, acquiring the four accurate corner point coordinates of the front window of the vehicle in the second stage;
S21, enlarging the approximate front window positioning frame obtained in the first stage by a factor of 1.3 in both width and height to obtain an expanded candidate frame (default box);
S22, cropping the picture region inside the expanded candidate frame from the vehicle picture to form a new picture;
S23, converting the coordinates of the four front window corner points in the manually calibrated annotation set (Annotation set) into coordinates relative to the expanded candidate frame;
S24, inputting the cropped new picture into the second-stage deep convolutional network, extracting its feature map, and converting the feature map into a feature vector through the fully connected layer (FC) of the second-stage network;
and S25, inputting the feature vectors and the transformed relative coordinates into a linear regression layer, and optimizing a loss function and then performing regression to obtain four accurate corner point coordinates of the front window.
2. The deep-learning-based method for accurately positioning a vehicle window of claim 1, wherein the loss function of the BOX regression layer in step S14 is the smooth L1 loss, calculated as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}} x_{ij}^{m}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2} & \text{if } \lvert x\rvert<1\\ \lvert x\rvert-0.5 & \text{otherwise}\end{cases}$$

wherein $x_{ij}^{m}$ is an indicator parameter: when its value is 1, the i-th default box matches the j-th labeled box (ground truth box); N is the number of candidate frames; m is a bounding-box position parameter, where cx and cy denote the x and y coordinates of the bounding-box center, w its width and h its height; $l_{i}^{m}$ is the predicted position of the bounding box corresponding to the default box, and $\hat{g}_{j}^{m}$ is the corresponding ground truth box position parameter value.
3. The deep-learning-based method for accurately positioning a vehicle window of claim 2, wherein in step S24 five convolution operations are performed on the cropped new picture, the activation function after each convolution is a parametric rectified linear unit, each of the first four convolutions is followed by a max pooling layer, and after the fifth convolution a fully connected layer integrates the extracted feature maps into a single feature vector.
4. The deep-learning-based method for accurately positioning a vehicle window of claim 3, wherein the loss function of the linear regression layer in step S25 is an L2 norm loss, calculated as follows:

$$L(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{x_{ij}-\hat{x}_{ij}}{w}\right)^{2}+\left(\frac{y_{ij}-\hat{y}_{ij}}{h}\right)^{2}\right]$$

wherein $\theta$ is the weight of the second-stage deep convolutional network, i indexes the samples of each batch, j indexes the 4 corner points of the front window in each vehicle picture, $x_{ij}$ and $y_{ij}$ are the coordinates of the corner point to be regressed, $\hat{x}_{ij}$ and $\hat{y}_{ij}$ are the corresponding ground truth box corner coordinates, and w and h are the width and height of the cropped new picture.
CN201911089593.2A 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning Active CN111062384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089593.2A CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911089593.2A CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Publications (2)

Publication Number Publication Date
CN111062384A true CN111062384A (en) 2020-04-24
CN111062384B CN111062384B (en) 2023-09-08

Family

ID=70298546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089593.2A Active CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN111062384B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914612A (en) * 2020-05-21 2020-11-10 淮阴工学院 Construction graph primitive self-adaptive identification method based on improved convolutional neural network
CN112270278A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Key point-based blue top house detection method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks
CN108256464A (en) * 2018-01-12 2018-07-06 适普远景遥感信息技术(北京)有限公司 High-resolution remote sensing image urban road extracting method based on deep learning
CN108428248A (en) * 2018-03-14 2018-08-21 苏州科达科技股份有限公司 Vehicle window localization method, system, equipment and storage medium
CN108764244A (en) * 2018-04-02 2018-11-06 华南理工大学 Potential target method for detecting area based on convolutional neural networks and condition random field
CN109740405A (en) * 2018-07-06 2019-05-10 博云视觉(北京)科技有限公司 A kind of non-alignment similar vehicle front window different information detection method
CN109902677A (en) * 2019-01-30 2019-06-18 深圳北斗通信科技有限公司 A kind of vehicle checking method based on deep learning
CN110096962A (en) * 2019-04-04 2019-08-06 苏州千视通视觉科技股份有限公司 Vehicle Detail based on region convolutional network identifies secondary structure method and device
CN110322522A (en) * 2019-07-11 2019-10-11 山东领能电子科技有限公司 A kind of vehicle color identification method based on the interception of target identification region
US20210142097A1 (en) * 2017-06-16 2021-05-13 Markable, Inc. Image processing system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210142097A1 (en) * 2017-06-16 2021-05-13 Markable, Inc. Image processing system
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks
CN108256464A (en) * 2018-01-12 2018-07-06 适普远景遥感信息技术(北京)有限公司 High-resolution remote sensing image urban road extracting method based on deep learning
CN108428248A (en) * 2018-03-14 2018-08-21 苏州科达科技股份有限公司 Vehicle window localization method, system, equipment and storage medium
CN108764244A (en) * 2018-04-02 2018-11-06 华南理工大学 Potential target method for detecting area based on convolutional neural networks and condition random field
CN109740405A (en) * 2018-07-06 2019-05-10 博云视觉(北京)科技有限公司 A kind of non-alignment similar vehicle front window different information detection method
CN109902677A (en) * 2019-01-30 2019-06-18 深圳北斗通信科技有限公司 A kind of vehicle checking method based on deep learning
CN110096962A (en) * 2019-04-04 2019-08-06 苏州千视通视觉科技股份有限公司 Vehicle Detail based on region convolutional network identifies secondary structure method and device
CN110322522A (en) * 2019-07-11 2019-10-11 山东领能电子科技有限公司 A kind of vehicle color identification method based on the interception of target identification region

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曲宝珠; 曹国; 刘宇; 周丽存: "Positioning algorithm for the front vehicle window in checkpoint images based on multi-feature integration" (基于多特征集成的卡口图像前车窗的定位算法), 信息技术 (Information Technology), no. 12

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914612A (en) * 2020-05-21 2020-11-10 淮阴工学院 Construction graph primitive self-adaptive identification method based on improved convolutional neural network
CN111914612B (en) * 2020-05-21 2024-03-01 淮阴工学院 Construction graphic primitive self-adaptive identification method based on improved convolutional neural network
CN112270278A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Key point-based blue top house detection method

Also Published As

Publication number Publication date
CN111062384B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN110276767A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111462128B (en) Pixel-level image segmentation system and method based on multi-mode spectrum image
CN108154149B (en) License plate recognition method based on deep learning network sharing
CN116681636B (en) Light infrared and visible light image fusion method based on convolutional neural network
CN111461036B (en) Real-time pedestrian detection method using background modeling to enhance data
CN117253154B (en) Container weak and small serial number target detection and identification method based on deep learning
CN112819858B (en) Target tracking method, device, equipment and storage medium based on video enhancement
CN116309607B (en) Ship type intelligent water rescue platform based on machine vision
CN107808140B (en) Monocular vision road recognition algorithm based on image fusion
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112561899A (en) Electric power inspection image identification method
CN113159043A (en) Feature point matching method and system based on semantic information
CN114140672A (en) Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene
Zhou et al. Adapting semantic segmentation models for changes in illumination and camera perspective
CN116757988B (en) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
CN111062384A (en) Vehicle window accurate positioning method based on deep learning
CN111488766A (en) Target detection method and device
CN117409244A (en) SCKConv multi-scale feature fusion enhanced low-illumination small target detection method
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN117372829A (en) Marine vessel target identification method, device, electronic equipment and readable medium
CN116721398A (en) Yolov5 target detection method based on cross-stage route attention module and residual information fusion module
CN111738964A (en) Image data enhancement method based on modeling
CN113537397B (en) Target detection and image definition joint learning method based on multi-scale feature fusion
CN112446292B (en) 2D image salient object detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant