CN116402853A - YOLOV5-based target object following method and device


Info

Publication number
CN116402853A
Authority
CN
China
Prior art keywords
feature map
following
model
preset
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310327802.2A
Other languages
Chinese (zh)
Inventor
张泽东
庞艳军
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Topkrypton Technology Co ltd
Original Assignee
Suzhou Topkrypton Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Topkrypton Technology Co ltd filed Critical Suzhou Topkrypton Technology Co ltd
Priority to CN202310327802.2A priority Critical patent/CN116402853A/en
Publication of CN116402853A publication Critical patent/CN116402853A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The embodiment of the application provides a training method of a target detection model and a YOLOV5-based target object following method, device, computer equipment and storage medium, solving problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking. By adopting the Ghost-BottleNeck module, the network model reduces computational cost while keeping good tracking performance; using Alpha-IoU as the bounding-box loss function optimizes the localization of vehicle-pedestrian tracking and improves its robustness; and applying the slicing aided hyper inference (SAHI) strategy improves vehicle-pedestrian tracking performance in complex traffic road scenes.

Description

YOLOV5-based target object following method and device
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a YOLOV5-based target object following method and device.
Background
Object tracking technology plays a vital role in computer vision because of its many uses in applications such as security systems, pedestrian re-identification, pedestrian tracking and pedestrian intent prediction. With the development of intelligent automobiles, pedestrian detection has become a key target detection technology. A fast and accurate pedestrian detection method is of great significance both for the safety of intelligent vehicles on the road and for the protection of pedestrians. In intelligent vehicles, target detection generally employs lighter-weight tracking methods because of the limited computing power of on-board computing devices.
In complex traffic road scenes, lightweight target tracking methods obtain good confidence scores for large- and medium-scale pedestrians. However, they cannot track small-scale pedestrians at a distance and produce missed and erroneous tracks.
On the one hand, several effective methods have been proposed to improve small-target tracking and robustness in complex scenes. For example, the Extended Feature Pyramid Network (EFPN) builds on the Feature Pyramid Network (FPN) and uses super-resolution (SR) features as a new feature transmission module within the FPN, enriching detail features in a region and making small and medium targets easier to track. In addition, the DNN (deep neural network) weights of a network layer can be divided into a number of equal-sized blocks, with the weights within each block pruned to the same shape. Alternatively, a mobile GPU-CPU collaboration scheme allows a detection method deployed on a mobile device to keep good detection accuracy while achieving efficient inference speed. An improved Spatial Pyramid Pooling (SPP) layer can also be integrated into the lateral connections of the FPN to better extract fine-grained information from shallow feature maps and improve target detection accuracy for unmanned aerial vehicles. Or a global-local feature enhancement network (GLF-Net) can be adopted: during feature extraction, a Local Feature Extraction (LFE) module and a Global Feature Extraction (GFE) module extract local and global features of an image respectively, achieving stable feature extraction in complex backgrounds and dense scenes, while a feature fusion module fuses the global and local features, enhancing the feature representation capability of the network model and improving the detection accuracy for multi-scale targets.
On the other hand, to balance small-object detection performance against efficiency, an object detector that performs faster feature-pyramid-based inference has been proposed. It predicts the coarse positions of small objects on low-resolution features and then guides the high-resolution features to compute accurate results, making full use of the high-resolution feature map while avoiding a large amount of useless computation on background information. Finally, a double-detection mechanism addresses missed small objects: when the single-level detector misses a target, a denoising sparse auto-encoder (DSAE) module encodes the image of the target's possible region as a low-dimensional feature vector, and the instances in the image are then ranked according to the results of the two detections to identify the missed objects.
However, the above methods cannot achieve both tracking accuracy and light weight; how to improve the tracking accuracy of a target tracking method while keeping it lightweight therefore remains a problem to be solved.
Disclosure of Invention
In order to solve the problems of the prior art, the embodiment of the invention provides a training method of a target detection model and a YOLOV5-based target object following method, device, computer equipment and storage medium, so as to solve problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking, enable the network model to keep good tracking performance while reducing computational cost, and improve pedestrian tracking performance in complex traffic road scenes.
In order to solve one or more of the technical problems, the invention adopts the following technical scheme:
in a first aspect, a method for training a target detection model is provided, the method comprising:
replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GIoU loss function and an Alpha-IoU loss function;
the GloU calculation formula is as follows:
Figure BDA0004153840410000031
wherein loU is the overlapping area of the predicted frames, A is the predicted frame, B is the real frame, and C is the minimum bounding box of A and B;
Alpha-IoU is calculated as follows:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha_1} + \beta_{\alpha_2}(B, B^{gt})$

where $\alpha_1 > 0$, $\alpha_2 > 0$, B is the size and position of the real frame, $B^{gt}$ is the size and position of the predicted frame, and $\beta_{\alpha_2}(B, B^{gt})$ represents a penalty term calculated from B and $B^{gt}$.
In a specific embodiment, the training the first YOLOV5 model with the training dataset comprises:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
In a specific embodiment, the optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model includes:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
In a second aspect, there is provided a YOLOV5-based target object following method, the method comprising:
acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
inputting the input picture into the target detection model obtained by training with the above training method of the target detection model for feature extraction, so as to obtain a predicted feature map;
screening the prediction feature map by using a preset screening method to obtain a final detection result;
and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In a specific embodiment, the SAHI strategy includes complete reasoning and auxiliary slicing.
In a specific embodiment, the prediction feature map includes at least a plurality of candidate detection frames, and the screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
In a specific embodiment, the following object comprises a vehicle, and the preset following algorithm comprises a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
In a third aspect, there is provided a YOLOV5-based target object following apparatus, the apparatus comprising:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the above training method of the target detection model to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In a fourth aspect, there is also provided a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the YOLOV5-based target object following method.
In a fifth aspect, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed, implements the YOLOV5-based target object following method.
The technical solutions provided by the embodiments of the application bring the following beneficial effects:
The training method of the target detection model, the YOLOV5-based target object following method, the device, the computer equipment and the storage medium provided by the embodiments of the application solve problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking. By adopting the Ghost-BottleNeck module, the network model reduces computational cost while keeping good tracking performance; using Alpha-IoU as the bounding-box loss function optimizes the localization of vehicle-pedestrian tracking and improves its robustness; and applying the slicing aided hyper inference (SAHI) strategy improves vehicle-pedestrian tracking performance in complex traffic road scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method of training a target detection model, according to an example embodiment;
FIG. 2 is a flowchart illustrating a YOLOV5-based target object following method according to an example embodiment;
FIG. 3 is a schematic diagram of a target object following apparatus based on YOLOV5, according to an exemplary embodiment;
fig. 4 is a schematic diagram of a computer device, according to an example embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As described in the background art, when the target tracking algorithms of the prior art are applied to vehicle-based pedestrian tracking, problems such as false detection, missed detection and poor detection of small-scale pedestrians arise.
In order to solve one or more of these problems, the application creatively provides a new training method of a target detection model and a YOLOV5-based target object following method. In the training method, a Ghost-BottleNeck module replaces the feature extraction module in the original YOLOV5 model, so that the network model keeps good tracking performance while reducing computational cost; Alpha-IoU is used as the bounding-box loss function to optimize the localization of vehicle-pedestrian tracking and improve its robustness; and a slicing aided hyper inference (SAHI) strategy is applied to improve vehicle-pedestrian tracking performance in complex traffic road scenes.
The following describes the aspects of the present application in detail with reference to the drawings and various embodiments.
Example 1
In order to implement the solution of the present application, an embodiment of the present application provides a training method of a target detection model, and referring to fig. 1, the method includes the following steps:
s110: replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
s120: training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
s130: and optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GloU loss function and an Alpha-IoU loss function.
In particular, the feature maps generated by a backbone network typically contain many duplicated and similar maps. To reduce the computational cost of generating feature maps by convolution, the embodiment of the application uses a Ghost-BottleNeck module to generate them.
In target detection, the accuracy of the prediction frame is critical. The loss functions adopted in model training include the GIoU loss function and the Alpha-IoU loss function. GIoU not only inherits the scale invariance of IoU but also alleviates the vanishing-gradient problem when the prediction frame and the ground-truth frame have no overlapping area. Furthermore, Alpha-IoU is used as the regression loss function, taking into account the effect of noise on detector performance during training.
As a preferred implementation manner, in the embodiment of the present application, the training the first YOLOV5 model using the training data set includes:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
Specifically, in the embodiment of the present application, the number of channels in the original feature map is smaller than the number of channels in the output feature map. The Ghost feature maps are spliced channel-wise, which is not described in detail here.
Preferably, when the convolution calculation is performed on the input feature map to generate the original feature map, the following calculation formula may be adopted:

$Y' = X * f'$

where $X \in \mathbb{R}^{c \times h \times w}$ is the input feature map, with c the number of channels, h the height and w the width of the input feature map; $f' \in \mathbb{R}^{c \times k \times k \times m}$ is the convolution filter, with $k \times k$ the convolution kernel and m the number of output channels; and $Y' \in \mathbb{R}^{h' \times w' \times m}$ is the output, with h' the height and w' the width of the output feature map.
The original feature maps of the m channels are obtained through this convolution calculation.
In order to compose the required n feature maps, each original feature map generates Ghost feature maps through linear transformations. Preferably, the following calculation formula is adopted:

$y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \dots, m, \; j = 1, \dots, s$

where $y'_i$ is the i-th original feature map, $\Phi_{i,j}$ represents the j-th linear transformation applied to it, $y_{ij}$ is the resulting Ghost feature map, and s is the number of Ghost feature maps generated from each original feature map, so that the output contains $n = m \cdot s$ feature maps in total.
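As an illustration of this two-step generation, a minimal PyTorch sketch of a Ghost module is given below; the module name, the use of a depthwise convolution as the cheap linear transformation Phi, and all hyperparameters are assumptions for illustration, not the specific implementation claimed by this application.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch: a primary convolution produces m original feature maps,
    a cheap depthwise convolution plays the role of the linear
    transformation Phi, and the results are spliced channel-wise."""
    def __init__(self, c_in, n_out, kernel=1, ratio=2, dw_kernel=3):
        super().__init__()
        m = n_out // ratio                       # original (intrinsic) channels
        ghost = n_out - m                        # channels produced by Phi
        # with ratio=2 and even n_out, ghost == m, so groups=m is valid
        self.primary = nn.Sequential(            # Y' = X * f'
            nn.Conv2d(c_in, m, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(              # y_ij = Phi_ij(y'_i)
            nn.Conv2d(m, ghost, dw_kernel, padding=dw_kernel // 2,
                      groups=m, bias=False),
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        y_prime = self.primary(x)                # original feature maps
        y_ghost = self.cheap(y_prime)            # Ghost feature maps
        return torch.cat([y_prime, y_ghost], dim=1)  # splice by channel
```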
As a preferred implementation manner, in the embodiment of the present application, the optimizing, by using a preset loss function, each parameter of the second YOLOV5 model to obtain the target detection model includes:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
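Steps one to four above amount to a standard gradient-descent training loop. A condensed sketch follows, assuming a model, a data loader, a `preset_loss` implementing the GIoU/Alpha-IoU combination described next, and a simple loss threshold as the preset condition; the optimizer choice and all hyperparameters are illustrative, not fixed by this application.

```python
import torch

def optimize_parameters(model, loader, preset_loss, threshold=0.05,
                        lr=1e-3, max_epochs=300):
    """Repeat steps one to three until the loss meets the preset condition."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for inputs, targets in loader:
            preds = model(inputs)                # step three: new output map
            loss = preset_loss(preds, targets)   # step one: loss value
            opt.zero_grad()
            loss.backward()                      # step two: back propagation
            opt.step()
        if loss.item() < threshold:              # step four: preset condition
            break                                # (assumed: loss below threshold)
    return model                                 # last-updated model
```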
Specifically, in the embodiment of the application, the GIoU loss function is adopted to calculate the loss between the prediction frame and the real frame, improving the accuracy of the target object's position in the detection result, while Alpha-IoU is adopted as the regression loss function to improve the accuracy of the target object's size in the detection result.
The GIoU is calculated as follows:

$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$

where IoU is the intersection over union of the predicted frame and the real frame, A is the predicted frame, B is the real frame, and C is the minimum bounding box enclosing A and B. In target detection, the accuracy of the prediction frame is critical: GIoU not only inherits the scale invariance of IoU but also alleviates the vanishing-gradient problem when the prediction frame and the real frame have no overlapping area.
Alpha-IoU is an IoU-based variant obtained by applying a Box-Cox transformation with power regularization to the IoU loss function:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha}, \quad \alpha > 0$
in the embodiment of the application, in order to increase the generalization of the loss function, a penalty term is introduced to increase the generalization of the loss function, and the loss function is modified to be:
L α-loU =1-loU α1α2 (B,B gt )
wherein alpha 1 is more than 0, alpha 2 is more than 0, B is the size and the position of a real frame, B gt To predict the size and position of the frame, and beta α2 (B,B gt ) Representing the data according to B and B gt A penalty term calculated. When α1 > 1, the model can be aided in focusing more on objects that are 1oU high.
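To make the two terms concrete, a sketch on axis-aligned boxes (x1, y1, x2, y2) follows. The GIoU part matches the formula above; for the Alpha-IoU penalty, whose exact form the text leaves open, a power of the normalized center distance is assumed purely for illustration, as are the values of alpha1 and alpha2.

```python
import torch

def iou_and_giou(a, b, eps=1e-7):
    """IoU and GIoU for boxes given as (x1, y1, x2, y2) tensors."""
    ix1, iy1 = torch.max(a[..., 0], b[..., 0]), torch.max(a[..., 1], b[..., 1])
    ix2, iy2 = torch.min(a[..., 2], b[..., 2]), torch.min(a[..., 3], b[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)
    cx1, cy1 = torch.min(a[..., 0], b[..., 0]), torch.min(a[..., 1], b[..., 1])
    cx2, cy2 = torch.max(a[..., 2], b[..., 2]), torch.max(a[..., 3], b[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)               # minimum bounding box C
    giou = iou - (c_area - union) / (c_area + eps)   # IoU - |C\(AuB)| / |C|
    return iou, giou

def alpha_iou_loss(pred, gt, alpha1=3.0, alpha2=2.0):
    """L = 1 - IoU^alpha1 + beta_alpha2(B, B_gt); the penalty beta is
    assumed here to be a power of the normalized center distance."""
    iou, _ = iou_and_giou(pred, gt)
    center = lambda b: ((b[..., 0] + b[..., 2]) / 2, (b[..., 1] + b[..., 3]) / 2)
    (px, py), (gx, gy) = center(pred), center(gt)
    cx1, cy1 = torch.min(pred[..., 0], gt[..., 0]), torch.min(pred[..., 1], gt[..., 1])
    cx2, cy2 = torch.max(pred[..., 2], gt[..., 2]), torch.max(pred[..., 3], gt[..., 3])
    diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-7  # enclosing diagonal^2
    beta = (((px - gx) ** 2 + (py - gy) ** 2) / diag2) ** (alpha2 / 2)
    return 1 - iou ** alpha1 + beta
```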
Example 2
Corresponding to the first embodiment, the present application further provides a YOLOV5-based target object following method. Content in this embodiment that is the same as or similar to the first embodiment may be referred to the above description and is not repeated here. Referring to fig. 2, the method comprises the following steps:
s210: acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
s220: inputting the input picture into the target detection model obtained by training the training method of the target detection model to extract the characteristics, so as to obtain a predicted characteristic diagram;
s230: screening the prediction feature map by using a preset screening method to obtain a final detection result;
s240: and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In particular, existing detectors improve detection performance by changing the structure of the DNN or increasing the network depth. Although such improved detectors detect well, they increase the computational complexity and inference time of the network model. To solve this problem, the embodiment of the present application adopts the SAHI strategy during detector inference, improving pedestrian detection performance while keeping complexity and memory requirements as low as possible. A detection result list objects is used, in which each entry contains the type of a detected target, its position in the image and its probability; the position information consists of the coordinate points of the four vertices of the detected target's rectangular frame in the image, i.e. the coordinate range of the target in the image. When a detection result label is judged to be person, i.e. the pedestrian type, the position difference between the target position and the origin in the image is calculated, realizing target pedestrian tracking.
It should be noted here that, in the embodiment of the present application, the target object to be followed includes but is not limited to a pedestrian, and the following object includes but is not limited to a vehicle; that is, the application scenarios of the YOLOV5-based target object following method provided in the present application include but are not limited to vehicle-pedestrian following.
Specifically, the training process of the object detection model may refer to the related content described in the first embodiment, which is not described in detail herein.
As a preferred implementation, in the embodiment of the present application, the SAHI strategy includes complete reasoning and auxiliary slicing.
Specifically, SAHI draws on the experience of sliding windows and applies the idea to the image inference process. SAHI is largely divided into two parts: complete reasoning and auxiliary slicing. Complete reasoning inputs the whole image into the inference model to detect objects with rich feature information. Auxiliary slicing cuts the whole image into M × N slices, resizes each slice while keeping the aspect ratio, and inputs each slice into the inference model for target detection. After resizing, the detail features of pedestrians in the slice images become more obvious, so the network model can extract feature information relatively easily.
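As an illustration of this two-part inference, a condensed sketch follows; `run_model`, the 2 × 2 tiling, the slice size and the returned `(x1, y1, x2, y2, confidence, class)` tuple format are all illustrative assumptions, not interfaces defined by this application.

```python
import cv2

def sahi_inference(image, run_model, m=2, n=2, slice_size=640):
    """Complete reasoning on the whole image plus auxiliary slicing:
    cut the image into m x n slices, resize each while keeping its
    aspect ratio, detect separately, and map boxes back to the image."""
    h, w = image.shape[:2]
    detections = run_model(image)                     # complete reasoning
    th, tw = h // m, w // n
    for i in range(m):
        for j in range(n):
            tile = image[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            scale = slice_size / max(tile.shape[:2])  # keep aspect ratio
            resized = cv2.resize(tile, None, fx=scale, fy=scale)
            for x1, y1, x2, y2, conf, cls in run_model(resized):
                detections.append((x1 / scale + j * tw, y1 / scale + i * th,
                                   x2 / scale + j * tw, y2 / scale + i * th,
                                   conf, cls))        # restore original coords
    return detections  # later fed to non-maximum suppression
```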
As a preferred implementation manner, in the embodiment of the present application, the prediction feature map includes at least a plurality of candidate detection frames, and the step of screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
Specifically, the results of complete reasoning and auxiliary slicing are fed to non-maximum suppression, low-confidence detection frames of the same object in the prediction feature map are removed, and the original image size is restored. At run time, a detection result list containing the type, position and confidence of each detected target is returned. It should be noted that the preset threshold is not specifically limited in the embodiment of the present application; the user may set it according to actual needs.
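A minimal sketch of this screening step might look as follows, assuming detections are `(x1, y1, x2, y2, confidence, class)` tuples such as those produced by the slicing sketch above; the thresholds are typical defaults, not values fixed by this application.

```python
def nms_filter(detections, conf_thresh=0.25, iou_thresh=0.45):
    """Drop frames below the preset confidence threshold, then suppress
    overlapping frames of the same class, keeping the most confident one."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-7)

    boxes = [d for d in detections if d[4] >= conf_thresh]  # preset threshold
    boxes.sort(key=lambda d: d[4], reverse=True)            # by confidence
    kept = []
    for box in boxes:
        if all(box[5] != k[5] or iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```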
As a preferred implementation manner, in the embodiment of the present application, the following object includes a vehicle, and the preset following algorithm includes a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
Specifically, taking a vehicle-pedestrian following scene as an example, the preset following algorithm in the embodiment of the application mainly comprises a steering control algorithm and a speed control algorithm. After entering the following mode, the hardware system is first initialized, including PWM (pulse width modulation) initialization, ultrasonic sensor initialization and initialization of the related IO (input/output) ports. An image captured by the camera is then acquired and target detection is performed; after a pedestrian target is detected, the deviation is calculated and the distance information returned by the ultrasonic sensor is acquired. The steering angle of the vehicle's steering engine and the vehicle speed are calculated from the position information of the pedestrian target and the distance information returned by the ultrasonic sensor respectively, and steering control and speed control are carried out, as sketched below.
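The control flow just described can be summarized as the sketch below. Every device object (`camera`, `ultrasonic`, `steering`, `motor`) and the `detect` callback are assumed driver wrappers whose interfaces this application does not specify; `steering_output` and `speed_output` are sketched after the next two paragraphs.

```python
def follow_mode(detect, camera, ultrasonic, steering, motor):
    """Main following loop: detect the pedestrian target, compute the
    lateral deviation and the ultrasonic distance, then apply steering
    and speed control. Hardware (PWM, ultrasonic sensor, IO ports) is
    assumed to have been initialized beforehand."""
    last_err = 0.0
    while True:
        frame = camera.capture()
        target = detect(frame)                 # None if no 'person' label
        if target is None:
            motor.set_speed(0.0)               # stop until a pedestrian appears
            continue
        err = target["pos_x"] - 160            # deviation vs image center
        dist = ultrasonic.read()               # distance feedback
        steering.set_angle(steering_output(err, last_err))
        motor.set_speed(speed_output(dist, err))
        last_err = err
```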
Further, steering control first requires pedestrian target detection. In a specific implementation, the target detection model is run, and each entry in the returned detection result list objects contains the type of a detected target, its position in the image and its probability; the position information consists of the coordinate points of the four vertices of the detected target's rectangular frame in the image, i.e. the coordinate range of the target in the image. When a detection result label is judged to be person, i.e. the pedestrian type, the abscissa position deviation is calculated: pos_x is the abscissa of the detected target in the image, and the horizontal position deviation of the target relative to the image center is obtained as the difference between pos_x and the image-center abscissa 160. To avoid interference, the last deviation value is reused when the deviation is small. The steering control adopts incremental segmented proportional control and is closed-loop on the deviation: the output control amount is f(x), a piecewise function of the deviation err.
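A sketch of this segmented proportional control might look as follows; the dead band, segment boundaries and gains are assumed values, since the text above fixes only the structure: a piecewise function f of the deviation err, closed-loop, reusing the last deviation when the change is small.

```python
def steering_output(err, last_err, dead_band=10):
    """Segmented proportional steering control, closed-loop on the
    horizontal deviation err (pixels from the image-center abscissa 160)."""
    if abs(err) < dead_band:
        err = last_err                  # reuse last value to avoid jitter
    if abs(err) < 40:                   # segment boundaries and gains are
        kp = 0.3                        # illustrative assumptions
    elif abs(err) < 80:
        kp = 0.5
    else:
        kp = 0.8
    return kp * err                     # output control amount f(err)
```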
Further, speed control measures the distance between the following target and the vehicle through the ultrasonic sensor, thereby controlling the vehicle speed in real time. Target is the set distance to the following target and distance is the distance fed back by the ultrasonic sensor. The ranging angle of the ultrasonic sensor is alpha, about 15 degrees, while the camera's field of view is 120 degrees; therefore, when the following target deviates strongly to the side, the distance returned by the ultrasonic sensor is not a true value, and the program must judge whether the measured distance is really the distance between the vehicle and the following target. The deviation between distance and the target value Target is calculated and the real-time speed output is computed; finally, speed control is achieved in the speed control loop.
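Correspondingly, a sketch of the speed loop is given below; the set following distance, the validity check motivated by the 15-degree ultrasonic beam versus the 120-degree camera view, and the gain are illustrative assumptions.

```python
def speed_output(distance, err, target=1.0, kp=0.6, v_max=1.5):
    """Proportional speed control on the deviation between the measured
    ultrasonic distance and the set following distance (meters)."""
    if abs(err) > 100:                  # large lateral deviation: the narrow
        return 0.2                      # ultrasonic beam may miss the target,
                                        # so creep until it re-centers (assumed)
    dev = distance - target             # deviation from the target value
    return max(0.0, min(v_max, kp * dev))
```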
The YOLOV5-based target object following method provided in the embodiment of the present application is described below taking the target object to be followed as a pedestrian and the following object as a power-assisted vehicle, i.e. a moped-pedestrian following scene, as an example:
the first step: and acquiring video data containing pedestrians, and carrying out frame processing on the video data to obtain a plurality of frame pictures serving as input pictures of the model. In specific implementation, the video data may be sliced using the SAHI policy to obtain an input picture. The SAHI strategy consists of two parts, complete reasoning and assisted slicing.
The second step: inputting the input picture obtained in the first step into the target detection model for feature extraction to obtain a predicted feature map, where the predicted feature map includes at least a plurality of candidate detection frames. The target detection model is obtained by training with the training method provided in the first embodiment; for the specific training process, reference may be made to the related content described in the first embodiment, which is not detailed here.
The third step: screening the predicted feature map with a preset screening method to obtain a final detection result. In a specific implementation, the candidate detection frames in the predicted feature map are screened with a non-maximum suppression algorithm, detection frames whose confidence for the same object (such as a pedestrian) is lower than a preset threshold are removed, and the returned final detection result includes the type, position and confidence of the detected target.
The fourth step: generating a following strategy according to the final detection result and a preset following algorithm, and controlling the power-assisted vehicle according to the following strategy. Specifically, the preset following algorithm mainly comprises a steering control algorithm and a speed control algorithm: the steering angle of the power-assisted vehicle's steering engine and the vehicle speed are calculated from the pedestrian's position information and the distance information returned by the ultrasonic sensor, and steering control and speed control are carried out respectively.
Example 3
Corresponding to the second embodiment, the present application further provides a YOLOV5-based target object following device. Content in this embodiment that is the same as or similar to the above embodiments may be referred to the above description and is not repeated here. Referring to fig. 3, the apparatus includes:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the above training method of the target detection model to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
As a preferred implementation, in the embodiment of the present application, the SAHI strategy includes complete reasoning and auxiliary slicing.
In an embodiment of the present application, the prediction feature map includes at least a plurality of candidate detection frames, and the screening module is specifically configured to: and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
As a preferred implementation manner, in the embodiment of the present application, the following object includes a vehicle, and the preset following algorithm includes a steering control algorithm and a speed control algorithm;
the control module is specifically configured to calculate a steering angular speed and a vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and control the vehicle according to the steering angular speed and the vehicle speed.
Example 4
Corresponding to the second or third embodiment, the present application further provides a computer device, including: a processor and a memory, the memory storing a computer program executable on the processor which, when executed by the processor, implements the YOLOV5-based target object following method provided by any one of the embodiments described above.
FIG. 4 illustrates a computer device 1500, which may include, inter alia, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit ), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical scheme provided by the present invention.
The Memory 1520 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the electronic device, a Basic Input Output System (BIOS) for controlling low-level operation of the electronic device. In addition, a web browser 1523, a data storage management system 1524, a device identification information processing system 1525, and the like may also be stored. The device identification information processing system 1525 may be an application program that implements the operations of the steps described above in embodiments of the present invention. In general, when the present invention is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus includes a path to transfer information between various components of the device (e.g., the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
In addition, the electronic device may also obtain information of specific acquisition conditions from the virtual resource object acquisition condition information database, so as to be used for performing condition judgment, and the like.
It is noted that although the above devices illustrate only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus, etc., in particular implementations, the device may include other components necessary to achieve proper functioning. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present invention, and not all of the components shown in the drawings.
Example 5
Corresponding to the second to fourth embodiments, the present application further provides a computer-readable storage medium. Content in this embodiment that is the same as or similar to the above embodiments may be referred to the above description and is not repeated here.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the YOLOV5-based target object following method as described above.
In some implementations, when the computer program is executed by the processor in the embodiments of the present application, the steps corresponding to the method described in the second embodiment may be further implemented, and reference may be made to the detailed description in the second embodiment, which is not repeated herein.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing has described the preferred embodiments of the present invention in detail. Specific examples are used herein to illustrate the principles and implementations of the invention, and the above description of the embodiments is intended only to facilitate understanding of the method of the present invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings remain within the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. A method of training a target detection model, the method comprising:
replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GIoU loss function and an Alpha-IoU loss function;
the GIoU is calculated as follows:

$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$

where IoU is the intersection over union of the predicted frame and the real frame, A is the predicted frame, B is the real frame, and C is the minimum bounding box enclosing A and B;
Alpha-IoU is calculated as follows:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha_1} + \beta_{\alpha_2}(B, B^{gt})$

where $\alpha_1 > 0$, $\alpha_2 > 0$, B is the size and position of the real frame, $B^{gt}$ is the size and position of the predicted frame, and $\beta_{\alpha_2}(B, B^{gt})$ represents a penalty term calculated from B and $B^{gt}$.
2. The method of training the object detection model of claim 1, wherein training the first YOLOV5 model using a training dataset comprises:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
3. The method for training the object detection model according to claim 2, wherein optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain the object detection model comprises:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
4. A YOLOV5-based target object following method, the method comprising:
acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
inputting the input picture into a target detection model obtained by training with the training method of the target detection model according to any one of claims 1 to 3 for feature extraction, to obtain a predicted feature map;
screening the prediction feature map by using a preset screening method to obtain a final detection result;
and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
5. The YOLOV5-based target object following method of claim 4, wherein the SAHI strategy comprises complete reasoning and auxiliary slicing.
6. The YOLOV5-based target object following method of claim 4, wherein the prediction feature map includes at least a plurality of candidate detection frames, and the screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
7. The YOLOV5-based target object following method of claim 4, wherein the following object comprises a vehicle, and the preset following algorithm comprises a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
8. A YOLOV5-based target object following apparatus, the apparatus comprising:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the training method of the target detection model according to any one of claims 1 to 3 to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
9. A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor which, when executed by the processor, implements the YOLOV5-based target object following method of any one of claims 4 to 7.
10. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed, implements the YOLOV5-based target object following method of any one of claims 4 to 7.
CN202310327802.2A 2023-03-30 2023-03-30 YOLOV5-based target object following method and device Pending CN116402853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310327802.2A CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310327802.2A CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Publications (1)

Publication Number Publication Date
CN116402853A true CN116402853A (en) 2023-07-07

Family

ID=87017161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310327802.2A Pending CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Country Status (1)

Country Link
CN (1) CN116402853A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination