CN116402853A - YOLOV5-based target object following method and device


Info

Publication number
CN116402853A
Authority
CN
China
Prior art keywords
feature map
following
model
preset
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310327802.2A
Other languages
Chinese (zh)
Inventor
张泽东
庞艳军
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Topkrypton Technology Co ltd
Original Assignee
Suzhou Topkrypton Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Topkrypton Technology Co ltd filed Critical Suzhou Topkrypton Technology Co ltd
Priority to CN202310327802.2A priority Critical patent/CN116402853A/en
Publication of CN116402853A publication Critical patent/CN116402853A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The embodiment of the application provides a training method of a target detection model and a YOLOV5-based target object following method, device, computer equipment and storage medium, solving problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking. By adopting the Ghost-BottleNeck module, the network model reduces computational cost while keeping good tracking performance; using Alpha-IoU as the bounding-box loss function optimizes the localization of vehicle-pedestrian tracking and improves its robustness; and applying the slicing aided hyper inference (SAHI) strategy improves vehicle-pedestrian tracking performance in complex traffic road scenes.

Description

YOLOV5-based target object following method and device
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a YOLOV5-based target object following method and device.
Background
Object tracking technology plays a vital role in computer vision because of its many uses in applications such as security systems, pedestrian re-identification, pedestrian tracking and pedestrian intent prediction. With the development of intelligent automobiles, pedestrian detection has become a key target detection technology. A fast and accurate pedestrian detection method is of great significance both for the safety of intelligent vehicles on the road and for the protection of pedestrians. In intelligent vehicles, target detection generally employs lighter-weight tracking methods because of the limited computing power of on-board computing devices.
In complex traffic road scenes, lightweight target tracking methods obtain good confidence scores for large- and medium-scale pedestrians. However, they cannot track small-scale pedestrians at a distance and produce missed and erroneous tracks.
On the one hand, several effective methods have been proposed to improve small-target tracking and robustness in complex scenes. For example, the Extended Feature Pyramid Network (EFPN) builds on the Feature Pyramid Network (FPN) and uses super-resolution (SR) features as a new feature transmission module within the FPN, enriching detail features in a region and making small and medium targets easier to track. In addition, the DNN (deep neural network) weights of a network layer can be divided into a number of equal-sized blocks, with the weights within each block pruned to the same shape. Alternatively, a mobile GPU-CPU collaboration scheme allows a detection method deployed on a mobile device to keep good detection accuracy while achieving efficient inference speed. An improved Spatial Pyramid Pooling (SPP) layer can also be integrated into the lateral connections of the FPN to better extract fine-grained information from shallow feature maps and improve target detection accuracy for unmanned aerial vehicles. Or a global-local feature enhancement network (GLF-Net) can be adopted: during feature extraction, a Local Feature Extraction (LFE) module and a Global Feature Extraction (GFE) module extract local and global features of an image respectively, achieving stable feature extraction in complex backgrounds and dense scenes, while a feature fusion module fuses the global and local features, enhancing the feature representation capability of the network model and improving the detection accuracy for multi-scale targets.
On the other hand, to balance small-object detection performance against efficiency, an object detector that performs faster feature-pyramid-based inference has been proposed. It predicts the coarse positions of small objects on low-resolution features and then guides the high-resolution features to compute accurate results, making full use of the high-resolution feature map while avoiding a large amount of useless computation on background information. Finally, a double-detection mechanism addresses missed small objects: when the single-level detector misses a target, a denoising sparse auto-encoder (DSAE) module encodes the image of the target's possible region as a low-dimensional feature vector, and the instances in the image are then ranked according to the results of the two detections to identify the missed objects.
However, the above methods cannot achieve both tracking accuracy and light weight; how to improve the tracking accuracy of a target tracking method while keeping it lightweight therefore remains a problem to be solved.
Disclosure of Invention
In order to solve the problems of the prior art, the embodiment of the invention provides a training method of a target detection model and a YOLOV5-based target object following method, device, computer equipment and storage medium, so as to solve problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking, enable the network model to keep good tracking performance while reducing computational cost, and improve pedestrian tracking performance in complex traffic road scenes.
In order to solve one or more of the technical problems, the invention adopts the following technical scheme:
in a first aspect, a method for training a target detection model is provided, the method comprising:
replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GIoU loss function and an Alpha-IoU loss function;
the GloU calculation formula is as follows:
Figure BDA0004153840410000031
wherein loU is the overlapping area of the predicted frames, A is the predicted frame, B is the real frame, and C is the minimum bounding box of A and B;
Alpha-IoU is calculated as follows:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha_1} + \beta_{\alpha_2}(B, B^{gt})$

where $\alpha_1 > 0$, $\alpha_2 > 0$, B is the size and position of the real frame, $B^{gt}$ is the size and position of the predicted frame, and $\beta_{\alpha_2}(B, B^{gt})$ represents a penalty term calculated from B and $B^{gt}$.
In a specific embodiment, the training the first YOLOV5 model with the training dataset comprises:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
In a specific embodiment, the optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model includes:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
In a second aspect, there is provided a YOLOV5-based target object following method, the method comprising:
acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
inputting the input picture into the target detection model obtained by training with the above training method of the target detection model for feature extraction, so as to obtain a predicted feature map;
screening the prediction feature map by using a preset screening method to obtain a final detection result;
and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In a specific embodiment, the SAHI strategy includes complete reasoning and auxiliary slicing.
In a specific embodiment, the prediction feature map includes at least a plurality of candidate detection frames, and the screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
In a specific embodiment, the following object comprises a vehicle, and the preset following algorithm comprises a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
In a third aspect, there is provided a YOLOV5-based target object following apparatus, the apparatus comprising:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the above training method of the target detection model to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In a fourth aspect, there is also provided a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the YOLOV5-based target object following method.
In a fifth aspect, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed, implements the YOLOV5-based target object following method.
The technical solutions provided by the embodiments of the application bring the following beneficial effects:
The training method of the target detection model, the YOLOV5-based target object following method, the device, the computer equipment and the storage medium provided by the embodiments of the application solve problems such as false detection, missed detection and poor detection of small-scale pedestrians in vehicle-based pedestrian tracking. By adopting the Ghost-BottleNeck module, the network model reduces computational cost while keeping good tracking performance; using Alpha-IoU as the bounding-box loss function optimizes the localization of vehicle-pedestrian tracking and improves its robustness; and applying the slicing aided hyper inference (SAHI) strategy improves vehicle-pedestrian tracking performance in complex traffic road scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method of training a target detection model, according to an example embodiment;
FIG. 2 is a flowchart illustrating a YOLOV5-based target object following method according to an example embodiment;
FIG. 3 is a schematic diagram of a target object following apparatus based on YOLOV5, according to an exemplary embodiment;
fig. 4 is a schematic diagram of a computer device, according to an example embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As described in the background art, when the target tracking algorithms of the prior art are applied to vehicle-based pedestrian tracking, problems such as false detection, missed detection and poor detection of small-scale pedestrians arise.
In order to solve one or more of these problems, the application creatively provides a new training method of a target detection model and a YOLOV5-based target object following method. In the training method, a Ghost-BottleNeck module replaces the feature extraction module in the original YOLOV5 model, so that the network model keeps good tracking performance while reducing computational cost; Alpha-IoU is used as the bounding-box loss function to optimize the localization of vehicle-pedestrian tracking and improve its robustness; and a slicing aided hyper inference (SAHI) strategy is applied to improve vehicle-pedestrian tracking performance in complex traffic road scenes.
The following describes the aspects of the present application in detail with reference to the drawings and various embodiments.
Example 1
In order to implement the solution of the present application, an embodiment of the present application provides a training method of a target detection model, and referring to fig. 1, the method includes the following steps:
s110: replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
s120: training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
s130: and optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GloU loss function and an Alpha-IoU loss function.
In particular, the feature maps generated by a backbone network typically contain many duplicated and similar maps. To reduce the computational cost of generating feature maps by convolution, the embodiment of the application uses a Ghost-BottleNeck module to generate them.
In target detection, the accuracy of the prediction frame is critical. The loss functions adopted in model training include the GIoU loss function and the Alpha-IoU loss function. GIoU not only inherits the scale invariance of IoU but also alleviates the vanishing-gradient problem when the prediction frame and the ground-truth frame have no overlapping area. Furthermore, Alpha-IoU is used as the regression loss function, taking into account the effect of noise on detector performance during training.
As a preferred implementation manner, in the embodiment of the present application, the training the first YOLOV5 model using the training data set includes:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
Specifically, in the embodiment of the present application, the number of channels in the original feature map is smaller than the number of channels in the output feature map. The Ghost feature maps are spliced channel-wise, which is not described in detail here.
Preferably, when the convolution calculation is performed on the input feature map to generate the original feature map, the following calculation formula may be adopted:

$Y' = X * f'$

where $X \in \mathbb{R}^{c \times h \times w}$ is the input feature map, with c the number of channels, h the height and w the width of the input feature map; $f' \in \mathbb{R}^{c \times k \times k \times m}$ is the convolution filter, with $k \times k$ the convolution kernel and m the number of output channels; and $Y' \in \mathbb{R}^{h' \times w' \times m}$ is the output, with h' the height and w' the width of the output feature map.
The original feature maps of the m channels are obtained through this convolution calculation.
In order to compose the required n feature maps, each original feature map generates Ghost feature maps through linear transformations. Preferably, the following calculation formula is adopted:

$y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \dots, m, \; j = 1, \dots, s$

where $y'_i$ is the i-th original feature map, $\Phi_{i,j}$ represents the j-th linear transformation applied to it, $y_{ij}$ is the resulting Ghost feature map, and s is the number of Ghost feature maps generated from each original feature map, so that the output contains $n = m \cdot s$ feature maps in total.
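As an illustration of this two-step generation, a minimal PyTorch sketch of a Ghost module is given below; the module name, the use of a depthwise convolution as the cheap linear transformation Phi, and all hyperparameters are assumptions for illustration, not the specific implementation claimed by this application.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch: a primary convolution produces m original feature maps,
    a cheap depthwise convolution plays the role of the linear
    transformation Phi, and the results are spliced channel-wise."""
    def __init__(self, c_in, n_out, kernel=1, ratio=2, dw_kernel=3):
        super().__init__()
        m = n_out // ratio                       # original (intrinsic) channels
        ghost = n_out - m                        # channels produced by Phi
        # with ratio=2 and even n_out, ghost == m, so groups=m is valid
        self.primary = nn.Sequential(            # Y' = X * f'
            nn.Conv2d(c_in, m, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(              # y_ij = Phi_ij(y'_i)
            nn.Conv2d(m, ghost, dw_kernel, padding=dw_kernel // 2,
                      groups=m, bias=False),
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        y_prime = self.primary(x)                # original feature maps
        y_ghost = self.cheap(y_prime)            # Ghost feature maps
        return torch.cat([y_prime, y_ghost], dim=1)  # splice by channel
```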
As a preferred implementation manner, in the embodiment of the present application, the optimizing, by using a preset loss function, each parameter of the second YOLOV5 model to obtain the target detection model includes:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
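Steps one to four above amount to a standard gradient-descent training loop. A condensed sketch follows, assuming a model, a data loader, a `preset_loss` implementing the GIoU/Alpha-IoU combination described next, and a simple loss threshold as the preset condition; the optimizer choice and all hyperparameters are illustrative, not fixed by this application.

```python
import torch

def optimize_parameters(model, loader, preset_loss, threshold=0.05,
                        lr=1e-3, max_epochs=300):
    """Repeat steps one to three until the loss meets the preset condition."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for inputs, targets in loader:
            preds = model(inputs)                # step three: new output map
            loss = preset_loss(preds, targets)   # step one: loss value
            opt.zero_grad()
            loss.backward()                      # step two: back propagation
            opt.step()
        if loss.item() < threshold:              # step four: preset condition
            break                                # (assumed: loss below threshold)
    return model                                 # last-updated model
```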
Specifically, in the embodiment of the application, the GIoU loss function is adopted to calculate the loss between the prediction frame and the real frame, improving the accuracy of the target object's position in the detection result, while Alpha-IoU is adopted as the regression loss function to improve the accuracy of the target object's size in the detection result.
The GIoU is calculated as follows:

$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$

where IoU is the intersection over union of the predicted frame and the real frame, A is the predicted frame, B is the real frame, and C is the minimum bounding box enclosing A and B. In target detection, the accuracy of the prediction frame is critical: GIoU not only inherits the scale invariance of IoU but also alleviates the vanishing-gradient problem when the prediction frame and the real frame have no overlapping area.
Alpha-IoU is an IoU-based variant obtained by applying a Box-Cox transformation with power regularization to the IoU loss function:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha}, \quad \alpha > 0$
in the embodiment of the application, in order to increase the generalization of the loss function, a penalty term is introduced to increase the generalization of the loss function, and the loss function is modified to be:
L α-loU =1-loU α1α2 (B,B gt )
wherein alpha 1 is more than 0, alpha 2 is more than 0, B is the size and the position of a real frame, B gt To predict the size and position of the frame, and beta α2 (B,B gt ) Representing the data according to B and B gt A penalty term calculated. When α1 > 1, the model can be aided in focusing more on objects that are 1oU high.
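To make the two terms concrete, a sketch on axis-aligned boxes (x1, y1, x2, y2) follows. The GIoU part matches the formula above; for the Alpha-IoU penalty, whose exact form the text leaves open, a power of the normalized center distance is assumed purely for illustration, as are the values of alpha1 and alpha2.

```python
import torch

def iou_and_giou(a, b, eps=1e-7):
    """IoU and GIoU for boxes given as (x1, y1, x2, y2) tensors."""
    ix1, iy1 = torch.max(a[..., 0], b[..., 0]), torch.max(a[..., 1], b[..., 1])
    ix2, iy2 = torch.min(a[..., 2], b[..., 2]), torch.min(a[..., 3], b[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)
    cx1, cy1 = torch.min(a[..., 0], b[..., 0]), torch.min(a[..., 1], b[..., 1])
    cx2, cy2 = torch.max(a[..., 2], b[..., 2]), torch.max(a[..., 3], b[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)               # minimum bounding box C
    giou = iou - (c_area - union) / (c_area + eps)   # IoU - |C\(AuB)| / |C|
    return iou, giou

def alpha_iou_loss(pred, gt, alpha1=3.0, alpha2=2.0):
    """L = 1 - IoU^alpha1 + beta_alpha2(B, B_gt); the penalty beta is
    assumed here to be a power of the normalized center distance."""
    iou, _ = iou_and_giou(pred, gt)
    center = lambda b: ((b[..., 0] + b[..., 2]) / 2, (b[..., 1] + b[..., 3]) / 2)
    (px, py), (gx, gy) = center(pred), center(gt)
    cx1, cy1 = torch.min(pred[..., 0], gt[..., 0]), torch.min(pred[..., 1], gt[..., 1])
    cx2, cy2 = torch.max(pred[..., 2], gt[..., 2]), torch.max(pred[..., 3], gt[..., 3])
    diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-7  # enclosing diagonal^2
    beta = (((px - gx) ** 2 + (py - gy) ** 2) / diag2) ** (alpha2 / 2)
    return 1 - iou ** alpha1 + beta
```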
Example 2
Corresponding to the first embodiment, the present application further provides a YOLOV5-based target object following method. Content in this embodiment that is the same as or similar to the first embodiment may be referred to the above description and is not repeated here. Referring to fig. 2, the method comprises the following steps:
s210: acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
s220: inputting the input picture into the target detection model obtained by training the training method of the target detection model to extract the characteristics, so as to obtain a predicted characteristic diagram;
s230: screening the prediction feature map by using a preset screening method to obtain a final detection result;
s240: and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
In particular, existing detectors improve detection performance by changing the structure of the DNN or increasing the network depth. Although such improved detectors detect well, they increase the computational complexity and inference time of the network model. To solve this problem, the embodiment of the present application adopts the SAHI strategy during detector inference, improving pedestrian detection performance while keeping complexity and memory requirements as low as possible. A detection result list objects is used, in which each entry contains the type of a detected target, its position in the image and its probability; the position information consists of the coordinate points of the four vertices of the detected target's rectangular frame in the image, i.e. the coordinate range of the target in the image. When a detection result label is judged to be person, i.e. the pedestrian type, the position difference between the target position and the origin in the image is calculated, realizing target pedestrian tracking.
It should be noted here that, in the embodiment of the present application, the target object to be followed includes but is not limited to a pedestrian, and the following object includes but is not limited to a vehicle; that is, the application scenarios of the YOLOV5-based target object following method provided in the present application include but are not limited to vehicle-pedestrian following.
Specifically, the training process of the object detection model may refer to the related content described in the first embodiment, which is not described in detail herein.
As a preferred implementation, in the embodiment of the present application, the SAHI strategy includes complete reasoning and auxiliary slicing.
Specifically, SAHI draws on the experience of sliding windows and applies the idea to the image inference process. SAHI is largely divided into two parts: complete reasoning and auxiliary slicing. Complete reasoning inputs the whole image into the inference model to detect objects with rich feature information. Auxiliary slicing cuts the whole image into M × N slices, resizes each slice while keeping the aspect ratio, and inputs each slice into the inference model for target detection. After resizing, the detail features of pedestrians in the slice images become more obvious, so the network model can extract feature information relatively easily.
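As an illustration of this two-part inference, a condensed sketch follows; `run_model`, the 2 × 2 tiling, the slice size and the returned `(x1, y1, x2, y2, confidence, class)` tuple format are all illustrative assumptions, not interfaces defined by this application.

```python
import cv2

def sahi_inference(image, run_model, m=2, n=2, slice_size=640):
    """Complete reasoning on the whole image plus auxiliary slicing:
    cut the image into m x n slices, resize each while keeping its
    aspect ratio, detect separately, and map boxes back to the image."""
    h, w = image.shape[:2]
    detections = run_model(image)                     # complete reasoning
    th, tw = h // m, w // n
    for i in range(m):
        for j in range(n):
            tile = image[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            scale = slice_size / max(tile.shape[:2])  # keep aspect ratio
            resized = cv2.resize(tile, None, fx=scale, fy=scale)
            for x1, y1, x2, y2, conf, cls in run_model(resized):
                detections.append((x1 / scale + j * tw, y1 / scale + i * th,
                                   x2 / scale + j * tw, y2 / scale + i * th,
                                   conf, cls))        # restore original coords
    return detections  # later fed to non-maximum suppression
```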
As a preferred implementation manner, in the embodiment of the present application, the prediction feature map includes at least a plurality of candidate detection frames, and the step of screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
Specifically, the results of complete reasoning and auxiliary slicing are fed to non-maximum suppression, low-confidence detection frames of the same object in the prediction feature map are removed, and the original image size is restored. At run time, a detection result list containing the type, position and confidence of each detected target is returned. It should be noted that the preset threshold is not specifically limited in the embodiment of the present application; the user may set it according to actual needs.
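A minimal sketch of this screening step might look as follows, assuming detections are `(x1, y1, x2, y2, confidence, class)` tuples such as those produced by the slicing sketch above; the thresholds are typical defaults, not values fixed by this application.

```python
def nms_filter(detections, conf_thresh=0.25, iou_thresh=0.45):
    """Drop frames below the preset confidence threshold, then suppress
    overlapping frames of the same class, keeping the most confident one."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-7)

    boxes = [d for d in detections if d[4] >= conf_thresh]  # preset threshold
    boxes.sort(key=lambda d: d[4], reverse=True)            # by confidence
    kept = []
    for box in boxes:
        if all(box[5] != k[5] or iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```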
As a preferred implementation manner, in the embodiment of the present application, the following object includes a vehicle, and the preset following algorithm includes a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
Specifically, taking a vehicle-pedestrian following scene as an example, the preset following algorithm in the embodiment of the application mainly comprises a steering control algorithm and a speed control algorithm. After entering the following mode, the hardware system is first initialized, including PWM (pulse width modulation) initialization, ultrasonic sensor initialization and initialization of the related IO (input/output) ports. An image captured by the camera is then acquired and target detection is performed; after a pedestrian target is detected, the deviation is calculated and the distance information returned by the ultrasonic sensor is acquired. The steering angle of the vehicle's steering engine and the vehicle speed are calculated from the position information of the pedestrian target and the distance information returned by the ultrasonic sensor respectively, and steering control and speed control are carried out, as sketched below.
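The control flow just described can be summarized as the sketch below. Every device object (`camera`, `ultrasonic`, `steering`, `motor`) and the `detect` callback are assumed driver wrappers whose interfaces this application does not specify; `steering_output` and `speed_output` are sketched after the next two paragraphs.

```python
def follow_mode(detect, camera, ultrasonic, steering, motor):
    """Main following loop: detect the pedestrian target, compute the
    lateral deviation and the ultrasonic distance, then apply steering
    and speed control. Hardware (PWM, ultrasonic sensor, IO ports) is
    assumed to have been initialized beforehand."""
    last_err = 0.0
    while True:
        frame = camera.capture()
        target = detect(frame)                 # None if no 'person' label
        if target is None:
            motor.set_speed(0.0)               # stop until a pedestrian appears
            continue
        err = target["pos_x"] - 160            # deviation vs image center
        dist = ultrasonic.read()               # distance feedback
        steering.set_angle(steering_output(err, last_err))
        motor.set_speed(speed_output(dist, err))
        last_err = err
```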
Further, steering control first requires pedestrian target detection. In a specific implementation, the target detection model is run, and each entry in the returned detection result list objects contains the type of a detected target, its position in the image and its probability; the position information consists of the coordinate points of the four vertices of the detected target's rectangular frame in the image, i.e. the coordinate range of the target in the image. When a detection result label is judged to be person, i.e. the pedestrian type, the abscissa position deviation is calculated: pos_x is the abscissa of the detected target in the image, and the horizontal position deviation of the target relative to the image center is obtained as the difference between pos_x and the image-center abscissa 160. To avoid interference, the last deviation value is reused when the deviation is small. The steering control adopts incremental segmented proportional control and is closed-loop on the deviation: the output control amount is f(x), a piecewise function of the deviation err.
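A sketch of this segmented proportional control might look as follows; the dead band, segment boundaries and gains are assumed values, since the text above fixes only the structure: a piecewise function f of the deviation err, closed-loop, reusing the last deviation when the change is small.

```python
def steering_output(err, last_err, dead_band=10):
    """Segmented proportional steering control, closed-loop on the
    horizontal deviation err (pixels from the image-center abscissa 160)."""
    if abs(err) < dead_band:
        err = last_err                  # reuse last value to avoid jitter
    if abs(err) < 40:                   # segment boundaries and gains are
        kp = 0.3                        # illustrative assumptions
    elif abs(err) < 80:
        kp = 0.5
    else:
        kp = 0.8
    return kp * err                     # output control amount f(err)
```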
Further, speed control measures the distance between the following target and the vehicle through the ultrasonic sensor, thereby controlling the vehicle speed in real time. Target is the set distance to the following target and distance is the distance fed back by the ultrasonic sensor. The ranging angle of the ultrasonic sensor is alpha, about 15 degrees, while the camera's field of view is 120 degrees; therefore, when the following target deviates strongly to the side, the distance returned by the ultrasonic sensor is not a true value, and the program must judge whether the measured distance is really the distance between the vehicle and the following target. The deviation between distance and the target value Target is calculated and the real-time speed output is computed; finally, speed control is achieved in the speed control loop.
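Correspondingly, a sketch of the speed loop is given below; the set following distance, the validity check motivated by the 15-degree ultrasonic beam versus the 120-degree camera view, and the gain are illustrative assumptions.

```python
def speed_output(distance, err, target=1.0, kp=0.6, v_max=1.5):
    """Proportional speed control on the deviation between the measured
    ultrasonic distance and the set following distance (meters)."""
    if abs(err) > 100:                  # large lateral deviation: the narrow
        return 0.2                      # ultrasonic beam may miss the target,
                                        # so creep until it re-centers (assumed)
    dev = distance - target             # deviation from the target value
    return max(0.0, min(v_max, kp * dev))
```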
The YOLOV5-based target object following method provided in the embodiment of the present application is described below taking the target object to be followed as a pedestrian and the following object as a power-assisted vehicle, i.e. a moped-pedestrian following scene, as an example:
the first step: and acquiring video data containing pedestrians, and carrying out frame processing on the video data to obtain a plurality of frame pictures serving as input pictures of the model. In specific implementation, the video data may be sliced using the SAHI policy to obtain an input picture. The SAHI strategy consists of two parts, complete reasoning and assisted slicing.
The second step: inputting the input picture obtained in the first step into the target detection model for feature extraction to obtain a predicted feature map, where the predicted feature map includes at least a plurality of candidate detection frames. The target detection model is obtained by training with the training method provided in the first embodiment; for the specific training process, reference may be made to the related content described in the first embodiment, which is not detailed here.
The third step: screening the predicted feature map with a preset screening method to obtain a final detection result. In a specific implementation, the candidate detection frames in the predicted feature map are screened with a non-maximum suppression algorithm, detection frames whose confidence for the same object (such as a pedestrian) is lower than a preset threshold are removed, and the returned final detection result includes the type, position and confidence of the detected target.
The fourth step: generating a following strategy according to the final detection result and a preset following algorithm, and controlling the power-assisted vehicle according to the following strategy. Specifically, the preset following algorithm mainly comprises a steering control algorithm and a speed control algorithm: the steering angle of the power-assisted vehicle's steering engine and the vehicle speed are calculated from the pedestrian's position information and the distance information returned by the ultrasonic sensor, and steering control and speed control are carried out respectively.
Example 3
Corresponding to the second embodiment, the present application further provides a YOLOV5-based target object following device. Content in this embodiment that is the same as or similar to the above embodiments may be referred to the above description and is not repeated here. Referring to fig. 3, the apparatus includes:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the above training method of the target detection model to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
As a preferred implementation, in the embodiment of the present application, the SAHI strategy includes complete reasoning and auxiliary slicing.
In an embodiment of the present application, the prediction feature map includes at least a plurality of candidate detection frames, and the screening module is specifically configured to: and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
As a preferred implementation manner, in the embodiment of the present application, the following object includes a vehicle, and the preset following algorithm includes a steering control algorithm and a speed control algorithm;
the control module is specifically configured to calculate a steering angular speed and a vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and control the vehicle according to the steering angular speed and the vehicle speed.
Example 4
Corresponding to the second or third embodiment, the present application further provides a computer device, including: a processor and a memory, the memory storing a computer program executable on the processor which, when executed by the processor, implements the YOLOV5-based target object following method provided by any one of the embodiments described above.
FIG. 4 illustrates a computer device 1500, which may include, inter alia, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit ), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical scheme provided by the present invention.
The Memory 1520 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the electronic device, a Basic Input Output System (BIOS) for controlling low-level operation of the electronic device. In addition, a web browser 1523, a data storage management system 1524, a device identification information processing system 1525, and the like may also be stored. The device identification information processing system 1525 may be an application program that implements the operations of the steps described above in embodiments of the present invention. In general, when the present invention is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus includes a path to transfer information between various components of the device (e.g., the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
In addition, the electronic device may also obtain information of specific acquisition conditions from the virtual resource object acquisition condition information database, so as to be used for performing condition judgment, and the like.
It is noted that although the above devices illustrate only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus, etc., in particular implementations, the device may include other components necessary to achieve proper functioning. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present invention, and not all of the components shown in the drawings.
Example 5
Corresponding to the second to fourth embodiments, the present application further provides a computer-readable storage medium. Content in this embodiment that is the same as or similar to the above embodiments may be referred to the above description and is not repeated here.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the YOLOV5-based target object following method as described above.
In some implementations, when the computer program is executed by the processor in the embodiments of the present application, the steps corresponding to the method described in the second embodiment may be further implemented, and reference may be made to the detailed description in the second embodiment, which is not repeated herein.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing has described the preferred embodiments of the present invention in detail. Specific examples are used herein to illustrate the principles and implementations of the invention, and the above description of the embodiments is intended only to facilitate understanding of the method of the present invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings remain within the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. A method of training a target detection model, the method comprising:
replacing a feature extraction module in the original YOLOV5 model with a Ghost-BottleNeck module to obtain a first YOLOV5 model;
training the first YOLOV5 model by using a training data set to obtain a second YOLOV5 model;
optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain a target detection model, wherein the preset loss function comprises a GIoU loss function and an Alpha-IoU loss function;
the GIoU is calculated as follows:

$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$

where IoU is the intersection over union of the predicted frame and the real frame, A is the predicted frame, B is the real frame, and C is the minimum bounding box enclosing A and B;
Alpha-IoU is calculated as follows:

$L_{\alpha\text{-}\mathrm{IoU}} = 1 - \mathrm{IoU}^{\alpha_1} + \beta_{\alpha_2}(B, B^{gt})$

where $\alpha_1 > 0$, $\alpha_2 > 0$, B is the size and position of the real frame, $B^{gt}$ is the size and position of the predicted frame, and $\beta_{\alpha_2}(B, B^{gt})$ represents a penalty term calculated from B and $B^{gt}$.
2. The method of training the object detection model of claim 1, wherein training the first YOLOV5 model using a training dataset comprises:
inputting an input feature map in the training data set into the first YOLOV5 model, and performing convolution calculation on the input feature map by the Ghost-BottleNeck module in the first YOLOV5 model to generate an original feature map;
performing linear transformation on the original feature map to obtain a Ghost feature map;
and performing splicing processing on the Ghost feature map to obtain an output feature map.
3. The method for training the object detection model according to claim 2, wherein optimizing each parameter of the second YOLOV5 model by using a preset loss function to obtain the object detection model comprises:
step one, calculating a loss value between the output feature map and a real feature map corresponding to the training data set by using the preset loss function;
step two, updating each parameter of the second YOLOV5 model through back propagation according to the loss value;
step three, inputting the input feature map in the training data set into the second YOLOV5 model after parameter updating to obtain a new output feature map;
step four: and repeatedly executing the first step to the third step until the calculated loss value meets a preset condition, and determining the second YOLOV5 model after the last parameter update as a target detection model.
4. A YOLOV5-based target object following method, the method comprising:
acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
inputting the input picture into a target detection model obtained by training with the training method of the target detection model according to any one of claims 1 to 3 for feature extraction, to obtain a predicted feature map;
screening the prediction feature map by using a preset screening method to obtain a final detection result;
and generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
5. The YOLOV5-based target object following method of claim 4, wherein the SAHI strategy comprises complete reasoning and auxiliary slicing.
6. The YOLOV5-based target object following method of claim 4, wherein the prediction feature map includes at least a plurality of candidate detection frames, and the screening the prediction feature map by using a preset screening method includes:
and screening the candidate detection frames in the prediction feature map by using a non-maximum suppression algorithm, and removing the detection frames of which the confidence coefficient of the same object is lower than a preset threshold.
7. The YOLOV5-based target object following method of claim 4, wherein the following object comprises a vehicle, and the preset following algorithm comprises a steering control algorithm and a speed control algorithm;
generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy comprises:
and calculating the steering angular speed and the vehicle speed of a steering engine of the vehicle according to the final detection result and a preset following algorithm, and controlling the vehicle according to the steering angular speed and the vehicle speed.
8. A YOLOV5-based target object following apparatus, the apparatus comprising:
the acquisition module is used for acquiring video data containing a target object to be followed, and slicing the video data by utilizing an SAHI strategy to obtain an input picture;
the prediction module is used for inputting the input picture into the target detection model obtained by training with the training method of the target detection model according to any one of claims 1 to 3 to perform feature extraction, so as to obtain a prediction feature map;
the screening module is used for screening the prediction feature map by using a preset screening method to obtain a final detection result;
and the control module is used for generating a following strategy according to the final detection result and a preset following algorithm, and controlling a following object according to the following strategy.
9. A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor which, when executed by the processor, implements the YOLOV5-based target object following method of any one of claims 4 to 7.
10. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed, implements the YOLOV5-based target object following method of any one of claims 4 to 7.
CN202310327802.2A 2023-03-30 2023-03-30 YOLOV5-based target object following method and device Pending CN116402853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310327802.2A CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310327802.2A CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Publications (1)

Publication Number Publication Date
CN116402853A true CN116402853A (en) 2023-07-07

Family

ID=87017161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310327802.2A Pending CN116402853A (en) 2023-03-30 2023-03-30 YOLOV 5-based target object following method and device

Country Status (1)

Country Link
CN (1) CN116402853A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination