CN115273017A - Traffic sign detection recognition model training method and system based on Yolov5 - Google Patents


Info

Publication number
CN115273017A
CN115273017A (application CN202210472638.XA)
Authority
CN
China
Prior art keywords: model, target, training, network model, data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210472638.XA
Other languages
Chinese (zh)
Inventor
王涛
刘承堃
程瑞
徐奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210472638.XA
Publication of CN115273017A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The invention discloses a Yolov5-based traffic sign detection and recognition model training method and system. The method performs Yolov5-based traffic sign detection and recognition model training with iterative optimization on an acquired traffic sign data set, and finally obtains a trained recognition model. The advantages are that the accuracy of the model in detecting traffic signs is improved while the target detection speed is ensured and the number of model parameters is not greatly increased.

Description

Traffic sign detection recognition model training method and system based on Yolov5
Technical Field
The invention relates to the technical field of traffic sign recognition, and in particular to a Yolov5-based traffic sign detection recognition model training method and system.
Background
Today, automated driving technology has achieved a number of breakthroughs in the development of intelligent transportation; for example, intelligent vision systems have made a qualitative leap in recognition performance within just a few years. These breakthroughs allow more and more prototype intelligent vehicles to leave the laboratory, be tested in real road environments, and gradually advance toward practical application. Traffic signs convey indication information to vehicles and pedestrians and are a key factor in ensuring the fluency and safety of traffic routes. Accurately and rapidly identifying traffic signs is therefore a key link in guaranteeing the safety of automated driving. In practice, traffic sign recognition models must increasingly be deployed on edge and mobile devices, which challenges the computing power and memory of such devices, so a traffic sign detection algorithm that combines accuracy with real-time performance becomes the first choice.
Current convolutional-neural-network-based target detection algorithms have developed roughly in two directions: candidate-region-based two-stage detectors and regression-based one-stage detectors. Two-stage detectors are generally slower but more accurate, while one-stage detectors are generally faster but less accurate. Although the candidate-region-based two-stage R-CNN family achieves very good detection results, its speed is far from real-time, whereas traffic sign recognition demands both high accuracy and high speed. Unlike the R-CNN algorithms, one-stage algorithms represented by SSD have relatively few network model parameters and perform well in real time, but their accuracy is somewhat unsatisfactory.
In view of this situation, the invention provides a Yolov5-based traffic sign detection recognition model training method and system, which can effectively improve upon the prior art and overcome its defects.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a Yolov5-based traffic sign detection recognition model training method and system intended to solve the problems in the prior art; the specific scheme is as follows:
In a first aspect, the invention provides a Yolov5-based traffic sign detection and recognition model training method, which includes:
step 1: acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a bounding box and actual classifications; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
step 2: acquiring a network model based on YOLOv5, and enabling Mosaic data enhancement and forbidding Fliplr data enhancement on the input end of the network model;
Step 3: modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, wherein the network model basically retains the Backbone part and the Neck part, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at small targets in the Prediction part of the network model to generate a target training model;
Step 4: loading the target model through the training set and setting parameter training in the form of SGD + Momentum (see the configuration sketch after this step list);
Step 5: evaluating parameter optimization through the test indexes and returning to Step 4 until the model parameter performance is optimized, and storing the generated traffic sign image model;
Step 6: identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
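As an illustration of how the input-end settings of Step 2 and the optimizer settings of Step 4 might be expressed in practice, the following is a minimal sketch written in the style of the hyperparameter files of the public Ultralytics YOLOv5 repository. The key names (mosaic, fliplr, lr0, momentum, weight_decay) and the numeric values are assumptions borrowed from that repository's conventions, not settings disclosed by the patent.

```python
# Illustrative sketch only: augmentation and optimizer settings expressed as
# hyperparameter overrides in the style of an Ultralytics-YOLOv5 "hyp" file.
# Key names and values are assumptions, not taken from the patent.
hyp_overrides = {
    "mosaic": 1.0,           # Step 2: enable Mosaic data enhancement at the input end
    "fliplr": 0.0,           # Step 2: disable Fliplr (horizontal flip) enhancement,
                             # since mirroring can change a traffic sign's meaning
    "lr0": 0.01,             # Step 4: initial learning rate, later decayed by cosine annealing
    "momentum": 0.937,       # Step 4: SGD momentum parameter
    "weight_decay": 0.0005,  # Step 4: weight decay coefficient
}

print(hyp_overrides)
```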
Preferably, the step 1 comprises:
s21, expanding the acquired TT100K traffic sign data set by any one of methods of random translation, scaling and cutting, and reducing the data proportion among different sign classes;
s22, converting the format of the new data set into an xml format of a Pascal Voc mark, and then converting the format of the Pascal Voc mark into a txt format of a yolo mark, wherein the xml format is converted into the txt format as follows:
x = \frac{x_c}{W}, \quad y = \frac{y_c}{H}, \quad w = \frac{w_b}{W}, \quad h = \frac{h_b}{H}

wherein x is the abscissa x_c of the target center point divided by the total width W of the picture, y is the ordinate y_c of the target center point divided by the total height H of the picture, w is the width w_b of the target frame divided by the total width of the picture, and h is the height h_b of the target frame divided by the total height of the picture;
and S23, carrying out image enhancement on the new data set picture by adopting denoising and color histogram equalization so as to obtain the new traffic sign data set and the txt label corresponding to each sign picture.
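As a concrete illustration of the xml-to-txt conversion in S22, the following is a minimal sketch; it is not the patent's own code. The file paths and the class list are hypothetical, and only the standard Pascal VOC xml fields (size/width, size/height, object/name, object/bndbox) are assumed.

```python
# Minimal sketch of converting one Pascal VOC xml annotation to a YOLO txt label.
# Paths and the class list are hypothetical; only standard VOC fields are assumed.
import xml.etree.ElementTree as ET

CLASSES = ["pl40", "pl60", "pn", "pne"]  # example sign codes, illustrative only

def voc_xml_to_yolo_txt(xml_path: str, txt_path: str) -> None:
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)

        # Normalise exactly as in the formula above: centre and size divided by image size.
        x = (xmin + xmax) / 2.0 / img_w
        y = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{CLASSES.index(name)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")

    with open(txt_path, "w") as f:
        f.write("\n".join(lines))

# Usage (hypothetical file names):
# voc_xml_to_yolo_txt("annotations/10001.xml", "labels/10001.txt")
```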
Preferably, the method further comprises:
modifying the Neck part and the Prediction part in the detection network model to obtain a modified network model;
calculating the anchors (anchor boxes) under the new data set in the detection network model, the modified anchors better fitting the attribute characteristics of the small sign targets to be detected;
adding a detection layer and a CBAM (Convolutional Block Attention Module) convolution attention module aiming at small targets in the detection network model, so that the added target detection layer and convolution attention module improve the sensing range and attention of the model to small targets;
and adding a detection head aiming at small targets in the detection network model, the added target detection head extracting the position and class information of the targets to be detected.
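The patent does not reproduce the attention module itself; the following is a hedged PyTorch sketch of a standard CBAM block (channel attention followed by spatial attention) of the kind commonly inserted into a YOLOv5 Neck. The reduction ratio of 16 and the 7x7 spatial kernel are the usual defaults and, like the exact wiring into the Neck, are assumptions rather than details taken from the patent.

```python
# Sketch of a standard CBAM block (channel attention then spatial attention).
# Generic implementation for illustration, not the patent's own module.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # global average pooling branch
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # global max pooling branch
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)      # channel-wise mean map
        mx, _ = torch.max(x, dim=1, keepdim=True)     # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels
        return x * self.sa(x)  # reweight spatial positions

# Quick shape check:
# CBAM(256)(torch.randn(1, 256, 40, 40)).shape  ->  torch.Size([1, 256, 40, 40])
```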
Preferably, when the network model is trained for the first time, the parameter training adopts an SGD + Momentum optimization algorithm, the learning rate is dynamically adjusted by a cosine annealing strategy, and the momentum parameter, number of epochs, batch size, weight decay coefficient, initial learning rate, initial weights and the saving of result weights are set.
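For illustration, a minimal PyTorch sketch of the optimizer and learning-rate schedule described above is given below. The concrete values (initial learning rate 0.01, momentum 0.937, weight decay 5e-4, 300 epochs, final decay factor 0.01) mirror common YOLOv5 defaults and are assumptions, not the patent's settings.

```python
# Illustrative sketch of SGD + Momentum with a cosine-annealed learning rate.
# Numeric values mirror common YOLOv5 defaults and are assumptions.
import math
import torch

def build_optimizer_and_scheduler(model: torch.nn.Module, epochs: int = 300):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.01,            # initial learning rate
        momentum=0.937,     # momentum parameter
        weight_decay=5e-4,  # weight decay coefficient
        nesterov=True,
    )
    # One-cycle cosine decay from lr0 down to lr0 * lrf over all epochs.
    lrf = 0.01
    lf = lambda e: ((1 - math.cos(e * math.pi / epochs)) / 2) * (lrf - 1) + 1
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
    return optimizer, scheduler

# Per-epoch usage:
#   optimizer.step(); optimizer.zero_grad()   # inside the batch loop
#   scheduler.step()                          # once per epoch
```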
Preferably, the method comprises:
the detection indexes are mAp indexes and Recall indexes, parameters are adjusted and optimized according to the excellent detection index results, the target image to be detected is detected and identified, and the traffic sign detection and identification results of the target image to be detected are obtained;
wherein the recognition model comprises the following units:
the system comprises an image acquisition unit, a data acquisition unit and a data processing unit, wherein the image acquisition unit is used for acquiring a traffic sign image, carrying out image enhancement and data set expansion on the traffic sign image to obtain a new image and a corresponding label of the traffic sign image data set, and dividing the new traffic sign image into a training set, a test set and a verification set;
the model loading unit is used for acquiring a network model based on YOLOv5, modifying the network model, adding a detection layer and a convolution attention module aiming at a small target in a Neck part, and adding a detection head aiming at the small target in a Prediction part to generate a new target training model;
the model setting unit is used for carrying out parameter setting and optimized adjustment according to the modified network model;
the model training unit is used for training the target training model through the training set and testing the trained target training model through the test index;
the model generation unit is used for storing the network model and the weight generated by training to generate a traffic sign image model after the network model is tested by the test set;
and the model identification unit is used for detecting and identifying the traffic sign of the target image to be detected.
In a second aspect, the present invention provides a Yolov5 traffic sign detection recognition model training system, which includes:
the acquisition module is used for acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
the enhancement module is used for acquiring a network model based on YOLOv5, enabling Mosaic data enhancement to be started at the input end of the network model and forbidding Fliplr data enhancement;
the tuning module is used for modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, basically retaining the Backbone part and the Neck part of the network model, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at small targets in the Prediction part of the network model to generate a target training model;
the configuration module is used for loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
the iteration module is used for feeding the parameter optimization results obtained from the test indexes back to the configuration module, which reloads the target model and reconfigures the parameter training, until the model parameter performance is optimized, and then storing the generated traffic sign image model;
and the output module is used for identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
In a third aspect, the present invention provides a Yolov5-based traffic sign detection and recognition model training device, where the device includes:
the communication bus is used for realizing the connection communication between the processor and the memory;
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of:
acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
acquiring a network model based on YOLOv5, and enabling Mosaic data enhancement and forbidding Fliplr data enhancement on the input end of the network model;
modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, basically retaining the Backbone part and the Neck part of the network model, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at small targets in the Prediction part of the network model to generate a target training model;
loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
evaluating parameter optimization through the test indexes and returning to reload the target model and reconfigure the parameter training until the model parameter performance optimization is completed, and storing the generated traffic sign image model;
and identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
Beneficial effects: according to the traffic sign detection and recognition model training method and system based on Yolov5, the traffic sign detection and recognition model is trained through the Yolov5-based network model, and the accuracy of the model in detecting traffic signs is improved on the basis of ensuring the target detection speed and not greatly increasing the number of model parameters.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and the embodiments in the drawings do not constitute any limitation to the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of an embodiment of a Yolov5-based traffic sign detection recognition model training method.
FIG. 2 is a schematic structural diagram of an embodiment of a traffic sign detection recognition model training system based on Yolov5 according to the present invention.
FIG. 3 is a schematic structural diagram of an embodiment of a traffic sign detection recognition model training device based on Yolov5 according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the accompanying drawings and embodiments, which are preferred embodiments of the present invention. It is to be understood that the described embodiments are merely a subset of the embodiments of the invention, and not all embodiments; it should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The main idea of the technical scheme of the embodiment of the invention is as follows: acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set; acquiring a network model based on YOLOv5, and enabling Mosaic data enhancement and forbidding Fliplr data enhancement on the input end of the network model; modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, wherein the network model basically retains the Backbone part and the Neck part, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and generating a target training model by correspondingly adding a detection head aiming at small targets in the Prediction part of the network model; loading the target model through the training set and setting parameter training in the form of SGD + Momentum; evaluating parameter optimization through the test indexes and returning to the loading and training step until the model parameter performance is optimized, and storing the generated traffic sign image model; and identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
In order to better understand the technical solutions, the technical solutions will be described in detail below with reference to the drawings and specific embodiments.
Example one
An embodiment of the present invention provides a Yolov5-based traffic sign detection recognition model training method, and as shown in fig. 1, the method may specifically include the following steps:
s101, acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
specifically, the step S101 may include the steps of:
s21, expanding the acquired TT100K traffic sign data set by any one of methods of random translation, scaling and cutting, and reducing the data proportion among different sign classes;
s22, converting the format of the new data set into an xml format of a Pascal Voc mark, and then converting the format of the Pascal Voc mark into a txt format of a yolo mark, wherein the xml format is converted into the txt format as follows:
x = \frac{x_c}{W}, \quad y = \frac{y_c}{H}, \quad w = \frac{w_b}{W}, \quad h = \frac{h_b}{H}

wherein x is the abscissa x_c of the target center point divided by the total width W of the picture, y is the ordinate y_c of the target center point divided by the total height H of the picture, w is the width w_b of the target frame divided by the total width of the picture, and h is the height h_b of the target frame divided by the total height of the picture;
and S23, carrying out image enhancement on the new data set picture by adopting denoising and color histogram equalization so as to obtain the new traffic sign data set and the txt label corresponding to each sign picture.
S102, acquiring a network model based on YOLOv5, and enabling Mosaic data enhancement and forbidding Fliplr data enhancement on the input end of the network model;
s103, modifying a network model based on YOLOv5, recalculating an anchors frame for the network model by using a k-means algorithm, wherein the network model basically reserves a backhaul part and a Neck part, adding a detection layer aiming at a small target and a CBAM (cubic boron am) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at the small target in a Prediction part of the network model to generate a target training model;
s104, loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
s105, performing parameter optimization conditions through the test indexes, returning to the step 4 until the performance of the model parameters is optimized, and storing the generated traffic sign image model;
and S106, identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and identification classification of the target mark.
In an optional embodiment, the method may further comprise: firstly, modifying the Neck part and the Prediction part in the detection network model to obtain a modified network model; secondly, computing the anchors (anchor boxes) under the new data set in the detection network model, the modified anchors better fitting the attribute characteristics of the small sign targets to be detected; then adding a detection layer and a CBAM (Convolutional Block Attention Module) convolution attention module aiming at small targets in the detection network model, so that the added target detection layer and convolution attention module improve the sensing range and attention of the model to small targets; and finally, adding a detection head aiming at small targets in the detection network model, the added target detection head extracting the position and class information of the targets to be detected.
In the embodiment of the invention, when the network model is trained for the first time, the parameter training can adopt an SGD + Momentum optimization algorithm, the learning rate is dynamically adjusted by a cosine annealing strategy, and the momentum parameter, number of epochs, batch size, weight decay coefficient, initial learning rate, initial weights and the saving of result weights are set.
In an embodiment of the present invention, the method may include: the detection indexes are mAp indexes and Recall indexes, parameters are adjusted and optimized according to the excellent detection index results, the target image to be detected is detected and identified, and traffic sign detection and identification results of the target image to be detected are obtained;
the detection indexes are mAp index and Recall index, and the steps are as follows:
step M1: calculating the accuracy P of the single category, wherein the formula is as follows:
P = \frac{TP}{TP + FP}
wherein p represents the accuracy of a single class, TP represents the number of positive samples that the model predicts as positive for the single class, and FP represents the number of negative samples that the model predicts as positive for the single class;
step M2: the recall rate for a single category is calculated by the formula:
Recall = \frac{TP}{TP + FN}
wherein Recall represents the recall rate, and FN represents the number of positive samples of the single category that the model predicts as negative;
step M3: the average accuracy Ap for each class is calculated, which is formulated as follows:
Ap = \int_0^1 P(R)\,dR
wherein Ap represents the average precision of each category; with the recall rate as the abscissa and the accuracy as the ordinate, an accuracy-recall rate (P-R) curve is drawn, and the area under the curve is the average precision (Ap);
step M4: based on the average precision obtained for each class, an average precision mAp is calculated for all classes, which is given by the following equation:
mAp = \frac{1}{classes} \sum_{i=1}^{classes} Ap_i
wherein mAp represents the average precision of all classes, and classes represents the number of classes.
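To make steps M1 to M4 concrete, the following is a hedged sketch of computing per-category precision and recall, the area under the P-R curve as Ap, and the mean over categories as mAp. It assumes detections have already been matched to ground truth (true-positive flags plus confidences), which is a simplification of a full evaluator, and uses plain trapezoidal integration rather than any particular interpolation convention.

```python
# Sketch of steps M1-M4: per-category precision/recall, Ap as the area under the
# P-R curve, and mAp as the mean over categories. Assumes detections were already
# matched to ground truth (tp flags + confidences); a simplification only.
import numpy as np

def average_precision(tp: np.ndarray, conf: np.ndarray, n_gt: int) -> float:
    if conf.size == 0:
        return 0.0
    order = np.argsort(-conf)                 # sort detections by confidence
    tp = tp[order].astype(float)
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    precision = cum_tp / (cum_tp + cum_fp)    # M1: P = TP / (TP + FP)
    recall = cum_tp / max(n_gt, 1)            # M2: R = TP / (TP + FN)
    # M3: area under the P-R curve (trapezoidal integration).
    return float(np.trapz(np.concatenate(([precision[0]], precision)),
                          np.concatenate(([0.0], recall))))

def mean_average_precision(per_class: list) -> float:
    # M4: mAp = mean of the per-category Ap values.
    aps = [average_precision(tp, conf, n_gt) for tp, conf, n_gt in per_class]
    return float(np.mean(aps)) if aps else 0.0

# Usage: per_class is a list of (tp_flags, confidences, num_ground_truths) per category.
```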
The identification model specifically comprises the following units:
the system comprises an image acquisition unit, a data acquisition unit and a data processing unit, wherein the image acquisition unit is used for acquiring a traffic sign image, carrying out image enhancement and data set expansion on the traffic sign image to obtain a new image and a corresponding label of the traffic sign image data set, and dividing the new traffic sign image into a training set, a test set and a verification set;
the model loading unit is used for acquiring a network model based on YOLOv5, modifying the network model, adding a detection layer and a convolution attention module aiming at a small target in a Neck part, and adding a detection head aiming at the small target in a Prediction part to generate a new target training model;
the model setting unit is used for carrying out parameter setting and optimized adjustment according to the modified network model;
the model training unit is used for training the target training model through the training set and testing the trained target training model through the test index;
the model generation unit is used for storing the network model and the weight generated by training to generate a traffic sign image model after the network model is tested by the test set;
and the model identification unit is used for detecting and identifying the traffic sign of the target image to be detected.
Example two
An embodiment of the present invention provides a Yolov5-based traffic sign detection and recognition model training system, and as shown in fig. 2, the system may specifically include the following modules:
the acquisition module is used for acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, wherein the image enhancement processing comprises the steps of expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
specifically, the obtaining module may include: the acquired TT100K traffic sign data set is expanded by any one of the methods of random translation, scaling and cutting, and the data proportion among different sign classes is reduced; converting the format of the new data set into an xml format of a Pascal Voc mark, and then converting the format of the Pascal Voc mark into a txt format of a yolo mark, wherein the conversion of the xml format into the txt format is as follows:
x = \frac{x_c}{W}, \quad y = \frac{y_c}{H}, \quad w = \frac{w_b}{W}, \quad h = \frac{h_b}{H}

wherein x is the abscissa x_c of the target center point divided by the total width W of the picture, y is the ordinate y_c of the target center point divided by the total height H of the picture, w is the width w_b of the target frame divided by the total width of the picture, and h is the height h_b of the target frame divided by the total height of the picture; and carrying out image enhancement on the new data set picture by adopting denoising and color histogram equalization so as to obtain the new traffic sign data set and the txt label corresponding to each sign picture.
The enhancement module is used for acquiring a network model based on YOLOv5, enabling Mosaic data enhancement to be started at the input end of the network model and forbidding Fliplr data enhancement;
the tuning module is used for modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, basically retaining the Backbone part and the Neck part of the network model, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at small targets in the Prediction part of the network model to generate a target training model;
the configuration module is used for loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
the iteration module is used for feeding the parameter optimization results obtained from the test indexes back to the configuration module, which reloads the target model and reconfigures the parameter training, until the model parameter performance is optimized, and then storing the generated traffic sign image model;
and the output module is used for identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
In an optional embodiment, the Neck part and the Prediction part may first be modified in the detection network model to obtain a modified network model; secondly, the anchors (anchor boxes) are computed under the new data set in the detection network model, the modified anchors better fitting the attribute characteristics of the small sign targets to be detected; then a detection layer and a CBAM (Convolutional Block Attention Module) convolution attention module aiming at small targets are added in the detection network model, so that the added target detection layer and convolution attention module improve the sensing range and attention of the model to small targets; and finally, a detection head aiming at small targets is added in the detection network model, the added target detection head extracting the position and class information of the targets to be detected.
In the embodiment of the invention, when the network model is trained for the first time, the parameter training can adopt an SGD + Momentum optimization algorithm, the learning rate is dynamically adjusted by a cosine annealing strategy, and the momentum parameter, number of epochs, batch size, weight decay coefficient, initial learning rate, initial weights and the saving of result weights are set.
In the embodiment of the invention, the detection indexes are mAp index and Recall index, the parameters are optimized according to the excellent detection index result, and the target image to be detected is detected and identified to obtain the traffic sign detection and identification result of the target image to be detected;
the detection indexes are mAp index and Recall index, and the steps are as follows:
step M1: calculating the accuracy P of the single category, wherein the formula is as follows:
P = \frac{TP}{TP + FP}
wherein p represents the accuracy of a single class, TP represents the number of positive samples that the model predicts as positive for the single class, and FP represents the number of negative samples that the model predicts as positive for the single class;
step M2: the recall rate for a single category is calculated by the formula:
Recall = \frac{TP}{TP + FN}
wherein Recall represents the recall rate, and FN represents the number of positive samples of the single category that the model predicts as negative;
step M3: the average accuracy Ap for each class is calculated, which is formulated as follows:
Ap = \int_0^1 P(R)\,dR
wherein Ap represents the average precision of each category; with the recall rate as the abscissa and the accuracy as the ordinate, an accuracy-recall rate (P-R) curve is drawn, and the area under the curve is the average precision (Ap);
step M4: based on the average precision obtained for each class, an average precision mAp for all classes is calculated, which is given by the following equation:
mAp = \frac{1}{classes} \sum_{i=1}^{classes} Ap_i
where mAp represents the average precision of all classes, and classes represents the number of classes.
The identification model specifically comprises the following units:
the system comprises an image acquisition unit, a data acquisition unit and a data processing unit, wherein the image acquisition unit is used for acquiring a traffic sign image, performing image enhancement and data set expansion on the traffic sign image to obtain a new image and a corresponding label of the traffic sign image data set, and dividing the new traffic sign image into a training set, a test set and a verification set;
the model loading unit is used for acquiring a network model based on YOLOv5, modifying the network model, adding a detection layer and a convolution attention module aiming at a small target in a Neck part, and adding a detection head aiming at the small target in a Prediction part to generate a new target training model;
the model setting unit is used for carrying out parameter setting and optimized adjustment according to the modified network model;
the model training unit is used for training the target training model through the training set and testing the trained target training model through the test index;
the model generation unit is used for storing the network model and the weight generated by training to generate a traffic sign image model after the network model is tested by the test set;
and the model identification unit is used for detecting and identifying the traffic sign of the target image to be detected.
Example three
An embodiment of the present invention provides a Yolov5-based traffic sign detection and recognition model training device, and as shown in fig. 3, the device may specifically include the following components:
the communication bus is used for realizing the connection communication between the processor and the memory;
a memory for storing a computer program; the memory may comprise high-speed RAM memory and may also comprise non-volatile memory, such as at least one disk memory. The memory may optionally comprise at least one memory device.
A processor for executing the computer program to implement the steps of:
acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, and the images are marked with information of a boundary frame and actual classification; according to the data proportion among different mark classes, carrying out image enhancement processing on the data set, including expanding the data set by adopting a random translation, scaling and cutting method, carrying out image denoising and color histogram equalization enhancement contrast on the expanded data set, and dividing the data set into a training set, a test set and a verification set;
specifically, the step S101 may include the steps of:
expanding the acquired TT100K traffic sign data set by any one of the methods of random translation, scaling and cutting, and reducing the data proportion among different sign classes;
converting the format of the new data set into an xml format of a Pascal Voc marker, and then converting the format of the Pascal Voc marker into a txt format of a yolo marker, wherein the conversion of the xml format into the txt format is as follows:
x = \frac{x_c}{W}, \quad y = \frac{y_c}{H}, \quad w = \frac{w_b}{W}, \quad h = \frac{h_b}{H}

wherein x is the abscissa x_c of the target center point divided by the total width W of the picture, y is the ordinate y_c of the target center point divided by the total height H of the picture, w is the width w_b of the target frame divided by the total width of the picture, and h is the height h_b of the target frame divided by the total height of the picture;
and carrying out image enhancement on the new data set picture by adopting denoising and color histogram equalization so as to obtain the new traffic sign data set and the txt label corresponding to each sign picture.
Acquiring a network model based on YOLOv5, and enabling Mosaic data enhancement and forbidding Fliplr data enhancement on the input end of the network model;
modifying the network model based on YOLOv5, recalculating the anchors (anchor boxes) for the network model by using a k-means algorithm, basically retaining the Backbone part and the Neck part of the network model, adding a detection layer aiming at small targets and a CBAM (Convolutional Block Attention Module) convolution attention module in the Neck part, and correspondingly adding a detection head aiming at small targets in the Prediction part of the network model to generate a target training model;
loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
evaluating parameter optimization through the test indexes and returning to the loading and parameter training step until the model parameter performance is optimized, and storing the generated traffic sign image model;
and identifying the information of the mark to be detected facing the actual scene based on the generated model, namely outputting the detection position and the identification classification of the target mark.
In an optional embodiment, the method may further comprise: firstly, modifying the Neck part and the Prediction part in the detection network model to obtain a modified network model; secondly, computing the anchors (anchor boxes) under the new data set in the detection network model, the modified anchors better fitting the attribute characteristics of the small sign targets to be detected; then adding a detection layer and a CBAM convolution attention module aiming at small targets in the detection network model, so that the added target detection layer and convolution attention module improve the sensing range and attention of the model to small targets; and finally, adding a detection head aiming at small targets in the detection network model, the added target detection head extracting the position and class information of the targets to be detected.
In the embodiment of the invention, when the network model is trained for the first time, the parameter training can adopt an SGD + Momentum optimization algorithm, the learning rate is dynamically adjusted by a cosine annealing strategy, and the momentum parameter, number of epochs, batch size, weight decay coefficient, initial learning rate, initial weights and the saving of result weights are set.
In an embodiment of the present invention, the method may include: the detection indexes are mAp indexes and Recall indexes, parameters are adjusted and optimized according to the excellent detection index results, the target image to be detected is detected and identified, and traffic sign detection and identification results of the target image to be detected are obtained;
the steps of detecting the indexes of mAp and Recall indexes are as follows:
step M1: the accuracy P of a single class is calculated, which is formulated as follows:
P = \frac{TP}{TP + FP}
wherein p represents the accuracy of a single class, TP represents the number of positive samples that the model predicts as positive for the single class, and FP represents the number of negative samples that the model predicts as positive for the single class;
step M2: the recall rate for a single category is calculated by the formula:
Recall = \frac{TP}{TP + FN}
wherein Recall represents the recall rate, and FN represents the number of positive samples of the single category that the model predicts as negative;
step M3: the average accuracy Ap for each class is calculated, which is formulated as follows:
Ap = \int_0^1 P(R)\,dR
wherein Ap represents the average precision of each category; with the recall rate as the abscissa and the accuracy as the ordinate, an accuracy-recall rate (P-R) curve is drawn, and the area under the curve is the average precision (Ap);
step M4: based on the average precision obtained for each class, an average precision mAp for all classes is calculated, which is given by the following equation:
mAp = \frac{1}{classes} \sum_{i=1}^{classes} Ap_i
where mAp represents the average precision of all classes, and classes represents the number of classes.
The identification model specifically comprises the following units:
the system comprises an image acquisition unit, a data acquisition unit and a data processing unit, wherein the image acquisition unit is used for acquiring a traffic sign image, carrying out image enhancement and data set expansion on the traffic sign image to obtain a new image and a corresponding label of the traffic sign image data set, and dividing the new traffic sign image into a training set, a test set and a verification set;
the model loading unit is used for acquiring a network model based on YOLOv5, modifying the network model, adding a detection layer and a convolution attention module aiming at a small target in a Neck part, and adding a detection head aiming at the small target in a Prediction part to generate a new target training model;
the model setting unit is used for carrying out parameter setting and optimized adjustment according to the modified network model;
the model training unit is used for training the target training model through the training set and testing the trained target training model through the test index;
the model generation unit is used for storing the network model and the weight generated by training to generate a traffic sign image model after the network model is tested by the test set;
and the model identification unit is used for detecting and identifying the traffic sign of the target image to be detected.
The processor in this embodiment may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in a processor. The processor described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. The processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor. The software modules may be located in ram, flash memory, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and combines the hardware to complete the steps of the method.
Example four
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the training method described above.
In summary, according to the traffic sign detection recognition model training method and system based on Yolov5 provided by the embodiments of the present invention, the traffic sign detection recognition model training is performed through the network model based on Yolov5, so that the accuracy of the model for detecting the traffic sign is improved on the basis of ensuring the target detection speed and not greatly increasing the model parameters.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of operations, but those skilled in the art will recognize that the present invention is not limited by the described operation sequence, because some steps may be performed in other sequences or simultaneously according to the present invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules illustrated are not necessarily required to practice the invention.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded or executed on a computer, the computer program instructions produce, in whole or in part, the processes or functions described in accordance with the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, those skilled in the art will appreciate that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for training a traffic sign detection recognition model based on Yolov5 is characterized by comprising the following steps:
step 1: acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, each image being annotated with bounding-box and ground-truth class information; performing image enhancement on the data set according to the data proportion among the different sign classes, including expanding the data set by random translation, scaling and cropping and applying image denoising and color histogram equalization to the expanded data set to enhance contrast; and dividing the data set into a training set, a test set and a validation set;
step 2: acquiring a YOLOv5-based network model, enabling Mosaic data augmentation and disabling Fliplr (horizontal flip) augmentation at the input end of the network model;
step 3: modifying the YOLOv5-based network model: recalculating the anchor boxes for the network model with a k-means algorithm (an illustrative clustering sketch follows this claim), largely retaining the Backbone and Neck, adding a small-target detection layer and a CBAM (Convolutional Block Attention Module) in the Neck, and correspondingly adding a small-target detection head in the Prediction part of the network model to generate a target training model;
step 4: loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
step 5: feeding the parameter optimization results obtained from the test indexes back to step 4 until the model parameters are fully tuned, and saving the generated traffic sign image model;
step 6: recognizing the sign information to be detected in the actual scene with the generated model, namely outputting the detected position and the recognized class of the target sign.
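The anchor-box recalculation named in step 3 can be illustrated with a minimal sketch, assuming labels are already in YOLO txt format and using plain Euclidean k-means on box widths and heights; YOLOv5's own autoanchor utility optimises an IoU-based fitness instead, so the function below, the 640 px image size, the twelve clusters (three per detection layer, counting the added small-target layer) and the paths are illustrative assumptions rather than the claimed procedure.

```python
# Illustrative sketch: recompute anchor boxes by k-means clustering of
# ground-truth box sizes. Assumes YOLO txt labels ("cls x y w h", normalised)
# and a 640 px training resolution; plain Euclidean k-means is used here,
# whereas YOLOv5's autoanchor optimises an IoU-based fitness.
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans


def load_box_sizes(label_dir: str, img_size: int = 640) -> np.ndarray:
    """Collect the (width, height) of every annotated box, in pixels."""
    sizes = []
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            parts = line.split()
            if len(parts) == 5:                       # cls, x, y, w, h
                w, h = float(parts[3]), float(parts[4])
                sizes.append((w * img_size, h * img_size))
    return np.array(sizes, dtype=np.float32)


def recompute_anchors(label_dir: str, n_anchors: int = 12) -> np.ndarray:
    """Cluster box sizes into n_anchors anchors, sorted small to large."""
    wh = load_box_sizes(label_dir)
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]  # 3 anchors per layer


if __name__ == "__main__":
    print(recompute_anchors("datasets/tt100k/labels/train"))  # hypothetical path
```

Sorting the cluster centres by area lets the three smallest anchors be assigned to the added high-resolution small-target detection head.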
2. The method of claim 1, wherein step 1 comprises:
s21, expanding the acquired TT100K traffic sign data set by any one of the methods of random translation, scaling and cutting, and reducing the data proportion among different sign classes;
s22, converting the format of the new data set into an xml format of a Pascal Voc mark, and then converting the format of the Pascal Voc mark into a txt format of a yolo mark, wherein the xml format is converted into the txt format as follows:
x = x_center / W,  y = y_center / H,  w = w_box / W,  h = h_box / H
wherein x is the abscissa of the target center point divided by the total width W of the picture, y is the ordinate of the target center point divided by the total height H of the picture, w is the width of the target frame divided by the total width of the picture, and h is the height of the target frame divided by the total height of the picture;
and S23, applying denoising and color histogram equalization to the images of the new data set, thereby obtaining the new traffic sign data set together with the txt label corresponding to each sign picture.
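A minimal sketch of the S22 conversion is given below, assuming the standard Pascal VOC xml fields (size, object, bndbox); the class list and output directory are hypothetical placeholders, not the actual TT100K class set.

```python
# Sketch of the Pascal VOC xml -> YOLO txt conversion in S22:
# x = x_center / W, y = y_center / H, w = box_width / W, h = box_height / H.
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ["pl50", "pn", "i5"]   # placeholder subset of sign classes


def voc_to_yolo(xml_path: str, out_dir: str) -> None:
    root = ET.parse(xml_path).getroot()
    W = float(root.find("size/width").text)
    H = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        x = (xmin + xmax) / 2 / W      # target centre x / picture width
        y = (ymin + ymax) / 2 / H      # target centre y / picture height
        w = (xmax - xmin) / W          # box width / picture width
        h = (ymax - ymin) / H          # box height / picture height
        lines.append(f"{CLASSES.index(name)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    out = Path(out_dir) / (Path(xml_path).stem + ".txt")
    out.write_text("\n".join(lines))
```

Each output line follows the YOLO convention "class x y w h", with all four coordinates normalised to the range [0, 1].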
3. The method of claim 1, further comprising:
modifying the Neck part and the Prediction part of the detection network model to obtain a modified network model;
recalculating the anchor boxes of the detection network model on the new data set, so that the modified anchors, together with the small-target-oriented annotations, better fit the attribute characteristics of the targets to be detected;
adding a small-target detection layer and a CBAM (Convolutional Block Attention Module) to the detection network model (a module sketch follows this claim), thereby enlarging the model's perception range of and attention to small targets;
and adding a small-target detection head to the detection network model to extract the position and class information of the targets to be detected.
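For reference, a minimal PyTorch sketch of a CBAM block (channel attention followed by spatial attention, as introduced by Woo et al.) is shown below; where exactly it is inserted into the Neck, and the reduction ratio and kernel size, are not fixed by the claim, so those values are illustrative assumptions.

```python
# Minimal CBAM sketch: a channel-attention gate followed by a
# spatial-attention gate, each applied multiplicatively to the feature map.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Reweight channels first, then spatial positions."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)


if __name__ == "__main__":
    out = CBAM(256)(torch.randn(1, 256, 40, 40))
    print(out.shape)  # torch.Size([1, 256, 40, 40]) - shape is preserved
```

Because the block preserves the feature-map shape, it can be dropped into a Neck stage without changing the channel counts of neighbouring layers.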
4. The method according to claim 1, wherein in the initial training of the network model the parameters are trained with an SGD + Momentum optimization algorithm, the learning rate is dynamically adjusted with a cosine annealing strategy, and the momentum parameter, number of epochs, batch size, weight decay coefficient, initial learning rate, initial weights and result-saving weights are set.
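A hedged PyTorch sketch of such a training configuration follows; the concrete numbers (learning rate 0.01, momentum 0.937, weight decay 5e-4, 300 epochs) mirror common YOLOv5 defaults and are assumptions rather than values fixed by the claim, and the data loader and detection loss are left abstract.

```python
# Sketch of claim 4's setup: SGD + Momentum, cosine-annealed learning rate,
# weight decay, fixed number of epochs, and saving of the best weights.
import torch


def configure_training(model: torch.nn.Module, epochs: int = 300):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.01,            # initial learning rate (assumed value)
        momentum=0.937,     # momentum parameter (assumed value)
        weight_decay=5e-4,  # weight decay coefficient (assumed value)
        nesterov=True,
    )
    # Cosine annealing lowers the learning rate smoothly over the epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.01 * 0.01
    )
    return optimizer, scheduler


def train(model, loader, loss_fn, epochs: int = 300, device: str = "cuda"):
    model.to(device)
    optimizer, scheduler = configure_training(model, epochs)
    best = float("inf")
    for _ in range(epochs):
        model.train()
        running = 0.0
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images.to(device)), targets.to(device))
            loss.backward()
            optimizer.step()
            running += loss.item()
        scheduler.step()
        if running < best:          # keep the weights with the lowest loss
            best = running
            torch.save(model.state_dict(), "best.pt")
```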
5. The method according to claim 1, characterized in that it comprises:
the detection indexes are the mAP index and the Recall index; the parameters are adjusted and optimized according to the detection index results, and the target image to be detected is detected and recognized to obtain the traffic sign detection and recognition result for the target image to be detected;
the mAP and Recall indexes are computed in the following steps (a computational sketch follows this claim):
step M1: the precision P of a single class is calculated, which is formulated as follows:
P = TP / (TP + FP)
wherein P represents the precision of the single class, TP represents the number of positive samples of that class predicted as positive by the model, and FP represents the number of negative samples of that class predicted as positive by the model;
step M2: the recall rate of a single class is calculated, which is formulated as follows:
Recall = TP / (TP + FN)
wherein Recall represents the recall rate and FN represents the number of positive samples of that class that the model predicts as negative;
step M3: the average precision AP of each class is calculated, which is formulated as follows:
AP = ∫ P(R) dR, with the recall rate R ranging from 0 to 1
wherein AP represents the average precision of a class; with the recall rate as the abscissa and the precision as the ordinate, a precision-recall (P-R) curve is drawn, and the area under this curve is the average precision (AP);
step M4: based on the average precision obtained for each class, the mean average precision mAP over all classes is calculated, which is formulated as follows:
mAP = (1 / classes) × Σ AP_i, summed over all classes i
wherein mAP represents the mean average precision over all classes and classes represents the number of classes;
the recognition model comprises the following units:
the image acquisition unit is used for acquiring traffic sign images, performing image enhancement and data set expansion on them to obtain the new traffic sign image data set and its corresponding labels, and dividing the new traffic sign images into a training set, a test set and a validation set;
the model loading unit is used for acquiring a YOLOv5-based network model and modifying it by adding a small-target detection layer and a convolutional attention module in the Neck part and a small-target detection head in the Prediction part, so as to generate a new target training model;
the model setting unit is used for setting and tuning the parameters of the modified network model;
the model training unit is used for training the target training model on the training set and testing the trained target training model with the test indexes;
the model generation unit is used for saving the network model and the weights generated by training, after the network model has been tested on the test set, so as to generate the traffic sign image model;
and the model identification unit is used for detecting and recognizing the traffic signs in the target image to be detected.
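Steps M1-M4 can be sketched compactly as follows; the snippet assumes detections have already been matched to ground truth at a fixed IoU threshold (so each detection carries a TP/FP flag), and the monotone-envelope integration of the P-R curve is a common implementation choice rather than something the claim prescribes.

```python
# Sketch of M1-M4: precision, recall, per-class AP as the area under the
# precision-recall curve, and mAP as the mean of the per-class AP values.
import numpy as np


def average_precision(scores, tp_flags, n_gt):
    """AP for one class from score-ranked detections (steps M1-M3)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(tp_flags, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    precision = cum_tp / (cum_tp + cum_fp)        # M1: P = TP / (TP + FP)
    recall = cum_tp / max(n_gt, 1)                # M2: R = TP / (TP + FN)
    # M3: area under the P-R curve, using a monotone precision envelope.
    p = np.concatenate(([1.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))


def mean_average_precision(per_class):
    """M4: mAP = mean of the per-class AP values."""
    aps = [average_precision(s, t, n) for s, t, n in per_class]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # one class, three detections, two ground-truth boxes (toy example)
    print(mean_average_precision([([0.9, 0.8, 0.3], [1, 0, 1], 2)]))  # ~0.83
```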
6. A Yolov5-based traffic sign detection recognition model training system, the system comprising:
an acquisition module used for acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, each image being annotated with bounding-box and ground-truth class information; performing image enhancement on the data set according to the data proportion among the different sign classes, including expanding the data set by random translation, scaling and cropping and applying image denoising and color histogram equalization to the expanded data set to enhance contrast (an enhancement sketch follows this claim); and dividing the data set into a training set, a test set and a validation set;
an enhancement module used for acquiring a YOLOv5-based network model, enabling Mosaic data augmentation and disabling Fliplr data augmentation at the input end of the network model;
a tuning module used for modifying the YOLOv5-based network model: recalculating the anchor boxes for the network model with a k-means algorithm, largely retaining the Backbone and Neck, adding a small-target detection layer and a CBAM (Convolutional Block Attention Module) in the Neck, and correspondingly adding a small-target detection head in the Prediction part to generate a target training model;
a configuration module used for loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
an iteration module used for feeding the parameter optimization results obtained from the test indexes back to the configuration module, which reloads the target model and reconfigures its parameter training, until the model parameters are fully tuned, and then saving the generated traffic sign image model;
and an output module used for recognizing the sign information to be detected in the actual scene with the generated model, namely outputting the detected position and the recognized class of the target sign.
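A minimal OpenCV sketch of the acquisition module's enhancement and expansion steps follows; the non-local-means filter strengths, the choice of equalising only the luminance channel, and the scale/shift ranges are illustrative assumptions, and in a full pipeline the bounding boxes would have to be transformed with the same affine matrix, which is omitted here.

```python
# Sketch of the acquisition module's image processing: denoising, colour
# histogram equalisation (on the luminance channel), and one random
# translation/scaling/cropping expansion step.
import cv2
import numpy as np


def enhance(image_bgr: np.ndarray) -> np.ndarray:
    """Denoise, then equalise the luminance histogram (expects uint8 BGR)."""
    denoised = cv2.fastNlMeansDenoisingColored(image_bgr, None, 5, 5, 7, 21)
    ycrcb = cv2.cvtColor(denoised, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)


def random_translate_scale_crop(image_bgr: np.ndarray, out_size: int = 640) -> np.ndarray:
    """One expansion step: random scale and shift, then a fixed-size crop."""
    h, w = image_bgr.shape[:2]
    scale = np.random.uniform(0.8, 1.2)
    tx, ty = np.random.uniform(-0.1, 0.1, 2) * (w, h)
    M = np.float32([[scale, 0, tx], [0, scale, ty]])
    warped = cv2.warpAffine(image_bgr, M, (w, h), borderValue=(114, 114, 114))
    return warped[:out_size, :out_size]
```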
7. A Yolov5-based traffic sign detection recognition model training device, the device comprising:
a communication bus used for connecting the processor and the memory;
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of:
acquiring a traffic sign data set, wherein the data set comprises a plurality of training images, each image being annotated with bounding-box and ground-truth class information; performing image enhancement on the data set according to the data proportion among the different sign classes, including expanding the data set by random translation, scaling and cropping and applying image denoising and color histogram equalization to the expanded data set to enhance contrast; and dividing the data set into a training set, a test set and a validation set;
acquiring a YOLOv5-based network model, enabling Mosaic data augmentation and disabling Fliplr (horizontal flip) augmentation at the input end of the network model;
modifying the YOLOv5-based network model: recalculating the anchor boxes for the network model with a k-means algorithm, largely retaining the Backbone and Neck, adding a small-target detection layer and a CBAM (Convolutional Block Attention Module) in the Neck, and correspondingly adding a small-target detection head in the Prediction part of the network model to generate a target training model;
loading the target model through the training set and setting parameter training in the form of SGD + Momentum;
feeding the parameter optimization results obtained from the test indexes back, reloading the target model and reconfiguring its parameter training until the model parameters are fully tuned, and saving the generated traffic sign image model;
and recognizing the sign information to be detected in the actual scene with the generated model, namely outputting the detected position and the recognized class of the target sign.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202210472638.XA 2022-04-29 2022-04-29 Traffic sign detection recognition model training method and system based on Yolov5 Pending CN115273017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210472638.XA CN115273017A (en) 2022-04-29 2022-04-29 Traffic sign detection recognition model training method and system based on Yolov5


Publications (1)

Publication Number Publication Date
CN115273017A (en) 2022-11-01

Family

ID=83759825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210472638.XA Pending CN115273017A (en) 2022-04-29 2022-04-29 Traffic sign detection recognition model training method and system based on Yolov5

Country Status (1)

Country Link
CN (1) CN115273017A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200184233A1 (en) * 2017-05-03 2020-06-11 Mobileye Vision Technologies Ltd. Detection and classification systems and methods for autonomous vehicle navigation
WO2019223582A1 (en) * 2018-05-24 2019-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Target detection method and system
CN112580439A (en) * 2020-12-01 2021-03-30 中国船舶重工集团公司第七0九研究所 Method and system for detecting large-format remote sensing image ship target under small sample condition
CN113255634A (en) * 2021-07-18 2021-08-13 杭州电子科技大学 Vehicle-mounted mobile terminal target detection method based on improved Yolov5
CN113688723A (en) * 2021-08-21 2021-11-23 河南大学 Infrared image pedestrian target detection method based on improved YOLOv5
CN113850799A (en) * 2021-10-14 2021-12-28 长春工业大学 YOLOv 5-based trace DNA extraction workstation workpiece detection method
CN114005029A (en) * 2021-10-20 2022-02-01 华南农业大学 Improved yolov5 network-based fingered citron pest and disease identification method and system
CN114120280A (en) * 2021-11-26 2022-03-01 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) Traffic sign detection method based on small target feature enhancement
CN114359851A (en) * 2021-12-02 2022-04-15 广州杰赛科技股份有限公司 Unmanned target detection method, device, equipment and medium
CN114220015A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved YOLOv 5-based satellite image small target detection method
CN114399734A (en) * 2022-01-17 2022-04-26 三峡大学 Forest fire early warning method based on visual information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lei Lei et al., "Traffic sign detection algorithm based on YOLO", Modern Computer *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758259A (en) * 2023-04-26 2023-09-15 中国公路工程咨询集团有限公司 Highway asset information identification method and system
CN116704476A (en) * 2023-06-12 2023-09-05 郑州轻工业大学 Traffic sign detection method based on improved Yolov4-tiny algorithm
CN116704476B (en) * 2023-06-12 2024-06-04 郑州轻工业大学 Traffic sign detection method based on improved Yolov-tini algorithm
CN116994244A (en) * 2023-08-16 2023-11-03 临海市特产技术推广总站(临海市柑桔产业技术协同创新中心) Method for evaluating fruit yield of citrus tree based on Yolov8
CN117058526A (en) * 2023-10-11 2023-11-14 创思(广州)电子科技有限公司 Automatic cargo identification method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN115273017A (en) Traffic sign detection recognition model training method and system based on Yolov5
CN109117831B (en) Training method and device of object detection network
CN104700099B (en) The method and apparatus for recognizing traffic sign
KR20200047307A (en) Cnn-based learning method, learning device for selecting useful training data and test method, test device using the same
JP7209678B2 (en) Holographic quantum mechanics simulation
EP3690741A2 (en) Method for automatically evaluating labeling reliability of training images for use in deep learning network to analyze images, and reliability-evaluating device using the same
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN110991506B (en) Vehicle brand identification method, device, equipment and storage medium
CN110795976A (en) Method, device and equipment for training object detection model
CN112329881B (en) License plate recognition model training method, license plate recognition method and device
CN110447038A (en) Image processing apparatus, image processing method and recording medium
CN109063790B (en) Object recognition model optimization method and device and electronic equipment
CN109874104A (en) User location localization method, device, equipment and medium
CN110796230A (en) Method, equipment and storage medium for training and using convolutional neural network
US9177215B2 (en) Sparse representation for dynamic sensor networks
CN112580511A (en) Method, device, equipment and storage medium for estimating road area rate
CN112288701A (en) Intelligent traffic image detection method
EP4174769A1 (en) Method and apparatus for marking object outline in target image, and storage medium and electronic apparatus
CN110728229B (en) Image processing method, device, equipment and storage medium
CN112288702A (en) Road image detection method based on Internet of vehicles
CN112749293A (en) Image classification method and device and storage medium
CN112464966A (en) Robustness estimation method, data processing method, and information processing apparatus
CN111753625B (en) Pedestrian detection method, device, equipment and medium
CN115424250A (en) License plate recognition method and device
CN114119953A (en) Method for quickly positioning and correcting license plate, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20221101)