CN117197781A - Traffic sign recognition method and device, storage medium and electronic equipment

Info

Publication number: CN117197781A (application CN202311457493.7A; granted as CN117197781B)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: image, sample image, traffic sign, recognition model, area position
Inventors: 李可欣, 万志国
Original and current assignee: Zhejiang Lab
Legal status: Granted; Active
Classification: Image Analysis (AREA)

Abstract

The specification discloses a traffic sign recognition method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring sample images; inputting each sample image into a pre-constructed recognition model, so that each feature-embedding network layer contained in the recognition model dilates the base features determined for the sample image with a different dilation coefficient, yielding the dilated image features; inputting the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image; and training the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.

Description

Traffic sign recognition method and device, storage medium and electronic equipment
Technical Field
The present specification relates to the field of autonomous driving, and in particular to a traffic sign recognition method and apparatus, a storage medium, and an electronic device.
Background
With the development and maturation of artificial intelligence, autonomous driving technology has advanced rapidly. It brings great convenience to travel and has gradually become one of the hot directions of future development. In the environment perception system of an autonomous vehicle, detecting and recognizing the traffic signs around the road is a crucial link: whether these signs are recognized accurately bears directly on whether the vehicle can perform the correct driving operations.
Existing traffic sign detection and recognition methods fall roughly into two categories: conventional methods and deep-learning-based methods. Conventional methods require manually engineered features, and neither their speed nor their accuracy meets the demands of autonomous driving. Deep-learning-based methods extract features automatically from images or video, and are both fast and accurate, so they have gradually become the mainstream.
However, mainstream deep-learning-based traffic sign detection and recognition algorithms often perform poorly on small-scale traffic signs: such signs occupy few pixels, so inaccurate localization, lost features, and high miss rates readily occur.
How to detect and recognize small-scale traffic signs, and thereby make autonomous driving safer and more reliable, is therefore an urgent problem.
Disclosure of Invention
The specification provides a traffic sign recognition method and apparatus, a storage medium, and an electronic device, to at least partially solve the above problems of the prior art.
The technical solution adopted in this specification is as follows:
the specification provides a traffic sign recognition method, comprising the following steps:
acquiring sample images, each of which contains an image of a traffic sign;
inputting each sample image into a pre-constructed recognition model, so that the feature-embedding network layers contained in the recognition model dilate the base features determined for the sample image with different dilation coefficients to obtain the dilated image features, wherein each feature-embedding network layer performs feature embedding on the base features with its own dilation coefficient;
inputting the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image;
and training the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.
Optionally, before each sample image is input into the pre-constructed recognition model, the method further comprises:
clustering the traffic sign images contained in the sample images by scale to obtain clusters;
for each cluster, determining the image frame corresponding to the cluster according to the contour information of every traffic sign image in the cluster, the image frame being shared by all traffic sign images in the cluster;
determining the base features of a sample image then specifically comprises:
for each sample image, inputting the sample image and the frame information of its corresponding image frame into the pre-constructed recognition model, obtaining the initial features of the sample image through the backbone feature extraction layer of the recognition model, and extracting from the initial features, according to the frame information, the image features lying inside the image frame, to obtain the base features of the sample image.
Optionally, inputting each sample image into the pre-constructed recognition model so that the feature-embedding network layers contained in the recognition model dilate the determined base features corresponding to the sample image with different dilation coefficients to obtain the dilated image features specifically comprises:
inputting each sample image into the pre-constructed recognition model, and determining the base features corresponding to the sample image through the recognition model;
for each feature-embedding network layer, inputting the base features into that layer, and partitioning the base features according to the feature-partitioning strategy corresponding to that layer to obtain sub-features;
dilating each sub-feature according to the dilation coefficient corresponding to that feature-embedding network layer to obtain the dilated sub-features;
and aggregating the dilated sub-features to obtain the dilated image feature output by that feature-embedding network layer.
Optionally, training the recognition model according to the recognition result and the label result corresponding to the sample image specifically comprises:
determining, from the recognition result, the image area position at which the recognition model locates the traffic sign image contained in the sample image, and the traffic sign category assigned to the recognized traffic sign;
and training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign.
Optionally, training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign specifically comprises:
training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the distance between the center point of the image area position and the center point of the real area position.
Optionally, training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign specifically comprises:
training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the deviation between the width and height of the image area position and the width and height of the real area position.
Optionally, training the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign specifically comprises:
determining the loss value corresponding to the sample image and the weight corresponding to the sample image according to the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and the deviation between the traffic sign category and the real category corresponding to the traffic sign, wherein the larger the deviation between the traffic sign category and the real category, the larger the weight corresponding to the sample image;
and weighting and summing the loss values corresponding to the sample images according to their weights to obtain a total loss value, and training the recognition model with the optimization objective of minimizing the total loss value.
The present specification provides a traffic sign recognition apparatus, comprising:
an acquisition module, configured to acquire sample images;
a processing module, configured to input each sample image into a pre-constructed recognition model, so that the feature-embedding network layers contained in the recognition model dilate the base features determined for the sample image with different dilation coefficients to obtain the dilated image features, wherein each feature-embedding network layer performs feature embedding on the base features with its own dilation coefficient;
a generating module, configured to input the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image;
and an execution module, configured to train the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.
This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the traffic sign recognition method described above.
This specification provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the traffic sign recognition method described above when executing the program.
The above technical solution adopted in this specification can achieve the following beneficial effects:
In the traffic sign recognition method provided in this specification, sample images are first acquired and input into a pre-constructed recognition model; the feature-embedding network layers contained in the model dilate the base features determined for each sample image at multiple scales, using different dilation coefficients, to obtain the dilated image features; the dilated image features are then input into the recognition network layer of the model to obtain a recognition result corresponding to the sample image; finally, the recognition model is trained according to the recognition result and the label result corresponding to the sample image, so that the deviation between the recognition result and the real result is reduced as far as possible, yielding a trained recognition model with which traffic signs in subsequently acquired traffic images can be recognized.
As the above method shows, the pre-constructed recognition model dilates the base features corresponding to the acquired sample images at multiple scales, so the features of small traffic signs are extracted more completely and the accuracy of traffic sign detection improves.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of it, illustrate the exemplary embodiments of the specification and, together with the description, serve to explain the specification without unduly limiting it. In the drawings:
Fig. 1 is a flow chart of the traffic sign recognition method provided in this specification;
Fig. 2 is a schematic structural diagram of a feature-embedding network layer provided in this specification;
Fig. 3 is a schematic diagram of the traffic sign recognition apparatus provided in this specification;
Fig. 4 is a schematic structural diagram of the electronic device corresponding to Fig. 1 provided in this specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for identifying traffic signs provided in the present specification, which includes the following steps:
s101: each sample image is acquired, and each sample image contains an image of a traffic sign.
With the increasing trend of automobiles becoming one of the indispensable tools for people to travel daily, the automatic driving technology also becomes a popular research direction at home and abroad, and for the automatic driving technology, the detection and identification of the surrounding environment of the road, especially the detection and identification of the road traffic sign are very important. However, the currently mainstream detection and recognition method of the traffic sign based on deep learning is not satisfactory for the recognition result of the smaller traffic sign, because the smaller traffic sign carries fewer pixels and carries less information for detection.
Based on the above, the specification provides a traffic sign recognition method, which can better extract the related characteristic information of smaller traffic signs by expanding the related characteristics of the traffic signs by a plurality of scales, thereby improving the accuracy of traffic sign recognition.
In the present specification, the execution subject of the method for realizing traffic sign recognition may be a designated device such as a server, a terminal device such as a desktop computer or a notebook computer, or a client installed in the terminal device, and for convenience of description, the present specification will describe a method for recognizing traffic sign provided in the present specification by taking only the server as an execution subject.
In this specification, the server may acquire each sample image in various manners, for example, the server may acquire each sample image in a manner of a mobile storage device, network transmission, or the like, and store each sample image in a designated storage space to further process each sample image in a subsequent process. Each sample image is an image including a traffic sign and an environment in which the traffic sign is located, and may be obtained by shooting or from a third party data set.
S102: input each sample image into a pre-constructed recognition model, so that the feature-embedding network layers contained in the recognition model dilate the base features determined for the sample image with different dilation coefficients to obtain the dilated image features, each feature-embedding network layer performing feature embedding on the base features with its own dilation coefficient.
Before the sample images are input into the pre-constructed recognition model, the server must cluster them by the scale of the traffic sign images they contain, so that sample images whose traffic sign images are of similar scale fall into the same cluster. Then, for each cluster, an image frame is generated using the contours of the traffic sign images in all sample images of the cluster as a constraint; the image frame corresponding to a cluster is the image frame corresponding to every sample image in that cluster.
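As an illustration of this preprocessing step, the following minimal Python sketch assumes the sign scales are clustered with k-means over the (width, height) of the annotated sign boxes, and that each cluster's image frame is taken as the smallest width and height enclosing all of its members; the cluster count k and all helper names are illustrative, since the patent does not fix the clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_sign_scales(sign_boxes, k=3):
    """Group annotated traffic-sign boxes by scale.

    sign_boxes: array of shape (N, 2) holding the (width, height) of every
    sign image across all sample images.
    Returns the cluster label of each box and, per cluster, the smallest
    frame (max width, max height) enclosing every sign in it.
    """
    boxes = np.asarray(sign_boxes, dtype=float)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(boxes)
    frames = []
    for c in range(k):
        members = boxes[labels == c]
        # The cluster's image frame is constrained by the contours of all
        # member signs; here we take the enclosing width and height.
        frames.append((members[:, 0].max(), members[:, 1].max()))
    return labels, frames
```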
In this specification, the pre-constructed recognition model consists of three parts: a backbone feature extraction layer, the feature-embedding network layers, and a recognition network layer. The backbone feature extraction layer extracts features from the sample image to obtain its initial features and then, according to the frame information of the image frame corresponding to the sample image, extracts further from the initial features to obtain the base features. Each feature-embedding network layer dilates the determined base features corresponding to the sample image with its own dilation coefficient to obtain a dilated image feature. The recognition network layer recognizes the sample image from the dilated image features and outputs a recognition result, and the recognition model is trained according to the deviation between the recognition result and the real result.
For each sample image, the server inputs the sample image and the frame information of its corresponding image frame into the pre-constructed recognition model. The sample image first passes through the backbone feature extraction layer, which extracts its initial features; the initial features are then filtered according to the frame information to obtain the base features. The initial features contain feature information both related and unrelated to the traffic sign, whereas the base features contain only the feature information related to the traffic sign.
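The patent does not spell out how the frame information selects features; one simple reading, sketched below purely as an assumption, maps the frame to feature-map coordinates at the backbone stride and crops the initial features:

```python
import torch

def extract_base_features(init_feat, frame_xyxy, stride=32):
    """Keep only the feature cells that fall inside the image frame.

    init_feat:  backbone output of shape (C, H, W).
    frame_xyxy: (x1, y1, x2, y2) of the cluster's image frame in pixels.
    stride:     downsampling factor between image and feature map.
    """
    x1, y1, x2, y2 = [int(round(v / stride)) for v in frame_xyxy]
    return init_feat[:, y1:y2 + 1, x1:x2 + 1]
```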
The server then inputs the base features into the feature-embedding network layers for feature embedding. Each feature-embedding network layer dilates the determined base features corresponding to the sample image with a different dilation coefficient, yielding the dilated image features.
In this specification, the feature-embedding network layers dilate the determined base features corresponding to the sample image at multiple scales, each layer using a different dilation coefficient, to obtain the dilated image features. Any single feature-embedding network layer in fact first partitions the received base features according to a certain partitioning strategy to obtain sub-features, then dilates each sub-feature according to the dilation coefficient corresponding to that layer to obtain the dilated sub-features, and finally aggregates the dilated sub-features into the dilated image feature output by that layer.
The structure of the feature-embedding network layer is explained below using its workflow with dilation coefficient r = 1 as an example, as shown in Fig. 2.
Fig. 2 is a schematic structural diagram of a feature-embedding network layer provided in this specification.
Assume the input base feature has dimensions 20×20×1024. First, a convolution with 512 kernels of size 1×1 produces a 20×20×512 feature block, shortening the channel dimension of the input base feature and easing the subsequent dilated convolution.
The core of the feature-embedding network layer combines dilated convolution with split convolution. The split convolution partitions the base features into several sub-features; splitting the base features into multiple parts effectively reduces the parameter count and computation of the subsequent operations and makes the network lighter. In this example the split number g of the split convolution is 4: the 20×20×512 feature block is split along the channel dimension into four parts whose width and height remain unchanged while the channel depth shrinks to 1/4, so each sub-feature produced by the split convolution has dimensions 20×20×128.
Dilated convolution is applied to the four 20×20×128 sub-features to obtain four dilated sub-features, which are concatenated into the 20×20×512 output of the split convolution. This output is then convolved with 1024 kernels of size 1×1 to produce a 20×20×1024 feature block, restoring the channel depth so that the feature size again matches the 20×20×1024 input. The 20×20×1024 block is standardized and normalized, and convolved once more with 1024 kernels of size 1×1 to preserve the discriminability of the features, giving the dilated image feature output by the feature-embedding network layer, also of dimensions 20×20×1024.
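For illustration, the following is a minimal PyTorch sketch of one feature-embedding network layer assembled from the worked example above. The channel sizes (1024 → 512 → 4×128 → 512 → 1024) and the split count g = 4 come from the example; the 3×3 kernel of the dilated convolution and the use of batch normalization for the standardization step are assumptions, as the patent does not fix them.

```python
import torch
import torch.nn as nn

class FeatureEmbeddingLayer(nn.Module):
    """One feature-embedding network layer with dilation coefficient r:
    1x1 reduction 1024 -> 512, split into g = 4 channel groups, a 3x3
    dilated convolution per group, concatenation back to 512 channels,
    then 1x1 restoration to 1024 with normalization and a final 1x1."""

    def __init__(self, in_ch=1024, mid_ch=512, groups=4, dilation=1):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        sub = mid_ch // groups  # 128 channels per sub-feature
        self.groups = groups
        self.dilated = nn.ModuleList(
            nn.Conv2d(sub, sub, kernel_size=3,
                      padding=dilation, dilation=dilation)
            for _ in range(groups)
        )
        self.restore = nn.Conv2d(mid_ch, in_ch, kernel_size=1)
        self.norm = nn.BatchNorm2d(in_ch)
        self.refine = nn.Conv2d(in_ch, in_ch, kernel_size=1)

    def forward(self, x):                      # x: (N, 1024, 20, 20)
        x = self.reduce(x)                     # (N, 512, 20, 20)
        subs = torch.chunk(x, self.groups, 1)  # 4 x (N, 128, 20, 20)
        subs = [conv(s) for conv, s in zip(self.dilated, subs)]
        x = torch.cat(subs, dim=1)             # (N, 512, 20, 20)
        x = self.norm(self.restore(x))         # (N, 1024, 20, 20)
        return self.refine(x)                  # dilated image feature
```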
It should be noted that when a network stacks several dilated convolutions with the same dilation coefficient, not all of the data in the input feature is used when computing the output feature: a fixed gap appears between the sampled non-zero elements. This is the gridding effect. Because not all pixel values of the input feature are used, some detail information is inevitably lost, which harms small-target detection. In this specification, therefore, the dilation coefficients of the feature-embedding network layers are designed according to specified principles, so that the gridding effect commonly caused by dilated convolution is avoided.
Specifically, the principles the dilation coefficients of the dilated convolutions must follow are: (1) in any layer of consecutive dilated convolutions, the maximum distance between two non-zero elements must not exceed the kernel size of that layer, so that any two non-zero elements fall within the kernel and the convolution can effectively capture local features; (2) the dilation rates of the consecutive dilated convolutions must not share a common divisor greater than 1, to prevent information overlap and keep the receptive field effective; (3) the dilation rates are laid out in a zigzag structure, to enlarge the multi-scale receptive field of the convolution and thus better capture the features of the input data.
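Principle (1) is stated qualitatively above; a common formalization in the dilated-convolution literature is the hybrid dilated convolution (HDC) rule, which is borrowed here, as an assumption, to sketch a checker for a candidate set of dilation coefficients:

```python
from math import gcd
from functools import reduce

def check_dilation_rates(rates, kernel_size=3):
    """Check a sequence of dilation rates against the stated principles.

    Principle (1) is formalized with the HDC rule: with M_n = r_n and
    M_i = max(M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i),
    every M_i must not exceed the kernel size. Principle (2): the rates
    must not share a common divisor greater than 1. Principle (3), the
    zigzag layout, is a design choice and is not checked here.
    """
    m = rates[-1]
    for r in reversed(rates[:-1]):
        m = max(m - 2 * r, m - 2 * (m - r), r)
        if m > kernel_size:
            return False  # gaps between sampled pixels: gridding effect
    if reduce(gcd, rates) > 1:
        return False      # overlapping receptive fields
    return True

print(check_dilation_rates([1, 2, 5]))  # True: zigzag set, no grid effect
print(check_dilation_rates([2, 4, 8]))  # False: fails both principles
```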
S103: input the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image.
The server feeds the dilated image features output by the feature-embedding network layers into the recognition network layer of the pre-constructed recognition model. The recognition network layer generates a recognition result for the sample image from the input dilated image features; the result contains an image area position and a predicted category, and the recognition model can then be trained with the optimization objective of minimizing the deviations between the image area position and the real area position and between the predicted category and the real category.
S104: train the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.
In this specification, the role of the recognition network layer is to recognize the sample image from the dilated image features output by the feature-embedding network layers and to output the recognition result, the recognition model being trained according to the deviation between the recognition result and the real result.
To bring the recognition result of the recognition model closer to the real result, besides the two optimization objectives of minimizing the deviation between the image area position and the real area position and minimizing the deviation between the predicted category and the real category, further optimization objectives can be introduced for training.
To reduce the deviation between the image area position and the real area position as far as possible, the recognition network layer considers their overlap ratio, center-point distance, and width-height gap during regression: the penalty on the center-point distance pulls the image area position and the real area position ever closer, and the penalty on the width-height gap pulls their widths and heights ever closer. In addition, to make the shapes of the image area position and the real area position increasingly similar, the recognition network layer also sets a penalty on the distances between the four vertices of the image area position and those of the real area position.
Therefore, in this specification the server can train the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the distance between the center point of the image area position and the center point of the real area position.
The recognition model can likewise be trained with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the deviation between the width and height of the image area position and the width and height of the real area position.
In both schemes, minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image in fact includes maximizing the overlap ratio between the two positions and minimizing the distances between the four vertices of the image area position and the four vertices of the real area position.
Of course, in practice all of the optimization objectives mentioned above can be applied together in the training of the recognition model.
For example, the optimization of the image area position in the recognition result by the recognition network layer can be expressed as:

$$\mathrm{ECDIoU} = \mathrm{IoU} - \frac{\rho^{2}\!\left(b,\, b^{gt}\right)}{c^{2}} - \frac{\left(w - w^{gt}\right)^{2}}{c_{w}^{2}} - \frac{\left(h - h^{gt}\right)^{2}}{c_{h}^{2}} - \mathrm{diou}, \qquad \mathrm{diou} = \frac{1}{c}\sum_{i=1}^{4}\left\lVert v_{i} - v_{i}^{gt}\right\rVert$$

where the real area position is $b^{gt}$, with width $w^{gt}$ and height $h^{gt}$; the image area position is $b$, with width $w$ and height $h$; $\rho(\cdot)$ is the distance between the two center points; $c$, $c_w$ and $c_h$ are, respectively, the diagonal, width, and height of the smallest rectangle that simultaneously contains the image area position and the real area position; and $\mathrm{diou}$ is the penalty term measuring the distances between the four vertices $v_i$ of the image area position and the corresponding four vertices $v_i^{gt}$ of the real area position, computed as the sum of those four distances divided by $c$.
As can be seen, the second term of the ECDIoU formula penalizes the distance between the center points of the image area position and the real area position, the third and fourth terms penalize their width and height gaps respectively, and diou penalizes the distances between their four vertices. The vertex-distance penalty not only accelerates the closing of the distance between the two positions but also makes the shapes of the prediction box and the real box increasingly similar until they nearly coincide. The ECDIoU bounding-box regression loss therefore drives the image area position to regress quickly and accurately to the real area position, optimizing the image area position in the recognition result.
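A sketch of this loss in Python follows, assembled from the description above: the training loss is taken as 1 − ECDIoU averaged over the boxes, and the corner pairing follows the text; the exact reduction and the ε-stabilization are assumptions.

```python
import torch

def ecdiou_loss(pred, gt, eps=1e-7):
    """ECDIoU bounding-box regression loss; pred and gt hold boxes as
    (x1, y1, x2, y2) rows in tensors of shape (N, 4)."""
    # IoU term.
    ix1 = torch.max(pred[:, 0], gt[:, 0])
    iy1 = torch.max(pred[:, 1], gt[:, 1])
    ix2 = torch.min(pred[:, 2], gt[:, 2])
    iy2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Smallest rectangle enclosing both boxes: width c_w, height c_h,
    # diagonal c.
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])
    c = (cw ** 2 + ch ** 2).sqrt() + eps

    # Second term: center-point distance penalty rho^2 / c^2.
    center = (((pred[:, 0] + pred[:, 2]) - (gt[:, 0] + gt[:, 2])) ** 2 +
              ((pred[:, 1] + pred[:, 3]) - (gt[:, 1] + gt[:, 3])) ** 2) / (4 * c ** 2)

    # Third and fourth terms: width and height gap penalties.
    w_pen = (pw - gw) ** 2 / (cw ** 2 + eps)
    h_pen = (ph - gh) ** 2 / (ch ** 2 + eps)

    # diou term: sum of the four corresponding corner distances over c.
    corners_p = torch.stack([pred[:, [0, 1]], pred[:, [2, 1]],
                             pred[:, [0, 3]], pred[:, [2, 3]]], dim=1)
    corners_g = torch.stack([gt[:, [0, 1]], gt[:, [2, 1]],
                             gt[:, [0, 3]], gt[:, [2, 3]]], dim=1)
    diou = (corners_p - corners_g).norm(dim=2).sum(dim=1) / c

    ecdiou = iou - center - w_pen - h_pen - diou
    return (1.0 - ecdiou).mean()
```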
Further, training the recognition model usually involves a large number of training samples. For each training sample, the recognition network layer can analyze the recognition result of the sample image to determine the loss value corresponding to the sample image, and determine the weight corresponding to the sample image according to the deviation between the traffic sign category predicted for the sample image and the real category: the larger the deviation, the larger the weight, a positive correlation.
After the loss value and the weight corresponding to each sample image are determined, the total loss value is obtained by weighted summation, and the recognition model is then trained with the optimization objective of minimizing the total loss value.
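A compact sketch of this weighted summation, assuming the per-sample weights are the classification deviations normalized to sum to one (the patent only requires the weight to grow with the deviation):

```python
import torch

def weighted_total_loss(sample_losses, cls_deviations, eps=1e-7):
    """Weight each sample's loss by how badly its category was predicted.

    sample_losses:  per-sample loss values, shape (N,).
    cls_deviations: per-sample deviation between the predicted and real
                    category (e.g. cross-entropy), shape (N,); larger
                    deviation -> larger weight, so hard samples dominate.
    """
    weights = cls_deviations / (cls_deviations.sum() + eps)
    return (weights * sample_losses).sum()
```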
With the above method, the pre-constructed recognition model dilates the base features corresponding to the acquired sample images at multiple scales, so the features of small traffic signs are extracted more completely and the accuracy of traffic sign detection improves; in addition, a new bounding-box regression loss function is constructed from the recognition result and the label result corresponding to the sample image to train the recognition model, fully accounting for the overlap ratio, center-point distance, and width-height gap between the prediction box and the real box, which improves the accuracy of traffic sign recognition.
Moreover, as described above, sample images that are hard to recognize during training (those for which the traffic sign category output by the recognition model deviates strongly from the real category) can be given larger weights, so that training focuses more on the hard samples, further improving the training effect of the model.
The trained recognition model can be applied to the autonomous driving of a vehicle. A collection device in the vehicle gathers information about the environment around the road being travelled (usually video or pictures), and the recognition model recognizes the traffic signs in this information, helping the autonomous driving system perform the correct driving operations and plan the driving route.
In this specification, the recognition model can take various concrete forms; for example, it can be redesigned on top of the YOLOv5 object detection framework: retain the CSPDarknet backbone of YOLOv5 but truncate its tail, removing the outputs of the first two effective feature layers and keeping only the deepest backbone feature as the feature extracted from a sample image containing a traffic sign image, which yields the backbone feature extraction layer of the recognition model; replace the FPN feature pyramid fusion module of YOLOv5 with the feature-embedding network layers provided in this specification, connected to the backbone feature extraction layer to perform feature embedding; and retain the YOLOv5 head prediction network, replacing its bounding-box regression loss function with the ECDIoU bounding-box regression loss function provided in this specification and its confidence loss function with a focal loss function, which yields the recognition network layer of the recognition model.
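A structural sketch of that assembly is given below, reusing the FeatureEmbeddingLayer from the earlier sketch; the backbone and head are placeholders and the dilation set (1, 2, 5) is merely one zigzag set passing the rate check above, since the patent does not publish the exact configuration.

```python
import torch.nn as nn

class TrafficSignRecognizer(nn.Module):
    """Truncated CSPDarknet backbone (deepest feature only), a bank of
    feature-embedding layers with distinct dilation coefficients in
    place of the FPN, and a YOLOv5-style head whose regression and
    confidence losses are swapped for ECDIoU and focal loss."""

    def __init__(self, backbone, head, dilations=(1, 2, 5)):
        super().__init__()
        self.backbone = backbone          # deepest CSPDarknet feature only
        self.embed = nn.ModuleList(
            FeatureEmbeddingLayer(dilation=r) for r in dilations
        )
        self.head = head                  # YOLOv5 head, losses replaced

    def forward(self, image):
        base = self.backbone(image)                    # e.g. (N, 1024, 20, 20)
        dilated = [layer(base) for layer in self.embed]
        return self.head(dilated)
```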
The above is the traffic sign recognition method provided by one or more embodiments of this specification. Based on the same idea, this specification further provides a corresponding traffic sign recognition apparatus, as shown in Fig. 3.
Fig. 3 is a schematic diagram of a traffic sign recognition device provided in the present specification, including:
an acquisition module 301, configured to acquire sample images;
a processing module 302, configured to input each sample image into a pre-constructed recognition model, so that the feature-embedding network layers contained in the recognition model dilate the base features determined for the sample image with different dilation coefficients to obtain the dilated image features, each feature-embedding network layer performing feature embedding on the base features with its own dilation coefficient;
a generating module 303, configured to input the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image;
and an execution module 304, configured to train the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.
Optionally, the processing module 302 is further configured to cluster the traffic sign images contained in the sample images by scale to obtain clusters, and, for each cluster, to determine the image frame corresponding to the cluster according to the contour information of every traffic sign image in the cluster, the image frame being shared by all traffic sign images in the cluster;
determining the base features of a sample image specifically comprises: for each sample image, inputting the sample image and the frame information of its corresponding image frame into the pre-constructed recognition model, obtaining the initial features of the sample image through the backbone feature extraction layer of the recognition model, and extracting from the initial features, according to the frame information, the image features lying inside the image frame, to obtain the base features of the sample image.
Optionally, the processing module 302 is specifically configured to input each sample image into the pre-constructed recognition model to determine the base features corresponding to the sample image through the recognition model; for each feature-embedding network layer, to input the base features into that layer and partition them according to the feature-partitioning strategy corresponding to that layer to obtain sub-features; to dilate each sub-feature according to the dilation coefficient corresponding to that layer to obtain the dilated sub-features; and to aggregate the dilated sub-features to obtain the dilated image feature output by that feature-embedding network layer.
Optionally, the execution module 304 is specifically configured to determine, from the recognition result, the image area position at which the recognition model locates the traffic sign image contained in the sample image and the traffic sign category assigned to the recognized traffic sign, and to train the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign.
Optionally, the execution module 304 is specifically configured to train the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the distance between the center point of the image area position and the center point of the real area position.
Optionally, the execution module 304 is specifically configured to train the recognition model with the optimization objectives of minimizing the deviation between the image area position and the real area position of the traffic sign image contained in the sample image, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the deviation between the width and height of the image area position and the width and height of the real area position.
Optionally, the execution module 304 is specifically configured to determine the loss value and the weight corresponding to the sample image according to the deviation between the image area position and the real area position of the traffic sign image contained in the sample image and the deviation between the traffic sign category and the real category corresponding to the traffic sign, wherein the larger the deviation between the traffic sign category and the real category, the larger the weight corresponding to the sample image; and to weight and sum the loss values corresponding to the sample images by their weights to obtain a total loss value, training the recognition model with the optimization objective of minimizing the total loss value.
This specification also provides a computer-readable storage medium storing a computer program operable to perform the traffic sign recognition method provided in Fig. 1 above.
This specification also provides a schematic structural diagram of the electronic device corresponding to Fig. 1, as shown in Fig. 4.
Fig. 4 is a schematic structural diagram of the electronic device corresponding to Fig. 1 provided in this specification.
As shown in Fig. 4, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, as well as whatever other hardware the service requires. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, implementing the traffic sign recognition method described in Fig. 1.
Of course, besides a software implementation, this specification does not exclude other implementations, such as logic devices or combinations of software and hardware; that is, the executing subject of the processing flow above is not limited to logic units and may also be hardware or logic devices.
Improvements to a technology could once be clearly distinguished as improvements in hardware (for example, to circuit structures such as diodes, transistors, or switches) or improvements in software (improvements to a method flow). With the development of technology, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL), of which there is not just one but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It should also be clear to those skilled in the art that a hardware circuit implementing a given logic method flow can easily be obtained merely by lightly programming the method flow into an integrated circuit in one of the above hardware description languages.
The controller may be implemented in any suitable manner; for example, it may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320 microcontrollers; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely in computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for performing the various functions can be regarded as structures within the hardware component, or even as both software modules implementing the method and structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media) such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus comprising it.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiment is described relatively simply because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.
The foregoing is merely an embodiment of this specification and is not intended to limit it. Various modifications and variations of this specification will occur to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this specification shall fall within the scope of its claims.

Claims (10)

1. A traffic sign recognition method, comprising:
acquiring sample images, each of which contains an image of a traffic sign;
inputting each sample image into a pre-constructed recognition model, so that the feature-embedding network layers contained in the recognition model dilate the base features determined for the sample image with different dilation coefficients to obtain the dilated image features, wherein each feature-embedding network layer performs feature embedding on the base features with its own dilation coefficient;
inputting the dilated image features into the recognition network layer of the recognition model to obtain a recognition result corresponding to the sample image;
and training the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so that traffic signs in subsequently acquired traffic images can be recognized with the trained recognition model.
2. The method of claim 1, wherein for each sample image, before inputting the sample image into the pre-constructed recognition model, the method further comprises:
Clustering the images of the traffic signs contained in each sample image according to the size of the scale to obtain each cluster;
for each cluster, determining an image frame corresponding to the cluster according to the contour information of each traffic sign image contained in the cluster, wherein each traffic sign image contained in the cluster corresponds to that image frame;
and wherein determining the basic feature of the sample image specifically comprises:
inputting the sample image and frame information of an image frame corresponding to the sample image into a pre-constructed recognition model for each sample image, obtaining initial characteristics of the sample image through a trunk characteristic extraction layer in the recognition model, and extracting image characteristics of the image positioned in the image frame from the initial characteristics according to the frame information to obtain basic characteristics of the sample image.
3. The method according to claim 1, wherein inputting each sample image into the pre-constructed recognition model, so that the basic feature determined for the sample image is expanded by each feature embedded network layer contained in the recognition model with a different expansion coefficient to obtain each expanded image feature, specifically comprises:
inputting each sample image into the pre-constructed recognition model, so as to determine the basic feature corresponding to the sample image through the recognition model;
for each feature embedded network layer, inputting the basic feature into the feature embedded network layer, so as to divide the basic feature through the feature division strategy corresponding to the feature embedded network layer to obtain sub-basic features;
expanding each sub-basic feature according to the expansion coefficient corresponding to the feature embedded network layer to obtain each expanded sub-basic feature;
and aggregating the expanded sub-basic features to obtain the expanded image feature output by the feature embedded network layer.
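The divide-expand-aggregate procedure of claim 3 could plausibly be realized, within a single feature embedded network layer, as a channel split followed by dilated convolutions and concatenation. The sketch below assumes exactly that; SplitDilateAggregate, the chunk-based division strategy, and the shared per-layer coefficient are all hypothetical.

```python
# Hedged sketch of claim 3's per-layer procedure (names hypothetical).
import torch
import torch.nn as nn

class SplitDilateAggregate(nn.Module):
    def __init__(self, channels: int, num_splits: int, dilation: int):
        super().__init__()
        assert channels % num_splits == 0
        sub = channels // num_splits
        # One dilated conv per sub-basic feature; all share this layer's
        # expansion coefficient.
        self.convs = nn.ModuleList(
            nn.Conv2d(sub, sub, 3, padding=dilation, dilation=dilation)
            for _ in range(num_splits))
        self.num_splits = num_splits

    def forward(self, basic: torch.Tensor) -> torch.Tensor:
        subs = torch.chunk(basic, self.num_splits, dim=1)  # division strategy
        expanded = [conv(s) for conv, s in zip(self.convs, subs)]
        return torch.cat(expanded, dim=1)                  # aggregation

layer = SplitDilateAggregate(channels=64, num_splits=4, dilation=2)
print(layer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```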
4. The method of claim 1, wherein training the recognition model according to the recognition result and the label result corresponding to the sample image specifically comprises:
determining, according to the recognition result, the image area position where the traffic sign image recognized by the recognition model is located in the sample image, and the traffic sign category of the recognized traffic sign;
and training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign.
5. The method according to claim 4, wherein training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, specifically comprises:
training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the distance between the center point of the image area position and the center point of the real area position.
6. The method according to claim 4 or 5, wherein training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, specifically comprises:
training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, and minimizing the deviation between the width and height of the image area position and the width and height of the real area position.
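Claims 4 to 6 together describe a box-regression objective combining area-position deviation, center-point distance, and width/height deviation, which resembles the DIoU/EIoU family of losses from the detection literature (compare the cited Control Distance IoU paper). The sketch below is an illustrative guess at such an objective; the exact normalizations are assumptions, and the classification deviation would be an ordinary cross-entropy term added alongside it.

```python
# Hedged sketch of the optimization targets in claims 4-6; the
# normalizations below are assumptions, not the patent's formulas.
import torch

def box_regression_loss(pred: torch.Tensor, target: torch.Tensor):
    """pred, target: (N, 4) boxes given as (cx, cy, w, h)."""
    # Corners of the predicted image area position and the real one.
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    tx1, ty1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    tx2, ty2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2

    # Claim 4: deviation between the two area positions (1 - IoU).
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter
    iou = inter / union.clamp(min=1e-6)

    # Claim 5: squared center-point distance, normalized by the diagonal
    # of the smallest enclosing box.
    center_d2 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    ew = torch.max(px2, tx2) - torch.min(px1, tx1)
    eh = torch.max(py2, ty2) - torch.min(py1, ty1)
    diag2 = (ew ** 2 + eh ** 2).clamp(min=1e-6)

    # Claim 6: deviation between predicted and real width and height.
    wh_dev = ((pred[:, 2] - target[:, 2]) / ew) ** 2 \
           + ((pred[:, 3] - target[:, 3]) / eh) ** 2

    return (1 - iou + center_d2 / diag2 + wh_dev).mean()

pred = torch.rand(8, 4) + 0.5    # placeholder predicted boxes
target = torch.rand(8, 4) + 0.5  # placeholder real boxes
print(box_regression_loss(pred, target))
```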
7. The method according to claim 4, wherein training the recognition model with the optimization targets of minimizing the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located, and minimizing the deviation between the traffic sign category and the real category corresponding to the traffic sign, specifically comprises:
determining a loss value and a weight corresponding to the sample image according to the deviation between the image area position and the real area position where the image of the traffic sign contained in the sample image is located and the deviation between the traffic sign category and the real category corresponding to the traffic sign, wherein the larger the deviation between the traffic sign category and the real category corresponding to the traffic sign, the larger the weight corresponding to the sample image;
and carrying out weighted summation of the loss values corresponding to the sample images according to the weights corresponding to the sample images to obtain a total loss value, and training the recognition model with the optimization target of minimizing the total loss value.
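Claim 7 weights each sample's loss by its classification deviation, so harder samples contribute more to the total. The mapping from deviation to weight is not specified in the claim; the normalized, detached cross-entropy weighting below is one hypothetical instantiation.

```python
# Hedged sketch of claim 7; the weighting rule is an assumption.
import torch
import torch.nn.functional as F

def weighted_total_loss(logits, labels, box_losses):
    """logits: (N, C) class scores; labels: (N,); box_losses: (N,)."""
    ce = F.cross_entropy(logits, labels, reduction="none")  # class deviation
    # Larger classification deviation -> larger weight; detached so the
    # weights themselves are not optimized, and normalized to sum to 1.
    weights = ce.detach() / ce.detach().sum().clamp(min=1e-6)
    per_sample = ce + box_losses          # per-sample loss value
    return (weights * per_sample).sum()   # total loss value to minimize

logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
total = weighted_total_loss(logits, labels, torch.rand(8))
total.backward()
print(total.item())
```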
8. An apparatus for traffic sign recognition, comprising:
the acquisition module is used for acquiring each sample image;
the processing module is used for inputting each sample image into a pre-constructed recognition model, so that the basic feature determined for the sample image is expanded by each feature embedded network layer contained in the recognition model with a different expansion coefficient to obtain each expanded image feature, wherein each feature embedded network layer performs feature embedding on the basic feature with a different expansion coefficient;
the generation module is used for inputting each expanded image feature into a recognition network layer in the recognition model to obtain a recognition result corresponding to the sample image;
and the execution module is used for training the recognition model according to the recognition result and the label result corresponding to the sample image to obtain a trained recognition model, so as to recognize the traffic sign involved in a collected traffic image through the trained recognition model.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1-7 when executing the program.
CN202311457493.7A 2023-11-03 2023-11-03 Traffic sign recognition method and device, storage medium and electronic equipment Active CN117197781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311457493.7A CN117197781B (en) 2023-11-03 2023-11-03 Traffic sign recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117197781A true CN117197781A (en) 2023-12-08
CN117197781B CN117197781B (en) 2024-04-05

Family

ID=88996444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311457493.7A Active CN117197781B (en) 2023-11-03 2023-11-03 Traffic sign recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117197781B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110109476A1 (en) * 2009-03-31 2011-05-12 Porikli Fatih M Method for Recognizing Traffic Signs
US20190147304A1 (en) * 2017-11-14 2019-05-16 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks
CN111291660A (en) * 2020-01-21 2020-06-16 天津大学 Anchor-free traffic sign identification method based on void convolution
CN111340105A (en) * 2020-02-25 2020-06-26 腾讯科技(深圳)有限公司 Image classification model training method, image classification device and computing equipment
CN112766379A (en) * 2021-01-21 2021-05-07 中国科学技术大学 Data equalization method based on deep learning multi-weight loss function
CN115937703A (en) * 2022-11-30 2023-04-07 南京林业大学 Enhanced feature extraction method for remote sensing image target detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN DONG et al.: "Control Distance IoU and Control Distance IoU Loss for Better Bounding Box Regression", Pattern Recognition, vol. 137 *
ZHAO Diyu: "Research on Deep Learning-Based Traffic Sign Detection Algorithms for Driverless Vehicles", Wanfang Database, pages 1-95 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576522A (en) * 2024-01-18 2024-02-20 之江实验室 Model training method and device based on mimicry structure dynamic defense
CN117576522B (en) * 2024-01-18 2024-04-26 之江实验室 Model training method and device based on mimicry structure dynamic defense

Also Published As

Publication number Publication date
CN117197781B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN109034183B (en) Target detection method, device and equipment
CN117197781B (en) Traffic sign recognition method and device, storage medium and electronic equipment
CN111797711A (en) Model training method and device
CN115600157B (en) Data processing method and device, storage medium and electronic equipment
CN113887608B (en) Model training method, image detection method and device
CN112990099B (en) Method and device for detecting lane line
CN116186330B (en) Video deduplication method and device based on multi-mode learning
CN112861831A (en) Target object identification method and device, storage medium and electronic equipment
CN117036829A (en) Method and system for achieving label enhancement based on prototype learning for identifying fine granularity of blade
CN112365513A (en) Model training method and device
CN115984154A (en) Image fusion method and device, storage medium and electronic equipment
CN112734851B (en) Pose determination method and device
CN114359935A (en) Model training and form recognition method and device
CN114187355A (en) Image calibration method and device
CN111426299B (en) Method and device for ranging based on depth of field of target object
CN115018866A (en) Boundary determining method and device, storage medium and electronic equipment
CN111104908A (en) Road edge determination method and device
CN114528923B (en) Video target detection method, device, equipment and medium based on time domain context
CN117726907B (en) Training method of modeling model, three-dimensional human modeling method and device
CN116740197B (en) External parameter calibration method and device, storage medium and electronic equipment
CN116188919B (en) Test method and device, readable storage medium and electronic equipment
CN117237744B (en) Training method and device of image classification model, medium and electronic equipment
CN116935055B (en) Attention mask-based weak supervision semantic segmentation method and device
CN116704178A (en) Image instance segmentation method and device, storage medium and electronic equipment
CN115641438A (en) Semantic segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant