CN111274942A - Traffic cone identification method and device based on cascade network - Google Patents

Traffic cone identification method and device based on cascade network

Info

Publication number
CN111274942A
CN111274942A
Authority
CN
China
Prior art keywords
traffic cone
network
traffic
model
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010060114.0A
Other languages
Chinese (zh)
Inventor
王鹤
李润泽
王博
高嵩
徐月云
刘洋
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoqi Beijing Intelligent Network Association Automotive Research Institute Co ltd
Original Assignee
Guoqi Beijing Intelligent Network Association Automotive Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoqi Beijing Intelligent Network Association Automotive Research Institute Co ltd filed Critical Guoqi Beijing Intelligent Network Association Automotive Research Institute Co ltd
Priority to CN202010060114.0A priority Critical patent/CN111274942A/en
Publication of CN111274942A publication Critical patent/CN111274942A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a traffic cone identification method and device based on a cascade network. The traffic cone identification method based on the cascade network comprises the following steps: acquiring a compression and activation network SENet and a dense convolutional network DenseNet; determining a target network structure based on the SENet, the DenseNet and a preset target detection model; and training the target network structure based on a plurality of original traffic cone scene images containing traffic cones to obtain a traffic cone recognition model. The embodiment of the invention can provide a traffic cone recognition model with higher recognition accuracy, so that traffic cones can be recognized more accurately.

Description

Traffic cone identification method and device based on cascade network
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a traffic cone identification method and device based on a cascade network, electronic equipment and a computer storage medium.
Background
In recent years, unmanned driving technology has gradually matured, and safety problems are highly emphasized by governments, scientific research institutions and automobile manufacturers. The traffic cone, as one of the important traffic signs, is usually placed on a road or a sidewalk to temporarily change traffic directions, block off areas, or warn of roadside construction, lane accidents and the like, and plays a vital role in ensuring safe driving. It is therefore particularly important to design a traffic cone identification and detection system with good real-time performance and high accuracy, which can help reduce traffic accidents.
Because traffic cones on roads or sidewalks are temporary and movable, they cannot be marked on a high-definition map and must instead be identified by vehicle-mounted sensors. For traffic cone detection and identification, the currently adopted approach is traditional image processing: the complex background is removed through image processing, and the traffic cones are extracted and detected through morphological processing.
Compared with traditional image processing methods, a target detection method based on deep learning can independently learn features at different levels, and the richer the learned features are, the higher the accuracy. Such methods mainly use a preset target detection model (for example, a Cascade R-CNN model) to perform target recognition, but the target detection model is relatively simple, which limits the accuracy of target recognition.
Therefore, how to provide a traffic cone recognition model with higher recognition accuracy, so that traffic cones can be recognized more accurately, is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the invention provide a traffic cone identification method and device based on a cascade network, an electronic device, and a computer storage medium, which can provide a traffic cone recognition model with higher recognition accuracy so that traffic cones can be recognized more accurately.
In a first aspect, a traffic cone identification method based on a cascade network is provided, which includes:
acquiring a compression and activation network SEnet and a dense convolution network DenseNet;
determining a target network structure based on the SENEt, the DenseNet and a preset target detection model;
and training the target network structure based on a plurality of original traffic cone scene images containing traffic cones to obtain a traffic cone recognition model.
Optionally, determining the target network structure based on the SENet, the DenseNet and a preset target detection model includes:
determining a target convolutional layer in the DenseNet; wherein, the target convolution layer outputs a target characteristic diagram;
adding SENEt after each target convolution layer in the DenseNet to obtain a backbone network;
and replacing the original backbone network in the target detection model by using the backbone network to obtain a target network structure.
Optionally, training the target network structure based on a plurality of original traffic cone scene images including traffic cones to obtain a traffic cone recognition model includes:
acquiring a plurality of original traffic cone scene images;
performing data enhancement on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image;
establishing a label file corresponding to each first traffic cone scene image to obtain a traffic cone scene image data set;
dividing a traffic cone scene image data set to obtain a training set, a verification set and a test set;
and determining a traffic cone recognition model based on the training set, the verification set and the target network structure.
Optionally, performing data enhancement on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image, including:
and performing at least one of rotation, flipping, contrast enhancement, cropping, brightness adjustment and affine transformation on the original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to that original traffic cone scene image.
Optionally, determining a traffic cone recognition model based on the training set, the validation set, and the target network structure includes:
training a target network structure by using a training set to obtain an initial traffic cone recognition model;
determining the accuracy and/or loss value of the initial traffic cone identification model by using the verification set;
and when the accuracy is greater than the accuracy threshold and/or the loss value is less than the loss value threshold, determining the initial traffic cone identification model as the traffic cone identification model.
Optionally, after determining that the initial traffic cone identification model is the traffic cone identification model, the method further includes:
and verifying the accuracy and the recognition speed of the traffic cone recognition model by using the test set.
In a second aspect, a method of using a traffic cone recognition model is provided, where the traffic cone recognition model is obtained by the cascade network-based traffic cone identification method of the first aspect, and the method includes:
acquiring an image to be identified;
inputting the image to be recognized into the traffic cone recognition model, and outputting a recognition result; the recognition result indicates either that the image to be recognized contains a traffic cone, together with the position of the traffic cone in the image, or that the image contains no traffic cone.
In a third aspect, a device of a traffic cone identification method based on a cascade network is provided, which includes:
the acquisition module is used for acquiring a compression and activation network SEnet and a dense convolution network DenseNet;
the determining module is used for determining a target network structure based on the SEnet, the DenseNet and a preset target detection model;
and the training module is used for training the target network structure based on a plurality of original traffic cone scene images containing traffic cones to obtain a traffic cone recognition model.
Optionally, the determining module is configured to determine a target convolutional layer in the DenseNet; wherein, the target convolution layer outputs a target characteristic diagram; adding SENEt after each target convolution layer in the DenseNet to obtain a backbone network; and replacing the original backbone network in the target detection model by using the backbone network to obtain a target network structure.
Optionally, the training module is configured to acquire a plurality of original traffic cone scene images; performing data enhancement on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image; establishing a label file corresponding to each first traffic cone scene image to obtain a traffic cone scene image data set; dividing a traffic cone scene image data set to obtain a training set, a verification set and a test set; and determining a traffic cone recognition model based on the training set, the verification set and the target network structure.
Optionally, the training module is configured to perform at least one of rotation, flipping, contrast enhancement, clipping, brightness adjustment, and affine transformation on any one of the original traffic cone scene images to obtain at least one first traffic cone scene image corresponding to any one of the original traffic cone scene images.
Optionally, the training module is configured to train the target network structure by using a training set to obtain an initial traffic cone recognition model; determining the accuracy and/or loss value of the initial traffic cone identification model by using the verification set; and when the accuracy is greater than the accuracy threshold and/or the loss value is less than the loss value threshold, determining the initial traffic cone identification model as the traffic cone identification model.
Optionally, the training module is further configured to verify the accuracy and the recognition speed of the traffic cone recognition model by using the test set.
In a fourth aspect, a device based on a traffic cone identification model obtained by using the cascade network-based traffic cone identification method of the first aspect is provided, and the device includes:
the acquisition module is used for acquiring an image to be identified;
the output module is used for inputting the image to be recognized into the traffic cone recognition model and outputting a recognition result; the recognition result indicates either that the image to be recognized contains a traffic cone, together with the position of the traffic cone in the image, or that the image contains no traffic cone.
In a fifth aspect, an electronic device is provided, the electronic device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the cascade network-based traffic cone identification method of the first aspect.
In a sixth aspect, a computer storage medium is provided, on which computer program instructions are stored, which, when executed by a processor, implement the cascade network-based traffic cone identification method of the first aspect.
The traffic cone identification method and device, the electronic device, and the computer storage medium based on the cascade network can provide a traffic cone recognition model with higher recognition accuracy, so that traffic cones can be recognized more accurately. The traffic cone recognition method based on the cascade network determines a target network structure based on SENet, DenseNet and a preset target detection model. SENet models the correlation among feature map channels, enhancing features related to the preset target while weakening irrelevant features, thereby realizing feature recalibration and solving the problem that a large amount of redundant information exists in the features extracted by existing target detection models. The traffic cone recognition model obtained by training the target network structure on a plurality of original traffic cone scene images containing traffic cones can therefore recognize traffic cones more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments are briefly described below; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a traffic cone identification method based on a cascade network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a SENET network structure provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a connection between a SE-Dense Block module and a SE-Transition module according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of an improved Cascade R-CNN algorithm provided in the embodiments of the present invention;
FIG. 5 is a flow chart of another traffic cone identification method based on a cascade network according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating a method for using a traffic cone recognition model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an apparatus of a traffic cone identification method based on a cascade network according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus based on a traffic cone identification model using method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For traffic cone detection and identification, the currently adopted approach is traditional image processing: the complex background is removed through image processing, and the traffic cones are extracted and detected through morphological processing. Compared with traditional image processing methods, a target detection method based on deep learning can independently learn features at different levels, and the richer the learned features are, the higher the accuracy. Such methods mainly use a preset target detection model (for example, a Cascade R-CNN model) to perform target recognition, but the target detection model is relatively simple, which limits the accuracy of target recognition.
In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, an electronic device, and a computer storage medium for identifying a traffic cone based on a cascade network. First, a traffic cone identification method based on a cascade network provided by the embodiment of the invention is described below.
Fig. 1 is a schematic flow chart of a traffic cone identification method based on a cascade network according to an embodiment of the present invention. As shown in fig. 1, the method for identifying a traffic cone based on a cascade network includes:
s101, acquiring a compression and activation network SEnet and a dense convolution network DenseNet.
S102, determining a target network structure based on the SENET, the DenseNet and a preset target detection model.
The compression and activation network (SENet) is not a complete network structure but a substructure that can be embedded into other classification or detection models. Its core idea is to learn feature weights through the network according to a loss function, enhancing target-related features while compressing irrelevant ones. Essentially, it introduces dynamic adaptability to the input image, which helps strengthen the discriminative power of the features and realizes feature recalibration. Feature recalibration here means changing the feature values output by the network with differentiated weights, so as to markedly highlight the effective features.
The structure of SENet is shown in fig. 2, where H, W and C in fig. 2 denote height, width and channel respectively, so H × W × C denotes height × width × channel. For an input feature U, the first step is the squeeze operation F_sq(·): the SENet structure compresses the feature along the spatial dimensions by global pooling, turning each two-dimensional feature channel into a single real number. This real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses over the feature channels and gives even layers close to the input a global receptive field, which is very useful in many tasks. Next is the excitation operation F_ex(·, W), inspired by the gate mechanism of Long Short-Term Memory (LSTM) networks, in which parameters W are learned to explicitly model the correlation between feature channels. Finally, the feature weights are reset: the output of the excitation operation is taken as the importance of each feature channel after feature selection, and the scale operation F_scale(·, ·) multiplies these weights channel by channel onto the previous feature to obtain the recalibrated feature X̃, completing the recalibration of the original feature in the channel dimension.
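The squeeze, excitation and scale operations described above can be sketched in a few lines of NumPy. This is an illustration only: the two fully connected weight matrices W1/W2 and the reduction ratio r are assumptions, since the patent does not specify them.

```python
import numpy as np

def se_block(U, W1, W2):
    """Sketch of the SENet recalibration: squeeze -> excitation -> scale.

    U  : input feature map of shape (H, W, C)
    W1 : excitation weights, shape (C, C // r)  -- bottleneck FC layer
    W2 : excitation weights, shape (C // r, C)  -- expansion FC layer
    """
    # Squeeze F_sq: global average pooling collapses each HxW channel
    # into a single real number with a global receptive field.
    z = U.mean(axis=(0, 1))                      # shape (C,)
    # Excitation F_ex(., W): two FC layers with ReLU then a sigmoid gate,
    # explicitly modelling the correlation between feature channels.
    s = np.maximum(z @ W1, 0.0)                  # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ W2)))          # sigmoid, shape (C,)
    # Scale F_scale: channel-by-channel reweighting of the original feature.
    return U * s                                 # broadcasts over H and W

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
U = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((C, C // r)) * 0.1
W2 = rng.standard_normal((C // r, C)) * 0.1
X_tilde = se_block(U, W1, W2)                    # the recalibrated feature
```

Because the sigmoid gate lies strictly between 0 and 1, every channel of the output is a damped copy of the input channel, which is exactly the "importance weighting" the text describes.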
In order to determine a target network structure with higher accuracy, in one embodiment, the determining the target network structure based on the SENet, the DenseNet and a preset target detection model may generally include: determining a target convolutional layer in the DenseNet; wherein, the target convolution layer outputs a target characteristic diagram; adding SENEt after each target convolution layer in the DenseNet to obtain a backbone network; and replacing the original backbone network in the target detection model by using the backbone network to obtain a target network structure.
Alternatively, in one embodiment, the densnet may be embodied as densnet-169, with SENet added after each target convolutional layer in densnet-169, resulting in a backbone network (i.e., SE-densnet-169 network). The SENet can make up the deficiency of the DenseNet-169 in the feature superposition multiplexing in the dimension of a channel (channel), the characterization capability of a model is improved by modeling the channel correlation among the convolutional layer features through display, the feature recalibration is realized by selectively enhancing the target related features and simultaneously compressing the non-related features by using global information, so that the network potential of the DenseNet-169 is further mined, and the feature extraction effect is improved.
In one embodiment, the SE-DenseNet-169 network adopts a "SE-Dense Block + SE-Transition" structure, comprising 4 SE-Dense Blocks connected by SE-Transition modules. The network structure of the SE-DenseNet-169 network is shown in table 1:
TABLE 1
[Table 1, giving the layer-by-layer configuration of the SE-DenseNet-169 network, is reproduced only as an image in the original publication.]
As shown in FIG. 3, FIG. 3 details the SE-Dense Block and SE-Transition modules. The SE-Dense Block module adopts the ideas of feature multiplexing and shortcut connections, connecting all the convolutional layers with one another.
The four SE-Dense Block modules of the SE-DenseNet-169 network have 6, 12, 32 and 32 layers respectively, i.e. 6, 12, 32 and 32 repetitions of the "BN-ReLU-1x1 Conv-BN-ReLU-3x3 Conv" structure. Here BN (Batch Normalization) denotes normalization, and ReLU (Rectified Linear Unit) denotes the activation function.
The input and output of each part inside the SE-Dense Block and the SE-Transition Block are feature maps (feature maps), and the input and output between the SE-Dense Block and the SE-Transition Block are also feature maps (feature maps).
The input of each layer is the feature maps of all previous layers spliced (Concat, see C in fig. 3) along the channel dimension; to make this splicing operation possible, the spatial size of the feature map at every layer must be the same. A SENet structure is added between two layers: according to the channel correlation between the convolutional-layer features, it adds a weight to each channel of the feature map, selectively enhancing useful (related) features with global information while compressing useless (unrelated) features, thereby realizing feature recalibration.
A SENet structure is also added to the SE-Transition module. The SE-Transition module mainly performs feature dimension reduction, preventing the spliced features from growing without bound and effectively suppressing overfitting. By making the output channels of the 1x1 convolutional layer (BN-ReLU-1x1 Conv) in the SE-Transition module half its input channels, the number of channels passed to the next SE-Dense Block module is halved. After the last SE-Dense Block module, a global average pooling layer and a normalized exponential function (softmax) classification layer are concatenated, where the average pooling layer changes the spatial size of the feature map.
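The channel arithmetic implied by this layout can be traced directly. The block depths (6, 12, 32, 32) and the channel-halving transitions come from the text; the growth rate of 32 and the 64-channel stem are the standard DenseNet-169 values and are assumed here, since the patent does not state them.

```python
# Channel bookkeeping for the "SE-Dense Block + SE-Transition" layout.
GROWTH_RATE = 32      # assumed: standard DenseNet-169 growth rate
STEM_CHANNELS = 64    # assumed: standard DenseNet-169 stem width
BLOCK_DEPTHS = (6, 12, 32, 32)  # from the text

def channel_trace(depths, growth=GROWTH_RATE, stem=STEM_CHANNELS):
    trace = []
    c = stem
    for i, n_layers in enumerate(depths):
        # Each layer concatenates `growth` new channels onto all previous ones.
        c += n_layers * growth
        trace.append(("block%d_out" % (i + 1), c))
        if i < len(depths) - 1:
            # SE-Transition: the 1x1 conv halves the channel count.
            c //= 2
            trace.append(("transition%d_out" % (i + 1), c))
    return trace

for name, channels in channel_trace(BLOCK_DEPTHS):
    print(name, channels)
```

Under these assumptions the final block emits 1664 channels, matching the published DenseNet-169 feature width, which is consistent with SE-DenseNet-169 changing only the channel weighting, not the channel counts.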
In one embodiment, the SE-DenseNet-169 network is used for replacing a ResNet50 network in a Cascade R-CNN model, the obtained target network structure is an improved Cascade R-CNN network structure, and the SE-DenseNet-169 network is a backbone network (backbone) in the improved Cascade R-CNN network structure.
In an embodiment, the improved Cascade R-CNN algorithm flow corresponding to the improved Cascade R-CNN network structure is shown in fig. 4. The backbone network of the improved Cascade R-CNN end-to-end cascade detection network is the SE-DenseNet-169 network. An image is input into the SE-DenseNet-169 network, which outputs a feature map; this feature map is passed through a Feature Pyramid Network (FPN), whose output is fed into a Region Proposal Network (RPN), which outputs scored candidate boxes (proposals), i.e. the target candidate regions. Then, 2000 target candidate regions are selected by non-maximum suppression, and the generated target candidate regions are placed into a Region of Interest (ROI) pooling layer, which performs region pooling on the feature map output by the feature mapping layer to generate region features of fixed size; that is, each ROI is mapped to the corresponding position of the feature map, and feature maps of the same size are output.
The target candidate regions passing through the ROI pooling layer are put into a fully connected layer, then into the subsequent normalized exponential function (softmax) classification layer and bounding box (bbox) regression layer, which perform target classification and target bounding box regression correction. In fig. 4, C0, C1, C2 and C3 respectively represent softmax target classification at different stages, and B1, B2 and B3 respectively represent bounding box regression at different stages.
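The fixed-size region pooling step above can be sketched minimally in NumPy: crop the ROI from the feature map and max-pool it into a fixed grid, so every candidate region yields a feature of the same size regardless of its shape. The 7×7 output size is an assumption (a common choice for ROI pooling), not a value stated in the patent.

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=7):
    """Minimal ROI pooling sketch.

    feature_map : array of shape (H, W, C)
    roi         : (x0, y0, x1, y1) in feature-map coordinates
    returns     : array of shape (out_size, out_size, C)
    """
    x0, y0, x1, y1 = roi
    crop = feature_map[y0:y1, x0:x1, :]
    h, w, c = crop.shape
    out = np.zeros((out_size, out_size, c))
    # Split the crop into an out_size x out_size grid of roughly equal cells
    # and take the per-channel maximum of each cell.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            cell = crop[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1), :]
            out[i, j] = cell.max(axis=(0, 1))
    return out

fmap = np.random.default_rng(1).standard_normal((50, 80, 8))
pooled = roi_pool(fmap, (10, 5, 34, 45))   # a 24x40 ROI -> 7x7x8 feature
```

Whatever rectangle the RPN proposes, the output is always the same shape, which is what lets the subsequent fully connected layer accept every candidate region.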
It is well known to those skilled in the art that a high Intersection over Union (IoU) threshold tends to yield high-quality samples, but reduces their number and thus causes problems such as overfitting. In this embodiment, three cascaded R-CNN networks are trained with different IoU thresholds; through iterative bounding box regression, the bounding box coordinates obtained by the regression of the previous detection model initialize the bounding boxes of the next detection model, and regression then continues, gradually improving the detection accuracy of the bounding boxes.
The intersection over union represents the degree of overlap between a candidate box and the original ground-truth box, i.e. the ratio of the intersection to the union of the two boxes, and is used to distinguish positive samples from negative samples in the training stage. By setting a different IoU threshold at each stage, the accuracy of each network's output is raised a little and used as the input of the next, higher-accuracy network, so the output accuracy improves step by step. The scheme balances running speed against detection accuracy, the network structure is simple to implement, and it meets the performance requirements of a real software engineering environment.
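The IoU computation and the per-stage positive/negative split can be sketched as follows. The specific thresholds 0.5 / 0.6 / 0.7 are the values commonly used with Cascade R-CNN and are an assumption here; the embodiment only states that the three stages use different IoU thresholds.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_candidates(candidates, gt_box, threshold):
    """Split candidate boxes into positives/negatives at one cascade stage."""
    pos = [c for c in candidates if iou(c, gt_box) >= threshold]
    neg = [c for c in candidates if iou(c, gt_box) < threshold]
    return pos, neg

gt = (10, 10, 50, 50)
cands = [(10, 10, 50, 50),   # exact match, IoU = 1.0
         (12, 12, 52, 52),   # small shift, IoU ~ 0.82
         (15, 15, 55, 55),   # larger shift, IoU ~ 0.62
         (30, 30, 70, 70),   # weak overlap, IoU ~ 0.14
         (60, 60, 90, 90)]   # no overlap, IoU = 0.0
for t in (0.5, 0.6, 0.7):    # assumed per-stage thresholds
    pos, neg = label_candidates(cands, gt, t)
```

As the threshold rises from stage to stage, fewer but better-localized boxes count as positives, which is the mechanism by which each stage trains on higher-quality samples than the previous one.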
S103, training a target network structure to obtain a traffic cone recognition model based on a plurality of original traffic cone scene images containing traffic cones.
In order to obtain a more accurate traffic cone recognition model, in one embodiment, training the target network structure based on a plurality of original traffic cone scene images including traffic cones to obtain the traffic cone recognition model may generally include: acquiring a plurality of original traffic cone scene images; performing data enhancement on each original traffic cone scene image to obtain at least one corresponding first traffic cone scene image; establishing a label file corresponding to each first traffic cone scene image to obtain a traffic cone scene image data set; dividing the traffic cone scene image data set into a training set, a verification set and a test set; and determining the traffic cone recognition model based on the training set, the verification set and the target network structure.
In order to obtain a greater number of first traffic cone scene images, in an embodiment of the present invention, performing data enhancement on any original traffic cone scene image to obtain at least one corresponding first traffic cone scene image may generally include: performing at least one of rotation, flipping, contrast enhancement, cropping, brightness adjustment and affine transformation on the original traffic cone scene image to obtain the at least one corresponding first traffic cone scene image.
To obtain a more accurate traffic cone recognition model, in one embodiment of the present invention, determining a traffic cone recognition model based on a training set, a validation set, and a target network structure may generally include: training a target network structure by using a training set to obtain an initial traffic cone recognition model; determining the accuracy and/or loss value of the initial traffic cone identification model by using the verification set; and when the accuracy is greater than the accuracy threshold and/or the loss value is less than the loss value threshold, determining the initial traffic cone identification model as the traffic cone identification model.
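The acceptance rule above — promote the initial model only once its verification-set metrics clear the thresholds — can be sketched as follows (the threshold values and the metric history are illustrative placeholders, not from the patent):

```python
def accept_model(val_accuracy, val_loss,
                 acc_threshold=0.90, loss_threshold=0.10):
    """Promote the candidate to the final traffic-cone recognition model
    when validation accuracy exceeds its threshold and/or validation
    loss falls below its threshold (threshold values are illustrative)."""
    return val_accuracy > acc_threshold or val_loss < loss_threshold

# (accuracy, loss) measured on the verification set as training proceeds
history = [(0.62, 0.80), (0.81, 0.35), (0.93, 0.08)]
decisions = [accept_model(a, l) for a, l in history]
```

Only the last checkpoint, with accuracy 0.93 and loss 0.08, passes the gate and becomes the traffic cone identification model.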
In order to ensure the practicability of the traffic cone identification model, in one embodiment of the present invention, after determining that the initial traffic cone identification model is the traffic cone identification model, the method may further generally include: and verifying the accuracy and the recognition speed of the traffic cone recognition model by using the test set.
The above is illustrated by an embodiment, as follows (see fig. 5):
S1, establishing an image library: scene images containing traffic cones under different scenes, weather and illumination conditions are collected by a high-definition camera mounted on the front windshield of a vehicle at a depression angle of 8 degrees; the images are saved at 10 frames per second and normalized to a size of 1920 x 1080 pixels.
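The frame-rate subsampling and size normalization of S1 can be sketched in numpy (the 30 fps camera rate and the nearest-neighbour resize are illustrative assumptions; a real pipeline would use an image library for resizing):

```python
import numpy as np

def normalize_frame(frame, out_h=1080, out_w=1920):
    """Nearest-neighbour resize of a captured frame (H x W x 3) to the
    normalized 1920 x 1080 library size; this numpy version is only a
    sketch of what an image library would do."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows][:, cols]

camera_fps = 30                  # hypothetical camera frame rate
keep_every = camera_fps // 10    # keep every 3rd frame -> 10 frames per second
kept = [i for i in range(90) if i % keep_every == 0]  # 3 s of video -> 30 frames

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
normalized = normalize_frame(frame)
```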
S2, data enhancement: data enhancement mainly includes rotation, flipping, contrast enhancement, cropping, brightness adjustment and affine transformation. Experiments show that setting the random rotation angle to between minus 5 and plus 5 degrees, randomly flipping about 10% of the images, and randomly changing brightness and contrast to a small degree preserve the meaning of the images. For effective data enhancement, the pictures are finally resized to a fixed size, expanding the number of effective data samples and enhancing the diversity of the data set.
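The flip and brightness/contrast jitter of S2 can be sketched in numpy as follows (the 10% flip probability matches the text; the 0.9–1.1 jitter range is an illustrative reading of "a small degree", and rotation, cropping and affine warps are omitted because they would normally use an image library):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """One randomized copy of a traffic-cone image (H x W x 3, floats in
    [0, 1]): horizontal flip for roughly 10% of images plus mild
    brightness and contrast jitter."""
    out = img.copy()
    if rng.random() < 0.10:                            # flip ~10% of images
        out = out[:, ::-1, :]
    out = out * rng.uniform(0.9, 1.1)                  # small brightness change
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.9, 1.1) + mean  # small contrast change
    return np.clip(out, 0.0, 1.0)

img = rng.random((108, 192, 3))
copies = [augment(img) for _ in range(4)]  # expand the effective sample count
```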
S3, establishing a label library: labels corresponding to the images are made with labeling software according to the COCO-format json label file standard.
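A COCO-format label file has the shape sketched below (the file name, ids and box values are illustrative placeholders, not taken from the patent's data set):

```python
import json

# skeleton of a COCO-format json label file for one traffic-cone image
coco_label = {
    "images": [
        {"id": 1, "file_name": "cone_0001.jpg", "width": 1920, "height": 1080}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [840.0, 620.0, 90.0, 160.0],  # [x, y, width, height]
         "area": 90.0 * 160.0, "iscrowd": 0}
    ],
    "categories": [{"id": 1, "name": "traffic_cone"}],
}
label_json = json.dumps(coco_label, indent=2)
```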
S4, establishing a data set: the prepared image library and its label library are divided into a training set, a verification set and a test set at a ratio of 7:2:1.
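The 7:2:1 split can be sketched as follows (the shuffle seed and function name are illustrative; `samples` stands for the paired image/label entries):

```python
import random

def split_dataset(samples, ratios=(7, 2, 1), seed=42):
    """Shuffle the samples and divide them into training, verification
    and test sets at the stated 7:2:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(1000))
```

For a 1000-image library this yields 700 training, 200 verification and 100 test samples, with every sample landing in exactly one subset.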
S5, setting parameters of the Cascade R-CNN model: the Cascade R-CNN model is set to 4 stages, namely an RPN and three R-CNN networks with gradually increasing IoU thresholds, trained stage by stage. The RPN anchors (a set of fixed-size reference windows) are divided into five scale levels of 4, 8, 16, 32 and 64, with three aspect-ratio levels of 1:1, 1:2 and 2:1. The three IoU thresholds are 0.5, 0.6 and 0.7 respectively. The classification loss adopts the softmax function and the regression loss adopts the smooth L1 loss function, and the bounding-box coordinates are normalized to prevent regression-scale problems caused by the size and position of the bounding box.
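The S5 parameters can be gathered into one configuration sketch, together with the smooth L1 loss they name (the key names are illustrative — a real Cascade R-CNN configuration file spells these out differently — and `beta=1.0` is the conventional default, not stated in the patent):

```python
import numpy as np

cascade_cfg = {
    "rpn": {
        "anchor_scales": [4, 8, 16, 32, 64],  # five anchor levels
        "anchor_ratios": [1.0, 0.5, 2.0],     # aspect ratios 1:1, 1:2, 2:1
    },
    "rcnn_stages": [{"iou_threshold": t} for t in (0.5, 0.6, 0.7)],
    "cls_loss": "softmax",
    "reg_loss": "smooth_l1",
    "normalize_bbox_targets": True,  # guards against regression-scale problems
}

def smooth_l1(x, beta=1.0):
    """Smooth L1 regression loss: quadratic for residuals below beta,
    linear above, so large bounding-box errors do not dominate."""
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x ** 2 / beta, x - 0.5 * beta)
```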
S6, training the traffic cone recognition model: the training set is trained with the improved Cascade R-CNN network structure, processing 8 pictures per batch. A smaller learning rate is used in the first 11 training epochs; after the model stabilizes, the learning rate is 0.02, and the anchor ratios and sizes and the loss-function parameters are kept unchanged. The performance of the model increases with the number of training iterations, and the model is saved every fixed number of iterations; when the verification-set accuracy and the training loss become stable, i.e., the model has converged, the model is saved and training stops. In addition, stochastic gradient descent (SGD) is used for parameter updating during training, and a warmup learning-rate schedule accelerates the convergence of the model so that a better model is obtained; training runs for 200,000 steps.
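The warmup learning-rate schedule of S6 can be sketched as follows (only the 0.02 base rate comes from the embodiment; the warmup length and starting value are illustrative assumptions, and the lower early-epoch rate and any later decay are omitted):

```python
def learning_rate(step, base_lr=0.02, warmup_steps=500, warmup_start=0.002):
    """Warmup schedule for SGD: ramp the learning rate linearly from a
    small starting value up to the base rate, then hold it."""
    if step < warmup_steps:
        return warmup_start + (step / warmup_steps) * (base_lr - warmup_start)
    return base_lr

schedule = [learning_rate(s) for s in (0, 250, 500, 200_000)]
```

The small initial rate keeps the first SGD updates from diverging while the batch statistics settle, which is what accelerates convergence in practice.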
S7, testing the traffic cone recognition model: the detection accuracy on the traffic cone test set reaches 95%, and the detection speed is 9 FPS (including picture-reading time). Compared with the original Cascade R-CNN model, the improved Cascade R-CNN model achieves higher detection accuracy and the trained model has better robustness, giving this traffic cone detection and recognition method a clear advantage over the original algorithm.
The traffic cone recognition method and device can recognize traffic cones quickly, promptly and effectively, reduce the influence of factors such as illumination changes, color fading, motion blur and complex backgrounds on traffic cone recognition, resist interference, and achieve high recognition accuracy, so that the potential safety hazards of unmanned driving can be eliminated in time. Meanwhile, the embodiment of the invention also uses a data enhancement technique to expand the effective data samples when their number is insufficient, enhancing the diversity of the data set and improving the detection accuracy of the traffic cone recognition model.
The embodiment also provides a traffic cone identification method based on a traffic cone identification model, as shown in fig. 6, where the traffic cone identification model is obtained by the cascade network-based traffic cone identification method of the embodiment shown in fig. 1. The method includes:
S601, acquiring an image to be identified.
S602, inputting the image to be recognized into a traffic cone recognition model, and outputting a recognition result; and the identification result is that the image to be identified has the traffic cone and the position of the traffic cone in the image to be identified or the image to be identified has no traffic cone.
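The two-step identification flow above can be sketched as follows (`model` stands for the trained traffic cone recognition model as any callable returning boxes and scores; the function names and the 0.5 score threshold are illustrative assumptions):

```python
def recognize(image, model, score_threshold=0.5):
    """Report either the traffic-cone positions in the image or that no
    traffic cone is present."""
    boxes, scores = model(image)
    positions = [b for b, s in zip(boxes, scores) if s >= score_threshold]
    return {"has_cone": bool(positions), "positions": positions}

# stand-in model: one confident cone detection and one low-score box
fake_model = lambda img: ([[100, 100, 150, 200], [400, 300, 440, 380]],
                          [0.9, 0.3])
result = recognize(None, fake_model)
```

The result either carries the positions of the detected cones in the image or reports that the image contains no traffic cone, matching the two recognition outcomes described above.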
The embodiment also provides a device for the traffic cone identification method based on the cascade network; this device and the cascade network-based traffic cone identification method described above can be cross-referenced. As shown in fig. 7, the device for the cascade network-based traffic cone identification method includes:
an obtaining module 701, configured to obtain a compressed and activated network SENet and a dense convolutional network DenseNet;
a determining module 702, configured to determine a target network structure based on the SENet, the DenseNet, and a preset target detection model;
the training module 703 is configured to train a target network structure to obtain a traffic cone recognition model based on a plurality of original traffic cone scene images including traffic cones.
Optionally, a determining module 702, configured to determine a target convolutional layer in the DenseNet, wherein the target convolutional layer outputs a target feature map; adding SENet after each target convolutional layer in the DenseNet to obtain a backbone network; and replacing the original backbone network in the target detection model with this backbone network to obtain a target network structure.
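The channel recalibration that the added SENet performs on each target feature map can be sketched in numpy as follows (the weights, channel count and reduction ratio are random placeholders, not the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation applied to one target feature map
    (C x H x W): global average pool (squeeze), two small fully
    connected layers with ReLU then sigmoid (excitation), and a
    channel-wise rescale of the feature map."""
    squeeze = feature_map.mean(axis=(1, 2))      # C-dim channel summary
    hidden = np.maximum(0.0, w1 @ squeeze)       # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # per-channel gate in (0, 1)
    return feature_map * gate[:, None, None]     # recalibrated channels

channels, reduction = 16, 4
w1 = rng.standard_normal((channels // reduction, channels))  # squeeze weights
w2 = rng.standard_normal((channels, channels // reduction))  # excite weights
fmap = rng.standard_normal((channels, 8, 8))
out = se_block(fmap, w1, w2)
```

Because the sigmoid gate lies strictly between 0 and 1, the block can only attenuate uninformative channels while preserving the feature map's shape, which is why it can be inserted after every target convolutional layer without altering the DenseNet topology.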
Optionally, the training module 703 is configured to obtain a plurality of original traffic cone scene images; performing data enhancement on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image; establishing a label file corresponding to each first traffic cone scene image to obtain a traffic cone scene image data set; dividing a traffic cone scene image data set to obtain a training set, a verification set and a test set; and determining a traffic cone recognition model based on the training set, the verification set and the target network structure.
Optionally, the training module 703 is configured to perform at least one of rotation, flipping, contrast enhancement, clipping, brightness adjustment, and affine transformation on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image.
Optionally, the training module 703 is configured to train the target network structure by using a training set to obtain an initial traffic cone identification model; determining the accuracy and/or loss value of the initial traffic cone identification model by using the verification set; and when the accuracy is greater than the accuracy threshold and/or the loss value is less than the loss value threshold, determining the initial traffic cone identification model as the traffic cone identification model.
Optionally, the training module 703 is further configured to verify the accuracy and the recognition speed of the traffic cone recognition model by using the test set.
Each module in the device of the traffic cone identification method based on the cascade network provided in fig. 7 has a function of implementing each step in the example shown in fig. 1, and achieves the same technical effect as the traffic cone identification method based on the cascade network shown in fig. 1, and for brevity, the details are not repeated here.
The embodiment also provides a device based on the method of using the traffic cone identification model, wherein the traffic cone identification model is obtained by the cascade network-based traffic cone identification method of the embodiment shown in fig. 1; this device and the above-described method of using the traffic cone identification model can be cross-referenced. As shown in fig. 8, the device based on the method of using the traffic cone recognition model includes:
an obtaining module 801, configured to obtain an image to be identified;
the output module 802 is used for inputting the image to be recognized into the traffic cone recognition model and outputting a recognition result; and the identification result is that the image to be identified has the traffic cone and the position of the traffic cone in the image to be identified or the image to be identified has no traffic cone.
Each module in the device based on the traffic cone identification model using method provided in fig. 8 has a function of implementing each step in the example shown in fig. 6, and achieves the same technical effect as the traffic cone identification model using method shown in fig. 6, and for brevity, details are not repeated here.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
The electronic device may comprise a processor 901 and a memory 902 storing computer program instructions.
Specifically, the processor 901 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present invention.
Memory 902 may include mass storage for data or instructions. By way of example, and not limitation, memory 902 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 902 may include removable or non-removable (or fixed) media, where appropriate. The memory 902 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 902 is a non-volatile solid-state memory. In a particular embodiment, the memory 902 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 901 reads and executes the computer program instructions stored in the memory 902 to implement the cascade network-based traffic cone identification method in the embodiment shown in fig. 1 among the above-described embodiments.
In one example, the electronic device can also include a communication interface 903 and a bus 910. As shown in fig. 9, the processor 901, the memory 902, and the communication interface 903 are connected via a bus 910 to complete communication with each other.
The communication interface 903 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
Bus 910 includes hardware, software, or both to couple the components of the electronic device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 910 can include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
In addition, in combination with the traffic cone identification method based on the cascade network in the above embodiments, the embodiments of the present invention may provide a computer storage medium to implement. The computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by the processor, implement the cascade network based traffic cone identification method in the embodiment shown in fig. 1.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (11)

1. A traffic cone identification method based on a cascade network is characterized by comprising the following steps:
acquiring a compression and activation network SENet and a dense convolution network DenseNet;
determining a target network structure based on the SENet, the DenseNet and a preset target detection model;
and training the target network structure to obtain a traffic cone recognition model based on a plurality of original traffic cone scene images containing traffic cones.
2. The cascade network-based traffic cone recognition method of claim 1, wherein the determining a target network structure based on the SENet, the DenseNet and a preset target detection model comprises:
determining a target convolutional layer in the DenseNet; wherein the target convolutional layer outputs the target feature map;
adding the SENet after each target convolution layer in the DenseNet to obtain a backbone network;
and replacing the original backbone network in the target detection model by using the backbone network to obtain the target network structure.
3. The cascade network-based traffic cone recognition method according to claim 1 or 2, wherein training the target network structure to obtain a traffic cone recognition model based on a plurality of original traffic cone scene images containing traffic cones comprises:
acquiring a plurality of original traffic cone scene images;
performing data enhancement on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image;
establishing a label file corresponding to each first traffic cone scene image to obtain a traffic cone scene image data set;
dividing the traffic cone scene image data set to obtain a training set, a verification set and a test set;
determining the traffic cone recognition model based on the training set, the validation set, and the target network structure.
4. The cascade network-based traffic cone identification method according to claim 3, wherein the data enhancement of any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image comprises:
and performing at least one of rotation, turnover, contrast enhancement, cutting, brightness adjustment and affine transformation on any original traffic cone scene image to obtain at least one first traffic cone scene image corresponding to any original traffic cone scene image.
5. The cascade network-based traffic cone recognition method of claim 3, wherein the determining the traffic cone recognition model based on the training set, the validation set, and the target network structure comprises:
training the target network structure by using the training set to obtain an initial traffic cone recognition model;
determining an accuracy and/or loss value of the initial traffic cone identification model using the validation set;
when the accuracy is greater than an accuracy threshold and/or the loss value is less than a loss value threshold, determining that the initial traffic cone identification model is the traffic cone identification model.
6. The cascade network-based traffic cone identification method of claim 5, wherein after determining that the initial traffic cone identification model is the traffic cone identification model, further comprising:
and verifying the accuracy and the recognition speed of the traffic cone recognition model by using the test set.
7. A method for using a traffic cone identification model, wherein the traffic cone identification model is obtained by using the cascade network-based traffic cone identification method of any one of claims 1 to 6, and comprises the following steps:
acquiring an image to be identified;
inputting the image to be recognized into the traffic cone recognition model, and outputting a recognition result; and the identification result is that the image to be identified has a traffic cone and the position of the traffic cone in the image to be identified or the image to be identified has no traffic cone.
8. A device of a traffic cone identification method based on a cascade network is characterized by comprising the following steps:
the acquisition module is used for acquiring a compression and activation network SENet and a dense convolution network DenseNet;
the determining module is used for determining a target network structure based on the SENet, the DenseNet and a preset target detection model;
and the training module is used for training the target network structure to obtain a traffic cone recognition model based on a plurality of original traffic cone scene images containing traffic cones.
9. An apparatus based on a traffic cone identification model using method, wherein the traffic cone identification model is obtained by using the cascade network based traffic cone identification method of any one of claims 1 to 6, and the apparatus comprises:
the acquisition module is used for acquiring an image to be identified;
the output module is used for inputting the image to be recognized into the traffic cone recognition model and outputting a recognition result; and the identification result is that the image to be identified has a traffic cone and the position of the traffic cone in the image to be identified or the image to be identified has no traffic cone.
10. An electronic device, characterized in that the electronic device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a cascade network based traffic cone identification method according to any one of claims 1-6.
11. A computer storage medium having computer program instructions stored thereon, which when executed by a processor, implement the cascade network-based traffic cone identification method according to any one of claims 1 to 6.