CN112380918A - Road vehicle state identification method and device, electronic equipment and storage medium

Info

Publication number
CN112380918A
CN112380918A (application CN202011145554.2A)
Authority
CN
China
Prior art keywords
network
vehicle
image
vehicle state
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011145554.2A
Other languages
Chinese (zh)
Inventor
刘建虢
尹晓雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Cresun Innovation Technology Co Ltd
Original Assignee
Xian Cresun Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Cresun Innovation Technology Co Ltd filed Critical Xian Cresun Innovation Technology Co Ltd
Priority to CN202011145554.2A
Publication of CN112380918A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 - Detecting or categorising vehicles

Abstract

The invention discloses a road vehicle state identification method applied to an unmanned vehicle, comprising the following steps: acquiring a vehicle image; performing feature extraction on the vehicle image by using the backbone network of an image recognition network to obtain x feature maps, where the scales of the x feature maps increase successively and x is a natural number greater than or equal to 3; performing top-down, densely connected feature fusion on the x feature maps by using the FPN (Feature Pyramid Network) of the image recognition network to obtain the vehicle state of the vehicle image, the vehicle state comprising the position and category of each vehicle; and generating a driving control instruction based on the vehicle state. The backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and their corresponding vehicle states. According to the invention, the vehicle state can be acquired by detecting road vehicle images and a driving control instruction generated, thereby controlling the driving state of the unmanned vehicle.

Description

Road vehicle state identification method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of image detection, and particularly relates to a road vehicle state identification method and device, electronic equipment and a storage medium.
Background
An unmanned vehicle is an intelligent vehicle that senses the road environment through an on-board sensing system, automatically plans a driving route, and controls the vehicle to reach a predetermined destination. Unmanned-driving technology controls the driving direction, driving speed and so on of the current unmanned vehicle according to the states of other vehicles on the road, so as to avoid them properly and achieve safe driving. Recognizing the state of road vehicles is therefore important.
Image detection algorithms for various targets are now developing rapidly, and the YOLO (You Only Look Once) series of network models is widely applied.
How to detect road vehicle images based on a YOLO-series network model, so as to recognize road vehicle states and in turn control the driving state of an unmanned vehicle, is a research direction of practical significance.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a road vehicle state identification method, apparatus, electronic device and storage medium. The technical problem to be solved by the invention is realized by the following technical scheme:
In a first aspect, an embodiment of the present invention provides a road vehicle state identification method, applied to an unmanned vehicle, including:
acquiring a vehicle image;
performing feature extraction on the vehicle image by using a backbone network of an image recognition network to obtain x feature maps; the scales of the x feature maps increase successively; x is a natural number greater than or equal to 3;
performing top-down, densely connected feature fusion on the x feature maps by using an FPN (Feature Pyramid Network) network of the image recognition network to obtain the vehicle state of the vehicle image, wherein the vehicle state comprises the position and the category of each vehicle;
generating a driving control instruction based on the vehicle state;
the backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and corresponding vehicle states.
Optionally, each dense connection module includes a convolutional network module and a dense connection unit group connected in series; the number of dense connection modules is at least three, and each dense connection unit group includes a plurality of dense connection units.
Optionally, the convolutional network module includes a convolutional layer, a BN layer and a Leaky ReLU layer connected in sequence; the dense connection unit comprises a plurality of densely connected convolutional network modules and fuses the feature maps output by those modules in a cascade manner.
Optionally, the transition module includes a plurality of convolutional network modules and a max pooling layer, which are connected in sequence, and the output of the convolutional network modules and the output of the max pooling layer are connected in cascade.
Optionally, the FPN network includes x prediction branches Y1~Yx corresponding to the scales of the x feature maps.
Optionally, performing the top-down, densely connected feature fusion on the x feature maps by using the FPN network of the image recognition network includes:
for prediction branch Yi, obtaining the feature map of the corresponding scale from the x feature maps and performing convolution processing on it, then cascade-fusing the convolved feature map with the feature maps of prediction branches Yi-1~Y1, each after its own upsampling; wherein the upsampling multiple for prediction branch Yi-j is 2^j; i = 2, 3, …, x; j is a natural number smaller than i.
Optionally, the driving control instruction includes information of a route traveled by the vehicle and a speed value traveled by the vehicle.
In a second aspect, an embodiment of the present invention further provides a road vehicle state identification device, including:
the image acquisition module is used for acquiring a vehicle image;
the image detection module is used for performing feature extraction on the vehicle image by using a backbone network of an image recognition network to obtain x feature maps; the scales of the x feature maps increase successively; x is a natural number greater than or equal to 3;
the image identification module is used for performing top-down, densely connected feature fusion on the x feature maps by using the FPN (Feature Pyramid Network) network of the image recognition network to obtain the vehicle state of the vehicle image, wherein the vehicle state comprises the position and the category of each vehicle;
the vehicle control module is used for generating a driving control instruction based on the vehicle state;
the backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and corresponding vehicle states.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the method when executing the program stored in the memory.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the above-mentioned method steps.
According to the road vehicle state identification method provided by the embodiment of the invention, the residual modules in the backbone network of the prior-art YOLOv3 network are replaced by dense connection modules. During feature extraction, the dense connection modules change the original parallel feature fusion into a serial form, so that early feature maps serve as inputs to every later layer; this yields feature maps with more information, strengthens feature propagation, and improves detection accuracy. The extracted feature maps then undergo top-down, densely connected feature fusion: deep features are upsampled by different multiples and fused in series with shallow features, so more original information is obtained and high-dimensional semantic information participates in the shallow network, improving detection accuracy; receiving features directly from shallower layers yields more concrete features, effectively reducing feature loss, and reducing the number of parameters to be computed speeds up prediction. On this basis, detecting the vehicle image yields the vehicle state, from which a driving control instruction is generated, thereby controlling the driving state of the unmanned vehicle.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a schematic flow chart of a road vehicle state identification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a prior art YOLOv3 network;
FIG. 3 is a schematic structural diagram of an image recognition network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a transition module provided in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a road vehicle state identification device provided by the embodiment of the invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
In order to realize detection of road vehicle images based on a YOLO series network model, thereby realizing road vehicle state recognition and further controlling the driving state of an unmanned automobile, the embodiment of the invention provides a road vehicle state recognition method and device, electronic equipment and a storage medium.
It should be noted that the executing subject of the road vehicle state identification method provided by the embodiment of the present invention may be a road vehicle state identification device, and the road vehicle state identification device may run in an electronic device. The device may be a plug-in of an image capturing tool or an image processing tool, or a program independent of them; this is not limited here.
In a first aspect, an embodiment of the present invention provides a method for identifying a state of a road vehicle. Next, the road vehicle state recognition method will be described first.
Referring to fig. 1, a method for identifying a state of a road vehicle provided by an embodiment of the present invention is applied to an unmanned vehicle, and may include the following steps:
in step S1, a vehicle image is acquired.
The vehicle image is an image containing a target vehicle and shot by the image acquisition equipment.
The image acquisition device may include a camera, a video camera, a mobile phone, or a monitoring device on the road, etc. The acquired vehicle image contains at least one target vehicle. The target vehicle may be another vehicle that runs on the same road as the unmanned vehicle, or may be another vehicle that runs on a different road than the unmanned vehicle.
For example, a monitoring device at a road checkpoint may capture an image of a passing vehicle and transmit the captured vehicle image to the unmanned vehicle, which then carries out the subsequent steps of this embodiment.
In the embodiment of the present invention, the required vehicle image size is 416 × 416 × 3.
Therefore, at this step, in one example, a vehicle image of 416 × 416 × 3 size can be directly obtained; in another example, an image of an arbitrary size may be obtained, and the obtained image is subjected to a certain size scaling process to obtain an image of a vehicle of 416 × 416 × 3 size.
It is understood that in the above two examples, the obtained image may be further subjected to image enhancement operations such as cropping, stitching, smoothing, filtering, edge filling, etc. to enhance the features of interest in the image and expand the generalization capability of the data set.
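For concreteness, a minimal Python sketch of the scaling step described above follows; the function name and the use of OpenCV are assumptions, and letterboxing or the enhancement operations just mentioned could be added in the same place:

```python
import cv2
import numpy as np

def to_network_input(image_path: str) -> np.ndarray:
    """Scale an arbitrary-size road image to the 416 x 416 x 3 input
    required by the backbone network (plain resize; letterboxing or
    augmentation such as cropping/smoothing could be inserted here)."""
    img = cv2.imread(image_path)                      # HWC, BGR, uint8
    img = cv2.resize(img, (416, 416))                 # scale to input size
    img = img[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, normalize
    return img
```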
Step S2: perform feature extraction on the vehicle image by using the backbone network of the image recognition network to obtain x feature maps; the scales of the x feature maps increase successively; x is a natural number greater than or equal to 3.
In this embodiment, the backbone network includes a plurality of dense connection modules and transition modules connected alternately in series.
The backbone network in the present embodiment is improved based on the backbone network of the YOLOv3 network. The image recognition network of the embodiment of the invention is obtained by carrying out iterative training according to the vehicle image sample and the corresponding vehicle state.
In order to facilitate understanding of the network structure of the backbone network provided by the embodiment of the present invention, a description is given of the structure of the YOLOv3 network in the prior art. Fig. 2 is a schematic structural diagram of a YOLOv3 network in the prior art.
Referring to fig. 2, the portion within the dashed box is the YOLOv3 network. The part within the dotted frame is the backbone network of the YOLOv3 network, namely the darknet-53 network; the rest is the Feature Pyramid Network (FPN), which is divided into three prediction branches Y1~Y3. The scales of prediction branches Y1~Y3 correspond one-to-one to the scales of the feature maps output by the 3 residual modules res4, res8, res8 taken in the reverse direction of the input. The prediction results of the branches are denoted Y1, Y2 and Y3, and the scales of Y1, Y2 and Y3 increase in sequence.
The backbone network of the YOLOv3 network is formed by connecting a CBL module and a plurality of resn modules in series. The CBL module is a convolutional network module comprising, connected in series, a conv layer (convolutional layer), a BN (Batch Normalization) layer, and a Leaky ReLU layer corresponding to the Leaky ReLU activation function; CBL stands for conv + BN + Leaky ReLU. The resn module is a residual module, where n denotes a natural number (res1, res2, …, res8, and so on); it comprises, connected in series, a zero padding layer, a CBL module and a residual unit group. The residual unit group is denoted res unit n, meaning it contains n residual units (res unit); each residual unit comprises several CBL modules connected in residual network (ResNet) fashion, and its feature fusion takes the parallel form, i.e., add.
Each prediction branch of the FPN network includes a convolutional network module group, specifically 5 convolutional network modules, i.e., CBL × 5 in fig. 2. In addition, the US (up sampling) module is an upsampling module; concat, short for concatenate, indicates that feature fusion adopts the cascade mode.
For the specific structure of each main module in the YOLOv3 network, please refer to the schematic diagram below the dashed box in fig. 2.
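Since the CBL module recurs in both the prior-art network and the improved network below, a minimal PyTorch sketch of it may be useful; the class name, kernel default and 0.1 negative slope are assumptions rather than values given in the patent:

```python
import torch.nn as nn

class CBL(nn.Module):
    """Convolutional network module: conv + BN + Leaky ReLU in series."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),           # BN layer
            nn.LeakyReLU(0.1, inplace=True),  # Leaky ReLU activation
        )

    def forward(self, x):
        return self.block(x)
```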
The backbone network of the image recognition network provided by the embodiment of the present invention differs from the backbone network of the prior-art YOLOv3 network as follows: it comprises a plurality of dense connection modules and transition modules connected alternately in series. Drawing on the connection mode of the dense convolutional network DenseNet, a specific dense connection module is proposed to replace the residual module (resn module) in the backbone network of the YOLOv3 network. As is known, ResNets combines features by summation before passing them on, i.e., feature fusion in parallel form. The dense connection approach instead connects all layers (with matching feature map sizes) directly to each other, to ensure that information flows between layers to the maximum extent. Specifically, each layer takes all feature maps of the preceding layers as its input, and its own feature map serves as input to all subsequent layers; that is, feature fusion takes the cascade (concatenation) form. Therefore, compared with the YOLOv3 network using residual modules, the image recognition network of this embodiment obtains feature maps with more information by using dense connection modules instead, which strengthens feature propagation and improves detection accuracy for vehicle state detection. Meanwhile, because redundant feature maps need not be learned again, the number of parameters and the amount of computation are greatly reduced, and the vanishing-gradient problem is alleviated.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an image recognition network according to an embodiment of the present invention. The backbone network of the present embodiment is described below with reference to fig. 3.
In this embodiment, the dense connection module is denoted as denm, and the dense connection module includes a convolutional network module and a dense connection unit group connected in series.
Because there are at least three prediction branches, the number of dense connection modules is at least three, so that the feature maps output by the dense connection modules are fused into the corresponding prediction branches. In fig. 3, the number of dense connection modules is 5 by way of example; an image recognition network built from 5 dense connection modules is more accurate than one built from 3. In the backbone network of this embodiment, the 3 dense connection modules taken in the reverse direction of the input each output a feature map; the scales of these 3 feature maps increase in sequence and correspond respectively to prediction branches Y1~Y3.
In this embodiment, the dense connection module includes a convolutional network module (the CBL module described above) and a dense connection unit group connected in series, and the dense connection unit group includes a plurality of dense connection units.
The dense connection unit group is denoted den unit m, meaning that it includes m dense connection units, where m is a natural number greater than or equal to 4.
Each dense connection unit is denoted den unit; it comprises a plurality of convolutional network modules connected in dense-connection fashion, and the feature maps output by these convolutional network modules are fused in a cascade manner.
In this embodiment, the convolutional network module includes a convolutional layer, a BN layer and a Leaky ReLU layer connected in sequence; the dense connection unit comprises a plurality of densely connected convolutional network modules and fuses the feature maps output by those modules in a cascade manner.
The cascade mode, i.e., concat, means tensor concatenation. This operation differs from the add operation in the residual module: concat expands the tensor dimensions, whereas add is direct summation and does not change them. Therefore, when the backbone network of this embodiment extracts features, the dense connection modules change feature fusion from parallel to serial, so that early feature maps serve directly as inputs to every later layer, feature propagation is strengthened, and reusing the feature map parameters of the shallow network reduces the number of parameters and the amount of computation.
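A minimal PyTorch sketch of one dense connection unit, showing the concat-form fusion just described, follows; the growth rate, the layer count and the small cbl helper are illustrative assumptions:

```python
import torch
import torch.nn as nn

def cbl(in_ch, out_ch, k=3, s=1):
    # conv + BN + Leaky ReLU, the CBL module sketched earlier
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True))

class DenseUnit(nn.Module):
    """Dense connection unit: every CBL receives the concatenation of
    all earlier feature maps (concat fusion, not add)."""
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(cbl(ch, growth))
            ch += growth                   # concat widens the channel dim

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)     # cascade of all feature maps
```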
A general dense convolutional network (DenseNet) structure includes transition layers between the dense blocks to adjust the feature maps passed between them. Therefore, in the embodiment of the invention, transition modules can be arranged between the added dense connection modules.
In one example, the transition module is a convolutional network module, i.e., the CBL module serves as the transition module. Then, when building the backbone network of the image recognition network, it suffices to replace the residual modules with dense connection modules and connect them in series with the original CBL modules. The network is thus built more quickly and its structure is simpler. However, such a transition module uses only convolutional layers for the transition, i.e., it reduces the feature map dimensions simply by increasing the stride; this attends only to features in a local region and cannot combine information from the whole feature map, so more information in the feature map is lost.
Preferably, in this embodiment, the transition module includes a plurality of convolutional network modules and a max pooling layer, which are connected in sequence, and an output of the convolutional network module and an output of the max pooling layer are connected in cascade.
Fig. 4 is a schematic structural diagram of a transition module according to an embodiment of the present invention. In this embodiment, the transition module is denoted the tran module, and the MP layer is a max pooling (Maxpool, abbreviated MP) layer. Further, the stride of the MP layer may be chosen as 2. The introduced MP layer can reduce the dimensions of the feature map with a larger receptive field; it uses few parameters, so it adds little computation, weakens the possibility of overfitting, and improves the generalization ability of the network model. Combined with the original CBL module, the feature map can be viewed as undergoing dimension reduction from different receptive fields, so more information is retained.
In this embodiment, the transition module contains two or three convolutional network modules, connected serially. Compared with using one convolutional network module, using two or three in series increases model complexity and extracts features more fully.
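One plausible reading of the tran module in fig. 4 is sketched below: a serial chain of CBL modules whose stride-2 output is concatenated with a stride-2 max-pooled branch, so the downsampled map combines two receptive fields; the channel counts and the two-CBL chain are assumptions:

```python
import torch
import torch.nn as nn

def cbl(in_ch, out_ch, k=3, s=1):
    # conv + BN + Leaky ReLU, as before
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True))

class Transition(nn.Module):
    """tran module sketch: conv branch and MP branch both halve the
    spatial size; their outputs are fused in cascade (concat)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            cbl(in_ch, out_ch, k=1),
            cbl(out_ch, out_ch, k=3, s=2),  # strided conv downsamples
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # MP layer, stride 2

    def forward(self, x):
        # assumes even input height/width so both branches align
        return torch.cat([self.convs(x), self.pool(x)], dim=1)
```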
The prior-art YOLOv3 network contains many convolutional layers because it targets 80 object categories. The embodiment of the invention involves far fewer vehicle categories, so such a large number of convolutional layers is unnecessary; it wastes network resources and reduces processing speed.
In the image recognition network of this embodiment, the value of m for the dense connection module may be 4. It can be understood that, compared with the number of convolutional layers contained in the residual modules of the prior-art YOLOv3 backbone, setting the number of dense connection units in each dense connection module to 4 reduces the number of convolutional layers in the backbone network for the vehicle images of the embodiment without affecting network accuracy.
Step S3: perform top-down, densely connected feature fusion on the x feature maps by using the FPN network of the image recognition network to obtain the vehicle state of the vehicle image, where the vehicle state includes the position and category of each vehicle.
In the embodiment of the invention, the original FPN is improved by changing its network connection mode and its feature fusion mode. The main point is that feature fusion combines the lateral connections with a top-down, densely connected form, in which each smaller-scale prediction branch passes its feature map directly to every larger-scale prediction branch.
The FPN network and the feature fusion method thereof in the embodiment of the present invention are described below.
The FPN network of the embodiment of the invention comprises x prediction branches Y1~Yx corresponding to the scales of the x feature maps.
In this embodiment, the top-down, densely connected feature fusion of the x feature maps using the FPN network of the image recognition network is specifically:
for prediction branch Yi, obtain the feature map of the corresponding scale from the x feature maps and perform convolution processing on it, then cascade-fuse the convolved feature map with the feature maps of prediction branches Yi-1~Y1, each after its own upsampling; wherein the upsampling multiple for prediction branch Yi-j is 2^j; i = 2, 3, …, x; j is a natural number smaller than i.
Taking i = 3, i.e., prediction branch Y3, as an illustration, the feature maps entering the cascade fusion come from three sources. First, from the 3 feature maps, the feature map of the corresponding scale is obtained and convolved; that is, the feature map output by the third dense connection module (counted in the reverse direction of the input) passes through a CBL module. This can also be understood as 1x upsampling; its size is 52 × 52 × 255. Second, from prediction branch Y2 (i.e., Yi-1 = Y2): the feature map output by the second dense connection module in the reverse direction of the input (size 26 × 26 × 255) passes through the CBL module of prediction branch Y2 and is then upsampled by 2^1 = 2 times (giving size 52 × 52 × 255). Third, from prediction branch Y1 (i.e., Yi-2 = Y1): the feature map output by the first dense connection module in the reverse direction of the input (size 13 × 13 × 255) passes through the CBL module of prediction branch Y1 and is then upsampled by 2^2 = 4 times (giving size 52 × 52 × 255). As those skilled in the art will understand, after the 3 feature maps of different scales output by the backbone network are upsampled by these different multiples, the 3 feature maps to be cascade-fused all have the same size, 52 × 52 × 255. Prediction branch Y3 can therefore continue with convolution and other processing after the cascade fusion to obtain the prediction result Y3, whose size is 52 × 52 × 255.
The fusion process of prediction branch Y2 is analogous to that of prediction branch Y3 and is not repeated here. Prediction branch Y1 obtains the feature map output by the first dense connection module in the reverse direction of the input and proceeds directly to the subsequent prediction process; it does not receive feature maps from other prediction branches for fusion.
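The 13/26/52 worked example above reduces to a few tensor operations. The following sketch assumes 255-channel maps and nearest-neighbor upsampling (the patent does not fix the interpolation mode) and omits the per-branch CBL convolutions:

```python
import torch
import torch.nn.functional as F

def dense_topdown_fuse(c1, c2, c3):
    """Cascade fusion for prediction branch Y3: the Y1-branch map is
    upsampled 2^2 = 4x, the Y2-branch map 2^1 = 2x, and both are
    concatenated with the largest-scale backbone map."""
    up_c1 = F.interpolate(c1, scale_factor=4, mode="nearest")  # 13 -> 52
    up_c2 = F.interpolate(c2, scale_factor=2, mode="nearest")  # 26 -> 52
    return torch.cat([c3, up_c2, up_c1], dim=1)

# usage with the sizes from the example:
c1 = torch.randn(1, 255, 13, 13)        # from Y1's branch
c2 = torch.randn(1, 255, 26, 26)        # from Y2's branch
c3 = torch.randn(1, 255, 52, 52)        # backbone map for Y3
fused = dense_topdown_fuse(c1, c2, c3)  # -> (1, 765, 52, 52)
```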
In the feature fusion method of the prior-art YOLOv3 network, deep and shallow network features are added and then upsampled, and after the addition a feature map is extracted through a convolutional layer, which damages some of the original feature information. In this embodiment, the fusion method is changed to dense fusion: deep features are directly upsampled by different multiples, so that all transmitted feature maps have the same size; these maps are fused with the shallow feature map in series, features are extracted again from the fusion result to remove noise and retain the main information, and prediction follows. In this way more original information can be used, and high-dimensional semantic information participates in the shallow network. The advantage of the densely connected network, retaining more of the feature maps' original semantic features, is thus preserved; for a top-down method the retained semantics are higher-dimensional semantic information, which benefits object classification. Receiving features directly from shallower layers yields more concrete features, which effectively reduces feature loss, and reducing the number of parameters to be computed accelerates the prediction process.
The above mainly introduces the feature fusion method; after fusion, each prediction branch applies several convolution operations to produce its prediction result. For how the individual prediction results are obtained, refer to the related prior art; it is not described here.
All prediction results are then classified by a classification network, after which a non-maximum suppression module de-duplicates the prediction boxes to obtain the vehicle state of the vehicle image.
The classification network comprises a SoftMax classifier, whose purpose is mutually exclusive classification over the multiple vehicle states. Optionally, the classification network may instead follow the YOLOv3 network and classify with logistic regression, realizing multiple independent binary classifications.
The non-maximum suppression module is configured to perform NMS (non_max_suppression) processing: where several prediction boxes repeatedly select the same target, the prediction boxes with relatively low confidence are excluded.
For the content of the classification network and the non-maximum suppression module, reference may be made to the related description of the prior art, and details thereof are not repeated here.
The vehicle state includes the position and category of each vehicle. Specifically, the vehicle state takes the form of a vector comprising the position of the prediction box, the confidence of the vehicle in the prediction box, and the category of the vehicle in the prediction box. The position of the prediction box represents the position of the target vehicle in the vehicle image; each box is described by four values bx, by, bw and bh, where bx and by give the center point of the prediction box and bw and bh give its width and height.
The types of vehicles include cars, single-deck buses, double-deck buses, trucks, vans, bicycles, motorcycles, and the like.
Further, the vehicle state may also include a running state of the target vehicle, such as a current speed, normal running, speeding, or a temporary stop of the vehicle.
And the vehicle state may also carry the shooting time and shooting position of the vehicle image.
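As a sketch of the de-duplication step on boxes in the (bx, by, bw, bh) form above, the following uses the standard NMS operator from torchvision; the 0.45 IoU threshold is a common YOLO default and an assumption here, not a value from the patent:

```python
import torch
from torchvision.ops import nms

def dedupe_predictions(pred_boxes: torch.Tensor, scores: torch.Tensor,
                       iou_thresh: float = 0.45):
    """pred_boxes: (N, 4) rows of (bx, by, bw, bh) center-form boxes;
    scores: (N,) confidences. Convert to corner form and keep the
    higher-confidence box among overlapping predictions."""
    bx, by, bw, bh = pred_boxes.unbind(dim=1)
    corners = torch.stack([bx - bw / 2, by - bh / 2,
                           bx + bw / 2, by + bh / 2], dim=1)
    keep = nms(corners, scores, iou_thresh)  # indices of surviving boxes
    return corners[keep], scores[keep]
```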
In step S4, a driving control command is generated based on the vehicle state.
In the present embodiment, the driving control instruction includes route information on vehicle travel and a speed value of vehicle travel.
It will be appreciated that the driving control instruction for the unmanned vehicle may be generated based on the determined vehicle state. An alternative embodiment may include:
Step (1): acquire the current time, the current position of the unmanned vehicle and the current driving speed of the unmanned vehicle;
Step (2): acquire the shooting time and shooting position of the vehicle image carried in the vehicle state and the current driving speed of the target vehicle;
Step (3): determine the target distance between the target vehicle and the unmanned vehicle using the information acquired in steps (1) and (2), and determine the lane position and vehicle category of the target vehicle from the detected position and category of the vehicle;
Step (4): determine the safe driving distance range between the unmanned vehicle and the target vehicle and the expected driving speed value of the unmanned vehicle using the target distance, the lane position of the target vehicle and the vehicle category, obtain the driving route and driving speed of the unmanned vehicle, and generate the corresponding reminder information.
By planning the driving route of the unmanned automobile, the purpose of avoiding the congested area and the obstacle vehicles can be achieved.
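The patent does not give a concrete control law for step (4); the toy sketch below only illustrates the shape of such a rule (stay outside the safe-distance floor, otherwise slow down), and every threshold in it is an assumption:

```python
def expected_speed(target_distance_m: float, safe_min_m: float,
                   own_speed_mps: float, target_speed_mps: float) -> float:
    """Pick an expected driving speed for the unmanned vehicle from the
    target distance and the safe driving distance range (illustrative)."""
    if target_distance_m < safe_min_m:
        # too close: do not exceed the target vehicle's speed, back off
        return max(0.0, min(own_speed_mps, target_speed_mps) - 1.0)
    return own_speed_mps  # distance is safe: keep the current speed
```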
Hereinafter, the training process of the image recognition network will be briefly described. As will be understood by those skilled in the art, before network training, a network structure as shown in fig. 3 needs to be constructed, and the image recognition network is obtained by performing iterative training according to vehicle image samples and corresponding vehicle states. The network training process can be divided into the following steps:
step 1, obtaining a plurality of vehicle image samples and vehicle states corresponding to the vehicle image samples. The vehicle state may include the location and category of the respective vehicle.
In this process, the vehicle state of each vehicle image sample is known; it may be determined manually or by other image recognition tools. The vehicle image samples then need to be labeled, either manually or by other artificial-intelligence methods.
In addition, network training requires data in VOC or COCO format, while the labeled data is stored in text documents, so a Python script is needed to convert the dataset markup format (a sketch follows).
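As an illustration of that conversion script, the sketch below reads VOC-style XML annotations and emits normalized (class, cx, cy, w, h) rows of the kind commonly stored in text documents for YOLO-style training; the class-name mapping is a made-up example:

```python
import xml.etree.ElementTree as ET

CLASSES = {"car": 0, "bus": 1, "truck": 2}  # illustrative mapping

def voc_to_yolo_rows(xml_path: str, img_w: int, img_h: int):
    """Convert one VOC XML annotation file into normalized YOLO rows."""
    rows = []
    for obj in ET.parse(xml_path).getroot().iter("object"):
        cls = CLASSES[obj.find("name").text]
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        rows.append((cls,
                     (x1 + x2) / 2 / img_w,  # normalized center x
                     (y1 + y2) / 2 / img_h,  # normalized center y
                     (x2 - x1) / img_w,      # normalized width
                     (y2 - y1) / img_h))     # normalized height
    return rows
```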
And 2, training the constructed network by using each vehicle image sample and the vehicle state corresponding to each vehicle image sample to obtain the trained image recognition network. Specifically, the method comprises the following steps:
(a) take the vehicle state corresponding to each vehicle image sample as the true value for that sample, and train the constructed network on each vehicle image sample and its true value to obtain a training result for each sample;
(b) compare the training result of each vehicle image sample with the corresponding true value to obtain an output result for that sample;
(c) calculate the loss value of the network from the output results of all samples;
(d) adjust the network parameters according to the loss value, and repeat steps (a)-(c) until the loss value satisfies the convergence condition, i.e., reaches its minimum, meaning the training result of each sample agrees with its true value; training is then complete and the iteratively trained image recognition network is obtained. A minimal training-loop sketch follows.
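A minimal PyTorch training loop matching steps (a)-(d) might look as follows; the optimizer, learning rate and fixed epoch budget are assumptions, since the patent states only that training iterates until the loss converges:

```python
import torch

def train(network, loader, loss_fn, epochs=100, lr=1e-3):
    """Iteratively train the constructed network on (image, state) pairs."""
    opt = torch.optim.Adam(network.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            preds = network(images)         # step (a): training result
            loss = loss_fn(preds, targets)  # steps (b)-(c): loss value
            opt.zero_grad()
            loss.backward()
            opt.step()                      # step (d): adjust parameters
    return network
```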
In the road vehicle state identification method provided by this embodiment, the residual modules in the backbone network of the prior-art YOLOv3 network are replaced by dense connection modules. During feature extraction, the dense connection modules change the original parallel feature fusion into a serial form, so that early feature maps serve as inputs to every later layer; this yields feature maps with more information, strengthens feature propagation, and improves detection accuracy. The extracted feature maps undergo top-down, densely connected feature fusion: deep features are upsampled by different multiples and fused in series with shallow features, so more original information is obtained and high-dimensional semantic information participates in the shallow network, improving detection accuracy; receiving features directly from shallower layers yields more concrete features, effectively reducing feature loss, and reducing the number of parameters to be computed speeds up prediction. On this basis, detecting the vehicle image yields the vehicle state, from which a driving control instruction is generated, further controlling the driving state of the unmanned vehicle and meeting the safe-driving requirement of unmanned-driving technology.
In a second aspect, referring to fig. 5, an embodiment of the present invention further provides a road vehicle state identification apparatus, including:
the image acquisition module 501 is used for acquiring a vehicle image;
the image detection module 502 is configured to perform feature extraction on a vehicle image by using a backbone network of an image recognition network to obtain x feature maps; the scales of the x characteristic graphs are sequentially increased; x is a natural number of 3 or more;
the image identification module 503 is configured to perform top-down and dense connection type feature fusion on the x feature maps by using an FPN network of the image identification network to obtain a vehicle state of the vehicle image, where the vehicle state includes a position and a category of each vehicle;
a vehicle control module 504 for generating driving control instructions based on vehicle state;
the backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and corresponding vehicle states.
For details, please refer to the contents of the road vehicle state identification method of the first aspect, which will not be described herein again.
According to the road vehicle identification device provided by the embodiment of the invention, the residual modules in the backbone network of the prior-art YOLOv3 network are replaced by dense connection modules. During feature extraction, the dense connection modules change the original parallel feature fusion into a serial form, so that early feature maps serve as inputs to every later layer; this yields feature maps with more information, strengthens feature propagation, and improves detection accuracy. The extracted feature maps undergo top-down, densely connected feature fusion: deep features are upsampled by different multiples and fused in series with shallow features, so more original information is obtained and high-dimensional semantic information participates in the shallow network, improving detection accuracy; receiving features directly from shallower layers yields more concrete features, effectively reducing feature loss, and reducing the number of parameters to be computed speeds up prediction. By detecting the vehicle image, the vehicle state can be acquired and a driving control instruction generated, further controlling the driving state of the unmanned vehicle and meeting the safe-driving requirement of unmanned-driving technology.
In a third aspect, referring to fig. 6, an embodiment of the present invention further provides an electronic device, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete communication with each other through the communication bus 604;
a memory 603 for storing a computer program;
the processor 601 is configured to implement the steps of any one of the above-described road vehicle state identification methods when executing the program stored in the memory 603.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The above electronic device can realize the following: the residual modules in the backbone network of the prior-art YOLOv3 network are replaced by dense connection modules. During feature extraction, the dense connection modules change the original parallel feature fusion into a serial form, so that early feature maps serve as inputs to every later layer; this yields feature maps with more information, strengthens feature propagation, and improves detection accuracy. The extracted feature maps undergo top-down, densely connected feature fusion: deep features are upsampled by different multiples and fused in series with shallow features, so more original information is obtained and high-dimensional semantic information participates in the shallow network, improving detection accuracy; receiving features directly from shallower layers yields more concrete features, effectively reducing feature loss, and reducing the number of parameters to be computed speeds up prediction. On this basis, detecting the vehicle image yields the vehicle state, from which a driving control instruction is generated, further controlling the driving state of the unmanned vehicle and meeting the safe-driving requirement of unmanned-driving technology.
In a fourth aspect, corresponding to the method for recognizing a state of a road vehicle provided in the foregoing embodiments, the embodiments of the present invention further provide a computer-readable storage medium, in which a computer program is stored, and the computer program implements the foregoing method steps when being executed by a processor.
The above computer-readable storage medium stores an application program that, when executed, performs the road vehicle state identification method provided by the embodiment of the present invention, and can thus realize the following: the residual modules in the backbone network of the prior-art YOLOv3 network are replaced by dense connection modules. During feature extraction, the dense connection modules change the original parallel feature fusion into a serial form, so that early feature maps serve as inputs to every later layer; this yields feature maps with more information, strengthens feature propagation, and improves detection accuracy. The extracted feature maps undergo top-down, densely connected feature fusion: deep features are upsampled by different multiples and fused in series with shallow features, so more original information is obtained and high-dimensional semantic information participates in the shallow network, improving detection accuracy; receiving features directly from shallower layers yields more concrete features, effectively reducing feature loss, and reducing the number of parameters to be computed speeds up prediction. On this basis, detecting the vehicle image yields the vehicle state, from which a driving control instruction is generated, further controlling the driving state of the unmanned vehicle and meeting the safe-driving requirement of unmanned-driving technology.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the invention are brought about in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
For the embodiments of the electronic device and the computer-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A road vehicle state recognition method, applied to an unmanned vehicle, comprising the following steps:
acquiring a vehicle image;
performing feature extraction on the vehicle image by using a backbone network of an image recognition network to obtain x feature maps; the scales of the x feature maps increase successively; x is a natural number greater than or equal to 3;
performing top-down, densely connected feature fusion on the x feature maps by using an FPN (Feature Pyramid Network) network of the image recognition network to obtain the vehicle state of the vehicle image, wherein the vehicle state comprises the position and the category of each vehicle;
generating a driving control instruction based on the vehicle state;
the backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and corresponding vehicle states.
2. The road vehicle state recognition method according to claim 1, wherein each dense connection module comprises a convolutional network module and a dense connection unit group connected in series, the number of dense connection modules is at least three, and each dense connection unit group comprises a plurality of dense connection units.
3. The road vehicle state recognition method according to claim 2, wherein the convolutional network module comprises a convolutional layer, a BN layer and a Leaky ReLU layer connected in sequence; the dense connection unit comprises a plurality of densely connected convolutional network modules and fuses the feature maps output by those modules in a cascade manner.
4. The road vehicle state recognition method according to claim 3, wherein the transition module comprises a plurality of convolutional network modules and a max pooling layer connected in sequence, and the output of the convolutional network modules and the output of the max pooling layer are connected in cascade.
5. The road vehicle state recognition method according to claim 1, wherein the FPN network comprises x prediction branches Y1~Yx corresponding to the scales of the x feature maps.
6. The road vehicle state recognition method according to claim 5, wherein performing the top-down, densely connected feature fusion on the x feature maps by using the FPN network of the image recognition network comprises:
for prediction branch Yi, obtaining the feature map of the corresponding scale from the x feature maps and performing convolution processing on it, then cascade-fusing the convolved feature map with the feature maps of prediction branches Yi-1~Y1, each after its own upsampling; wherein the upsampling multiple for prediction branch Yi-j is 2^j; i = 2, 3, …, x; j is a natural number smaller than i.
7. The method according to claim 1, wherein the driving control instruction includes route information on vehicle travel and a speed value of vehicle travel.
8. A road vehicle state recognition device, comprising:
the image acquisition module is used for acquiring a vehicle image;
the image detection module is used for performing feature extraction on the vehicle image by using a backbone network of an image recognition network to obtain x feature maps; the scales of the x feature maps increase successively; x is a natural number greater than or equal to 3;
the image identification module is used for performing top-down, densely connected feature fusion on the x feature maps by using an FPN (Feature Pyramid Network) network of the image recognition network to obtain the vehicle state of the vehicle image, wherein the vehicle state comprises the position and the category of each vehicle;
the vehicle control module is used for generating a driving control instruction based on the vehicle state;
the backbone network comprises a plurality of dense connection modules and transition modules connected alternately in series; the image recognition network is obtained by iterative training on vehicle image samples and corresponding vehicle states.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202011145554.2A 2020-10-23 2020-10-23 Road vehicle state identification method and device, electronic equipment and storage medium Withdrawn CN112380918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011145554.2A CN112380918A (en) 2020-10-23 2020-10-23 Road vehicle state identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011145554.2A CN112380918A (en) 2020-10-23 2020-10-23 Road vehicle state identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112380918A true CN112380918A (en) 2021-02-19

Family

ID=74580764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011145554.2A Withdrawn CN112380918A (en) 2020-10-23 2020-10-23 Road vehicle state identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112380918A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128563A (en) * 2021-03-23 2021-07-16 武汉泰沃滋信息技术有限公司 High-speed engineering vehicle detection method, device, equipment and storage medium
CN113128563B (en) * 2021-03-23 2023-11-17 武汉泰沃滋信息技术有限公司 Method, device, equipment and storage medium for detecting high-speed engineering vehicle
CN117409298A (en) * 2023-12-15 2024-01-16 西安航空学院 Multi-size target accurate identification method and equipment for road surface vehicle identification
CN117409298B (en) * 2023-12-15 2024-04-02 西安航空学院 Multi-size target accurate identification method and equipment for road surface vehicle identification

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
CN112380921A (en) Road detection method based on Internet of vehicles
CN107527007B (en) Method for detecting object of interest in vehicle image processing system
CN109389078B (en) Image segmentation method, corresponding device and electronic equipment
US20190286982A1 (en) Neural network apparatus, vehicle control system, decomposition device, and program
CN110348384B (en) Small target vehicle attribute identification method based on feature fusion
CN112381763A (en) Surface defect detection method
CN113468978B (en) Fine granularity car body color classification method, device and equipment based on deep learning
CN110147707B (en) High-precision vehicle identification method and system
CN113362491B (en) Vehicle track prediction and driving behavior analysis method
CN112364721A (en) Road surface foreign matter detection method
CN112464717A (en) Remote sensing image target detection method, system, electronic equipment and storage medium
CN112380918A (en) Road vehicle state identification method and device, electronic equipment and storage medium
US20220156528A1 (en) Distance-based boundary aware semantic segmentation
CN109034086A (en) Vehicle recognition methods, apparatus and system again
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN112417973A (en) Unmanned system based on car networking
CN112085001B (en) Tunnel identification model and method based on multi-scale edge feature detection
CN112288702A (en) Road image detection method based on Internet of vehicles
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
CN111814813A (en) Neural network training and image classification method and device
CN110705695B (en) Method, device, equipment and storage medium for searching model structure
CN112380919A (en) Vehicle category statistical method
CN112327834A (en) Navigation control equipment and unmanned automobile
CN113837977A (en) Object tracking method, multi-target tracking model training method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210219