CN117315604A - Vehicle vision detection method and device based on deep learning and storage medium - Google Patents

Vehicle vision detection method and device based on deep learning and storage medium

Info

Publication number
CN117315604A
CN117315604A
Authority
CN
China
Prior art keywords
vehicle
training
lane
recognition
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311185082.7A
Other languages
Chinese (zh)
Inventor
朱伟枝
张敏
于兆勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic College
Original Assignee
Guangdong Polytechnic College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Guangdong Polytechnic College
Priority to CN202311185082.7A
Publication of CN117315604A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention provides a vehicle vision detection method, device and storage medium based on deep learning. The method comprises: acquiring a vehicle scene picture to be recognized; inputting the vehicle scene picture into a preset vehicle vision detection model to generate recognition results, the recognition results comprising a parking space recognition result, a tire recognition result and a lane recognition result; and correcting the vehicle posture or planning a path according to the recognition results. The method recognizes parking spaces, tires and lanes separately through the preset vehicle vision detection model, generates the corresponding results, and combines the recognition results to assist in planning a driving strategy. Different vehicle scenes can be recognized quickly and accurately, and driving strategies can be formulated reasonably and effectively from the different scene recognition results, improving road traffic safety and the efficiency of vehicle management.

Description

Vehicle vision detection method and device based on deep learning and storage medium
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a vehicle vision detection method and device based on deep learning and a storage medium.
Background
As urban development accelerates, the number of vehicles in cities has grown explosively. Road traffic management, however, faces several difficulties. First, finding a parking space takes considerable time; although many parking space management systems exist, most rely on sensor technology and image processing, which are costly and cannot guarantee recognition accuracy. Second, current systems usually recognize obstacles, vehicle speed and the like, while few techniques recognize the parking state of a vehicle or the flatness of the road surface, so the influence of tire position information and road potholes on vehicle management remains unknown and the effectiveness of road safety management is limited. Third, conventional vehicle scene recognition is generally local: only a single piece of information, such as the road or the vehicle speed, is detected, and because the information is so limited, deviations easily arise when it is used to assist in formulating a driving strategy.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a vehicle vision detection method, device and storage medium based on deep learning, which solve the problems in the prior art that vehicle scene recognition is inaccurate and that deviations easily occur when formulating a driving strategy.
In a first aspect, the present invention provides a vehicle vision detection method based on deep learning, the method comprising:
acquiring a vehicle scene picture to be recognized;
inputting the vehicle scene picture to be recognized into a preset vehicle vision detection model to generate recognition results; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
and correcting the vehicle posture or planning a path according to the recognition results.
In current vehicle management, vehicle scene recognition is generally local: for example, the parking position of the vehicle, the running speed of the vehicle or obstacles are recognized individually. On the one hand, the influence of tires and road potholes on driving or parking is not considered, and recognition results cannot be obtained quickly and accurately; on the other hand, isolated scene recognition often fails to effectively support the driving strategy. In this aspect, a vehicle scene picture to be recognized is first acquired; the parking space, the tires and the lane are then recognized separately by the preset vehicle vision detection model and the corresponding results are generated; finally, the recognition results are combined to assist in planning the driving strategy. The trained vehicle vision detection model can accurately recognize different vehicle scenes, and driving strategies can be formulated reasonably and effectively from the different scene recognition results, improving road traffic safety and the efficiency of vehicle management.
In one possible implementation manner, the vehicle vision detection model comprises a parking space recognition model, a tire recognition model and a lane recognition model; wherein,
the parking space recognition model and the tire recognition model are obtained based on the training of a Yolov3 algorithm;
the lane recognition model is obtained based on the training of the Yolov5 algorithm.
In this possible implementation, different recognition models are trained with different algorithms. Parking spaces and tires are easier to annotate and recognize than road potholes, so the Yolov3 algorithm is adopted for them. Training the parking space recognition model with the Yolov3 algorithm improves recognition efficiency and accuracy compared with parking space recognition by the existing fuzzy C-means clustering classification systems and recurrent neural network systems. Training the tire recognition model with the Yolov3 algorithm allows the position of the tire to be recognized accurately, avoiding traffic congestion, occupation of several public parking spaces, body scratches and other problems caused by non-standard parking when the vehicle body deviates; this greatly improves the order of traffic management and raises the utilization of public resources. Training the lane recognition model with the Yolov5 algorithm further reduces the amount of computation while accurately recognizing road potholes, assisting the driver to plan a reasonable driving path and improving the safety of road traffic.
In one possible implementation, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the parking space recognition model, including:
acquiring a first preset number of parking space scene pictures;
annotating the free parking spaces, the occupied parking spaces and the parked vehicles in the parking space scene pictures respectively to generate a first training set; wherein the parked vehicles include at least two parking directions;
training the Yolov3 algorithm with the first training set based on the Darknet framework to generate the parking space recognition model.
In this possible implementation, a variety of parking space scene pictures is adopted. So that the occupancy state of a parking space can be recognized accurately, both the spaces and the vehicles are annotated, and each space is labeled as either free or occupied. To strengthen the recognition capability of the model, the parked vehicles further cover at least two parking directions, so that different parking space scenes can be recognized more comprehensively with this training set. Further, the Yolov3 algorithm is trained on the Darknet framework, so the parking space recognition model can be obtained with faster training.
In one possible implementation, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the tire recognition model, including:
acquiring a second preset number of parked vehicle pictures;
annotating the tires in the parked vehicle pictures to generate a second training set; wherein the tires and the vehicle body form at least two included angles of different sizes;
training the Yolov3 algorithm with the second training set based on the Darknet framework to generate the tire recognition model.
In this possible implementation, to recognize the position of the tire accurately, the included angle formed between the tire and the vehicle body is annotated, and the Yolov3 algorithm learns from these annotations, so that the tire recognition model can accurately recognize the position of the tire and judge whether the vehicle body deviates when the vehicle is parked. As above, the Yolov3 algorithm is trained on the lightweight Darknet framework, which speeds up training.
In one possible implementation, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the lane recognition model, including:
acquiring a third preset number of lane scene pictures;
annotating the road potholes in the lane scene pictures to generate a third training set;
training the Yolov5 algorithm with the third training set to generate the lane recognition model; the Yolov5 algorithm comprises a CSP-Darknet feature network.
Road potholes are more difficult to recognize than parking spaces or vehicle tires: spaces and tires have obvious morphological features, and their coordinate information is simply fixed with an annotation frame during labeling, whereas weather, background, light and the like easily change the texture and shape of a pothole, which makes recognition considerably harder. In this possible implementation, the Yolov5 algorithm is therefore adopted to recognize potholes accurately. The Yolov5 algorithm further improves the detection performance of the model by changing the loss function, the intersection-over-union (IoU) based matching and the structure of the detection module, and by employing CSP-Darknet the amount of computation is reduced while accuracy is maintained.
In one possible implementation, training the Yolov5 algorithm with the third training set includes:
training the Yolov5 algorithm with mosaic training based on the third training set; the mosaic training scales and rotates several randomly selected annotated lane scene pictures and stitches them into new lane scene pictures with which the Yolov5 algorithm is trained.
In this possible implementation, mosaic training reads several pictures at a time and stitches four or more of them into one picture by scaling, flipping and similar operations. Such training gives the network stronger generalization, so targets can be recognized accurately under different conditions.
In one possible implementation, training the Yolov5 algorithm with the third training set further includes:
training the Yolov5 algorithm with self-adversarial training based on the third training set; the self-adversarial training includes a first stage and a second stage;
the first stage is used for changing the size of the annotated lane scene pictures;
the second stage is used for training on the lane scene pictures whose size has been modified.
In this possible implementation, the self-adversarial training increases the robustness of the network: targets can still be recognized when the picture is blurred, the target is incomplete, and so on.
In one possible implementation, correcting the vehicle posture or performing path planning according to the recognition results includes:
determining the included angle formed between the tire and the vehicle body of the parked vehicle according to the tire recognition result, and triggering a prompt to correct the current vehicle posture when the included angle is judged to be larger than a first threshold; or,
determining the number and positions of road potholes and free parking spaces according to the parking space recognition result and the lane recognition result, and re-planning the vehicle driving path according to the number and positions of the road potholes and free parking spaces.
In this possible implementation, the various recognition results together assist in adjusting the driving strategy. Whether the current parking posture of the vehicle is reasonable can be judged from the tire recognition result: when the angle between the tire and the vehicle body is too large, the vehicle posture can be corrected so that the body and all tires stay within the parking space, avoiding scratches and other safety problems. From the usage state of the parking spaces and the flatness of the lane surface, the path to be driven can be determined, including how to bypass potholes and how to reach the most convenient free space. Integrating several recognition results in this way allows the driving strategy to be formulated more accurately than any single result would, improving both the efficiency of road management and road traffic safety.
In a second aspect, the present invention further provides a vehicle vision detection device based on deep learning, where the device includes:
the image acquisition unit is used for acquiring a vehicle scene picture to be recognized;
the recognition unit is used for inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model and generating recognition results; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
and the planning unit is used for correcting the vehicle posture or planning a path according to the recognition results.
In this aspect, the specific implementation of each module corresponds to the description in the method embodiments above and is not repeated here for brevity.
In a third aspect, the present invention also provides a computer storage medium storing at least one program executable by a computer; when the program is executed by the computer, the computer performs the steps of any of the vehicle vision detection methods based on deep learning described above.
The advantages and beneficial effects of the vehicle vision detection method based on deep learning are described above and are not repeated here; since the method is executed through the computer storage medium, the computer storage medium has the same advantages and beneficial effects.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required by the embodiments or by the description of the prior art are introduced briefly below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a vehicle vision detection method based on deep learning according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a Yolo algorithm according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolutional neural network involved in a Yolo algorithm according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a training flow of a parking space recognition model according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a parking space image labeling according to an embodiment of the present invention;
FIG. 6 is a diagram of a training set of tire identification models according to one embodiment of the present invention;
fig. 7 is a schematic diagram of a network structure of Yolov5 according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a CSP-DenseNet network according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a vehicle vision detection device based on deep learning according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and fully with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
It should be noted that any directional indication (such as up, down, left, right, front or rear) in the embodiments of the present invention is merely used to explain the relative positional relationship, movement and so on between components in a specific posture; if the specific posture changes, the directional indication changes accordingly.
In addition, descriptions such as "first" and "second" in the embodiments of the present invention are for description only and are not to be understood as indicating or implying relative importance or the number of technical features indicated; a feature defined by "first" or "second" may thus explicitly or implicitly include at least one such feature. Where "and/or" appears, it covers three parallel schemes: "A and/or B" includes scheme A, scheme B, or A and B satisfied simultaneously. The technical solutions of the embodiments may be combined with one another, but only insofar as a person skilled in the art can realize the combination; where combined solutions contradict each other or cannot be realized, the combination should be considered absent and outside the scope of protection claimed by the present invention.
Conventionally, when a driving strategy is to be determined, it is usually determined from only a single vehicle state, such as an obstacle or the running speed, and recognition accuracy cannot be guaranteed. In actual driving strategy formulation and road management, however, the tire position, the parking space state and the road condition all influence the management and strategy results. To improve the accuracy and efficiency of vehicle scene recognition and to assist driving strategy formulation so as to improve the effectiveness of road management, the invention provides a vehicle vision detection method based on deep learning.
Referring to fig. 1, one embodiment of the present invention provides a vehicle vision detection method based on deep learning, comprising:
S10, acquiring a vehicle scene picture to be recognized;
S20, inputting the vehicle scene picture to be recognized into a preset vehicle vision detection model, and generating recognition results; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
And S30, correcting the vehicle posture or planning a path according to the recognition results.
In current vehicle management, vehicle scene recognition is generally local: for example, the parking position of the vehicle, the running speed of the vehicle or obstacles are recognized individually. On the one hand, the influence of tires and road potholes on driving or parking is not considered, and recognition results cannot be obtained quickly and accurately; on the other hand, isolated scene recognition often fails to effectively support the driving strategy.
In step S10, a vehicle scene picture to be recognized is first acquired. In general, a vehicle scene video may be captured first and then processed into frame-by-frame vehicle scene pictures. In particular, the vehicle scene picture should include at least the vehicle body and the road around the vehicle, while the sky, trees or pedestrians belong to background interference.
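For illustration only, the following minimal Python sketch (not part of the claimed method; the file names and the sampling stride are assumptions) shows how such a video could be converted into frame-by-frame pictures with OpenCV:

    import cv2

    def extract_frames(video_path: str, out_dir: str, stride: int = 10) -> int:
        """Save every `stride`-th frame of the video as a picture; return the count."""
        cap = cv2.VideoCapture(video_path)
        saved, index = 0, 0
        while True:
            ok, frame = cap.read()  # ok becomes False once the video is exhausted
            if not ok:
                break
            if index % stride == 0:
                cv2.imwrite(f"{out_dir}/frame_{index:06d}.jpg", frame)
                saved += 1
            index += 1
        cap.release()
        return saved

    # e.g. extract_frames("parking_lot.mp4", "frames", stride=10)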
In step S20, the vehicle scene picture to be recognized is input into the preset vehicle vision detection model, and recognition results are generated. The preset vehicle vision detection model is obtained mainly by training with deep learning algorithms.
Further, the recognition results include a parking space recognition result, a tire recognition result and a lane recognition result. The parking space recognition result mainly comprises the number and positions of the recognized parking spaces; the tire recognition result mainly comprises the size of the included angle formed between the tire and the vehicle body; and the lane recognition result mainly comprises the number and positions of the potholes recognized on the lane surface.
Finally, in step S30, the vehicle posture may be corrected or path planning performed according to the parking space recognition result, the tire recognition result and the lane recognition result.
In this embodiment, a vehicle scene picture to be recognized is first acquired; the parking space, the tires and the lane are then recognized separately by the preset vehicle vision detection model and the corresponding results are generated; finally, the recognition results are combined to assist in planning the driving strategy. Compared with existing recognition systems, the trained vehicle vision detection model can recognize different vehicle scenes quickly and accurately; moreover, because the different scene recognition results are considered together, a driving strategy can be formulated reasonably and effectively, improving road traffic safety and the efficiency of vehicle management.
In one embodiment, the vehicle vision detection model includes a parking space recognition model, a tire recognition model, and a lane recognition model; wherein,
The parking space recognition model and the tire recognition model are obtained based on the training of a Yolov3 algorithm;
the lane recognition model is obtained based on the Yolov5 algorithm training.
Many detection algorithms have been proposed for the problem that objects may appear both near the camera and at other distances; one of them is Yolov3. The Yolov3 network is stable on near and smaller targets and solves this problem to some extent. Building on the Yolov2 model, Yolov3 modifies several parts of the network: it uses 3x3 and 1x1 convolutional layers; drawing on residual networks, it adds direct shortcut connections, and the resulting network reaches 53 layers; it detects objects at multiple scales using multi-scale features; and it replaces the Softmax classifier of the Yolov2 model with logistic classifiers that support multi-label objects.
Referring to fig. 2, fig. 2 illustrates the principle of the Yolo algorithm. Yolo reframes object detection, usually treated as a classification problem, as a regression problem. A single convolutional neural network is applied to the entire image: the image is divided into grid cells, and for each cell the network predicts bounding boxes and class probabilities. For example, a 100x100 image may be divided spatially into a uniform grid, e.g. 7x7. For each grid cell, the network describes each bounding box with four descriptors: 1) the center of the bounding box; 2) its height; 3) its width; 4) a value mapped to the class to which the object belongs. In addition, the network predicts the confidence that an object is present in a bounding box. If the center of an object falls within a grid cell, that cell is responsible for detecting the object. A cell may predict several boxes, but during training only one bounding box should be responsible for each object, so boxes are assigned by their overlap with the ground truth and gradient updates are made for the assigned prediction. Finally, Yolo filters out bounding boxes below a confidence threshold and applies non-maximum suppression to the boxes of each class, which yields the predicted image: every surviving bounding box carries a probability for each type of object.
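As an illustration of this filtering step, the following Python sketch (an aid to the reader, not text from the claims) keeps only confident boxes and applies non-maximum suppression so that each object retains a single bounding box; the threshold values are typical choices assumed for the example:

    from typing import List, Tuple

    Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)

    def iou(a: Box, b: Box) -> float:
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def nms(boxes: List[Box], score_thr: float = 0.25, iou_thr: float = 0.45) -> List[Box]:
        # Keep boxes above the confidence threshold, highest score first.
        boxes = sorted((b for b in boxes if b[4] >= score_thr),
                       key=lambda b: b[4], reverse=True)
        kept: List[Box] = []
        for b in boxes:
            # Suppress b if it overlaps a higher-scoring kept box too much.
            if all(iou(b, k) < iou_thr for k in kept):
                kept.append(b)
        return kept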
Referring to fig. 3, fig. 3 provides a schematic diagram of the convolutional neural network involved in the Yolo algorithm.
Convolution layer: the convolution layer performs convolution operations to extract features from the input picture and reduce its dimension. Convolution itself is a linear computation, and the activation applied to its result in a neural network is a nonlinear function; structurally this resembles a fully connected layer with an activation added, so the experimenter only needs to attach an activation function, Sigmoid, Tanh and ReLU being the most commonly used ones.
Pooling layer: the pooling layer does not change the depth of the three-dimensional matrix, but it shrinks the matrix. Imitating the human visual nervous system, it converts high-resolution feature maps into low-resolution ones and avoids excessively high dimensions, which would make fitting far too time-consuming. Pooling reduces the number of nodes in the final fully connected layer, achieving dimension reduction and improving operation speed and efficiency; it reduces the probability of overfitting and avoids it as much as possible; and it reduces sensitivity to translation and rotation of the picture.
Fully connected layer: after processing by the multi-stage network, the data reach the fully connected layer, and the final classification result is given by the last one or two fully connected layers of the convolutional neural network. After several rounds of processing, the information in the image has been distilled into high-level features; the work of the earlier layers can be regarded as automatic feature extraction. The final classification still requires a fully connected layer, which completes learning from the higher-order information passed down by the upper network structure.
Output layer: the output layer is mainly used for classification and usually sits at the end of the convolutional network. It performs the final labeling: the location, size and class of the objects concerned; in segmentation tasks it reproduces the grouping of each pixel in the picture.
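To make the four layer types concrete, the following minimal PyTorch sketch (illustrative only; the 3x64x64 input and the two classes are assumptions, not the network of this invention) chains convolution, activation, pooling, fully connected and output stages:

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution extracts features
                nn.ReLU(),                                   # nonlinear activation
                nn.MaxPool2d(2),                             # pooling shrinks the feature map
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, num_classes),  # fully connected layer gives class scores
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))  # output: one score per class

    logits = TinyCNN()(torch.randn(1, 3, 64, 64))  # -> shape (1, 2)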
In general, the Yolo algorithm has the following advantages over other deep learning algorithms:
the recognition speed is high because a regression method is used and the frame construction is simple.
The image features which can be identified by predicting based on the whole picture information have stronger compatibility. And other translational detection frames can only predict based on local picture information.
In this embodiment, different recognition models are trained with different algorithms. Parking spaces and tires are easier to annotate and recognize than road potholes, so the Yolov3 algorithm is adopted for them.
Training the parking space recognition model with the Yolov3 algorithm improves recognition efficiency and accuracy compared with parking space recognition by the existing fuzzy C-means clustering classification systems and recurrent neural network systems.
Training the tire recognition model with the Yolov3 algorithm allows the position of the tire to be recognized accurately, avoiding traffic congestion, occupation of several public parking spaces, body scratches and other problems caused by non-standard parking when the vehicle body deviates. This greatly improves the order of traffic management and raises the utilization of public resources.
Training the lane recognition model with the Yolov5 algorithm further reduces the amount of computation while accurately recognizing road potholes, assisting the driver to plan a reasonable driving path and improving the safety of road traffic.
In one embodiment, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the parking space recognition model, comprising:
acquiring a first preset number of parking space scene pictures;
annotating the free parking spaces, the occupied parking spaces and the parked vehicles in the parking space scene pictures respectively to generate a first training set; wherein the parked vehicles include at least two parking directions;
training the Yolov3 algorithm with the first training set based on the Darknet framework to generate the parking space recognition model.
Referring to fig. 4, fig. 4 provides a flow for training a parking space recognition model.
First, a basic operating system is selected, preferably Ubuntu, which is developed on the Linux operating system and is leaner than Windows. The main PC configuration: an Intel i7-7700HQ processor, 16 GB of memory, and a GTX-1070 graphics card (GPU) with 8 GB of video memory. The development environment is Ubuntu 18.04, and the configuration to be built is CUDA-10.02 (to invoke the graphics card during deep learning), OpenCV 3.4.5 (to display results immediately) and a Yolov3 network detection model built on the Darknet framework.
An annotation target is then selected, a data set is made from the annotated pictures, and the Yolov3 algorithm is trained with this data set. The recognition accuracy of the model is judged from the training result; if it does not meet the requirement, the data are adjusted and training is repeated until the parking space recognition model meets the required recognition accuracy.
In this embodiment, the advantages of selecting and building the Darknet framework include:
1) Easy installation: the build options required by the experimenter can be selected in the Makefile and compiled directly, which takes little time;
2) No heavy dependencies: the whole framework is written in C and needs no other libraries; for the parts that would otherwise rely on OpenCV, the author provides replacement functions;
3) A clear framework structure: the source code is easy to modify and inspect; the functions used for detection and classification can be seen in the examples folder, and the basic files of the framework are all in the src folder;
4) A friendly Python interface: although the framework is written in C, a Python interface is provided, through which models can be invoked directly for detection;
5) Easy deployment on other terminals: the Darknet framework moves easily onto a local machine and automatically invokes the available processor or graphics card, which makes it particularly suitable for deploying detection and recognition tasks on a local computer terminal. It is a pure-C lightweight framework designed specifically for Yolo and is the best-matched platform on which to experience Yolo's capabilities.
Referring to fig. 5, fig. 5 provides a schematic diagram of the image annotation in this embodiment. Fig. 5 shows 3 scenes, each containing a different number of spaces and parked vehicles; the shapes of the spaces differ, as do the numbers of free and occupied spaces. Because the data set covers different space shapes and parking directions, training on it improves the generalization ability of the model.
In this embodiment, to improve recognition accuracy, the annotation granularity is refined and spaces are annotated according to their state: an occupied space is annotated together with the car on it, and the car is not annotated separately. Spaces are divided into occupied and free; if spaces and vehicles were annotated without distinguishing the usage state, later recognition could not determine whether a space is free, and the owner could not be helped to plan a correct path to a suitable space. The refined annotation therefore lets the model learn both the vehicles and the usage state of the spaces during training.
Preferably, the stored pictures are processed and annotated with the LabelImg software; after the processed file is saved, an xml file corresponding to the picture annotation is generated automatically, and the annotation file carries the same name as its picture file to prevent errors in the annotation format. Passing the data set through LabelImg thus yields xml files in which the information of every annotation frame (the size of the picture and the position parameters of each annotation frame) is stored.
In one embodiment, the data sets annotated for the three scenes total 1753 pictures, and model training is performed with them. Since two kinds of object are to be recognized in the experiment, training was run for 4000 iterations, 1.649822 seconds were consumed, and the total number of training pictures was 26784.
It can be understood that the parking space recognition model of this embodiment can also be applied in a parking lot monitoring room or at the guidance display at the entrance of a parking lot; with slight modification, the parking space image can be updated in real time according to the space conditions. Used in a monitoring room, it improves the manager's oversight of the spaces; at the entrance, it lets drivers understand the space situation more intuitively and choose a space more conveniently.
Therefore, in this embodiment, different parking space scene pictures are adopted, and through the refined annotation information the usage state of the spaces can be recognized accurately, assisting the owner to judge the exact state of a space. Training the Yolov3 algorithm on the Darknet framework yields the parking space recognition model faster, and the comprehensive data set strengthens the generalization ability of the model.
In one embodiment, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the tire recognition model, comprising:
acquiring a second preset number of parked vehicle pictures;
annotating the tires in the parked vehicle pictures to generate a second training set; wherein the tires and the vehicle body form at least two included angles of different sizes;
training the Yolov3 algorithm with the second training set based on the Darknet framework to generate the tire recognition model.
When training the tire recognition model, the quality of the data set directly influences the quality of the trained network model, so in this embodiment the sample pictures collected in various scenes need to be screened and corrected. Because the object of detection is the vehicle wheel, the target is uniform and varies little between types; however, due to light and environmental factors, situations such as occlusion by obstacles occur. It is therefore necessary to collect various types of wheel picture from different perspectives.
During training, the annotation information is input into the network together with the pictures. The annotation information comprises the category id of the target and the coordinates of the four corner points of the target frame. The screened pictures are annotated with the LabelImg annotation tool, a visual image calibration tool that is simple to operate and easy to install and can produce the data labels required by different target detection networks such as Faster R-CNN, YOLO and SSD. A total of 1200 pictures were collected this time; after preliminary screening, 1085 suitable pictures were selected and annotated. During annotation, the wheels of the vehicle are taken as targets and everything else as background; the target frames are drawn with rectangular boxes and given the category label "wheel". All of this information has to be marked manually, since the machine cannot annotate automatically, so the annotation work is tedious and time-consuming and demands considerable manpower. Attention must also be paid to the generation format of the annotated pictures, otherwise the xml files cannot be generated.
In the resulting xml files, the complete information of the corresponding picture is stored in each file: the name of the picture, the path where it is stored, its size, the annotation category of the target and the coordinates of the four corner points of the target frame. The voc_label.py file is then run to convert each xml file into a txt file, which records the category and location of each manual annotation box. The preparation of the data set is then essentially complete: a large portion of the pictures, e.g. 800, is taken as the second training set, with the remaining 200 or more as the test set. Pictures such as those shown in fig. 6, taken in the ideal state, are used for training; the second training set then trains the Yolov3 algorithm on the Darknet framework, the test set verifies the trained model, and when the recognition accuracy meets the preset requirement, the final tire recognition model is generated.
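As an illustration of what a voc_label.py-style conversion does, the following Python sketch (an aid under stated assumptions, not the file shipped with Darknet) turns the corner-point boxes of one xml file into the normalized class/center/size lines that Yolo training reads; the class list is assumed from the embodiment, which labels only "wheel":

    import xml.etree.ElementTree as ET

    CLASSES = ["wheel"]  # assumed class list for this embodiment

    def voc_to_yolo(xml_path: str, txt_path: str) -> None:
        root = ET.parse(xml_path).getroot()
        img_w = float(root.find("size/width").text)
        img_h = float(root.find("size/height").text)
        lines = []
        for obj in root.iter("object"):
            name = obj.find("name").text
            if name not in CLASSES:
                continue
            box = obj.find("bndbox")
            xmin = float(box.find("xmin").text)
            ymin = float(box.find("ymin").text)
            xmax = float(box.find("xmax").text)
            ymax = float(box.find("ymax").text)
            # Normalize the center coordinates and the box size to [0, 1].
            cx = (xmin + xmax) / 2.0 / img_w
            cy = (ymin + ymax) / 2.0 / img_h
            w = (xmax - xmin) / img_w
            h = (ymax - ymin) / img_h
            lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
        with open(txt_path, "w") as f:
            f.write("\n".join(lines))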
In this embodiment, the included angle formed between the tire and the vehicle body is annotated, so the tire recognition model can accurately recognize the position of the tire and judge whether the vehicle body deviates when the vehicle is parked. As before, the Yolov3 algorithm is trained on the lightweight Darknet framework, which speeds up training.
In one embodiment, the tire recognition model can also recognize tires in video. Specifically, dynamic vehicle video shot in the field was used as the test object, with a traffic-light intersection with heavy traffic and varied vehicle types chosen as the sampling location. During the video test, the model could still capture, recognize and mark the wheel positions of moving vehicles, i.e. the model is effective on dynamic data. The conclusion is that the model recognizes accurately both static pictures, which are easier to detect, and dynamic video, which is harder; so if the training accuracy of the tire recognition model needs to be improved further, video samples can be added to the training set to improve the generalization ability of the model.
In one embodiment, before inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model, the method further comprises:
training the lane recognition model, comprising:
acquiring a third preset number of lane scene pictures;
annotating the road potholes in the lane scene pictures to generate a third training set;
training the Yolov5 algorithm with the third training set to generate the lane recognition model; the Yolov5 algorithm comprises a CSP-Darknet feature network.
Usually, road potholes are more difficult to recognize than parking spaces or vehicle tires: spaces and tires have obvious morphological features, and their coordinate information is simply fixed with an annotation frame during labeling, whereas weather, background, light and the like easily change the texture and shape of a pothole, which makes recognition considerably harder. To recognize road potholes accurately, this embodiment mainly adopts the Yolov5 algorithm.
Fig. 7 provides the network structure of Yolov5. The feature-extraction network (Backbone) of Yolov5 is CSP-Darknet, which further reduces computation while maintaining accuracy relative to Yolov3. CSP-Darknet essentially takes CSP-DenseNet as its reference; the difference between the two lies in their residual blocks.
FIG. 8 provides the structure of CSP-DenseNet. At the input module, the image undergoes data processing such as mosaic augmentation. The Backbone module extracts high-, middle- and low-level features, using the CSP-DenseNet mode of computation to reduce the amount of operation and raise the operation speed. The Neck module fuses and extracts the features of all levels to obtain large, medium and small feature maps; its function is similar to that of a fully connected layer. The final detection happens at the Head: anchor boxes are applied on the feature maps, and the final output vectors with class probabilities, objectness scores and bounding boxes are generated. The loss function compares the prediction results with the ground truth, and back-propagation updates the parameters of the model.
The CSP-DenseNet network splits the input feature map into two parts: one half passes through the deep stack of layers, while the other half skips those k layers and connects directly to the later part, so the amount of computation is reduced for the skipped half.
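The split-and-skip idea can be sketched as follows in PyTorch (illustrative only; the inner convolution stack stands in for the k skipped layers, and the channel counts are assumptions, not Yolov5's actual configuration):

    import torch
    import torch.nn as nn

    class CSPBlock(nn.Module):
        def __init__(self, channels: int, k: int = 2):
            super().__init__()
            half = channels // 2
            # Only this half is pushed through the deep stack of layers.
            self.deep = nn.Sequential(*[
                nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                              nn.BatchNorm2d(half), nn.SiLU())
                for _ in range(k)
            ])
            self.fuse = nn.Conv2d(channels, channels, 1)  # recombine the two paths

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            a, b = torch.chunk(x, 2, dim=1)  # split channels into two halves
            return self.fuse(torch.cat([self.deep(a), b], dim=1))  # b skips the stack

    out = CSPBlock(64)(torch.randn(1, 64, 32, 32))  # same shape in as out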
Common target detection datasets are PASCAL VOC, ImageNet and COCO:
The PASCAL VOC dataset has 20 classes, such as person, cat, dog, boat, car, television and chair, and contains 11530 images for training and validation.
The ImageNet dataset is characterized by offering evaluation data for classification, localization and detection tasks. The localization task has 1000 categories, and the detection task covers 200 targets with 470000 images.
The COCO dataset comprises 200,000 images with more than 500,000 target annotations across 80 categories. It is the most widely used target detection dataset, and the competition built on it, gathering Google, Microsoft and many research institutions and innovative enterprises at home and abroad, is among the most authoritative and most watched in the field of computer vision.
The data set (the third training set) of this embodiment comes from IEEE BigData 2020: 5 thousand pictures shared publicly by the 2020 global road damage detection challenge, drawn from three countries with different levels of infrastructure so as to increase data diversity. In addition, 300 annotated pictures from real scenes are added to the data set, 50 of which are used as the validation set, improving the generalization ability of the model.
In general, road pothole detection faces the following difficulties: many shadows, cluttered backgrounds, difficult annotation, varied and irregular shapes, and a high required detection speed. To improve the quality of the training set, the detail features of each pothole are therefore photographed as clearly as possible, and because potholes have no uniform shape and many features must be recorded, the same target is shot from multiple angles; although several scenes were preset in the early stage, the data corresponding to each scene alone would otherwise have been insufficient and too crude.
In this embodiment, the Yolov5 algorithm can further improve the detection performance of the model by changing the loss function, the intersection-over-union (IoU) based matching and the structure of the detection module. By employing CSP-Darknet, the amount of computation can be reduced while accuracy is maintained.
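The text does not name the exact IoU variant used; as one example of such a change, the following sketch computes GIoU, a widely used generalization that still yields a training signal when the predicted and ground-truth boxes do not overlap. Boxes are (x1, y1, x2, y2):

    def giou(a, b) -> float:
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        iou = inter / (union + 1e-9)
        # Smallest box enclosing both a and b; its empty part penalizes the score.
        ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
        ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
        enclose = (ex2 - ex1) * (ey2 - ey1)
        return iou - (enclose - union) / (enclose + 1e-9)

    # The corresponding box loss would be: loss = 1 - giou(pred, target)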
In one possible implementation, training the Yolov5 algorithm with a third training set includes:
training the Yolov5 algorithm with mosaic training based on the third training set;
the mosaic training scales and rotates several randomly selected annotated lane scene pictures and stitches them into new lane scene pictures with which the Yolov5 algorithm is trained.
In this possible implementation, mosaic training reads several pictures at a time and stitches four or more of them into one picture by scaling, flipping and similar operations, as sketched below. Such training gives the network stronger generalization, so targets can be recognized accurately under different conditions.
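A simplified sketch of the stitching follows (illustrative only: a fixed 2x2 grid is assumed, and the random split point, flips and rotations of real mosaic augmentation are omitted for clarity):

    import cv2
    import numpy as np

    def mosaic4(images, boxes_per_image, size=640):
        """images: four HxWx3 arrays; boxes_per_image: per image, a list of
        (cls, x1, y1, x2, y2) boxes in pixel coordinates."""
        half = size // 2
        canvas = np.zeros((size, size, 3), dtype=np.uint8)
        merged = []
        offsets = [(0, 0), (half, 0), (0, half), (half, half)]  # 2x2 grid corners
        for img, boxes, (ox, oy) in zip(images, boxes_per_image, offsets):
            h, w = img.shape[:2]
            canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
            sx, sy = half / w, half / h  # scale applied to this picture
            for cls, x1, y1, x2, y2 in boxes:
                merged.append((cls, x1 * sx + ox, y1 * sy + oy,
                               x2 * sx + ox, y2 * sy + oy))
        return canvas, merged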
In one possible implementation, training the Yolov5 algorithm with a third training set further includes:
based on the third training set, training the Yolov5 algorithm with self-adversarial training; the self-adversarial training includes a first stage and a second stage;
The first stage is used for changing the size of the marked lane scene picture;
the second stage is used for training according to the lane scene pictures with the modified sizes.
Self-adversarial training proceeds in two distinct stages. In the first stage, instead of modifying the network weights, the neural network enlarges or reduces the original image, performing an adversarial attack on itself: the original image is altered until the network judges that no detection target is present in it, and the detection target is then determined under these self-adversarial training conditions.
In the next stage, the neural network is trained on the images whose size has been modified, and the trained network performs normal target detection on them. This self-adversarial training increases the robustness of the network, i.e. targets can still be recognized when the picture is blurred, the target is incomplete, and so on.
In this possible implementation, such self-adversarial training increases the robustness of the network and improves the recognition ability of the model; a sketch of the two stages follows.
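One common reading of such self-adversarial training, sketched below under that assumption, perturbs the input image along the loss gradient in the first stage and trains normally on the altered image in the second; model and detection_loss are placeholders for a Yolov5-style network and its loss, and pixel values are assumed scaled to [0, 1]:

    import torch

    def self_adversarial_step(model, detection_loss, images, targets, eps=0.01):
        # Stage 1: alter the image, not the weights, so the network "loses" the target.
        images = images.clone().detach().requires_grad_(True)
        loss = detection_loss(model(images), targets)
        loss.backward()                       # gradient with respect to the pixels
        with torch.no_grad():
            adv_images = (images + eps * images.grad.sign()).clamp(0, 1)
        model.zero_grad()                     # discard the stage-1 weight gradients

        # Stage 2: train normally on the altered images.
        loss = detection_loss(model(adv_images), targets)
        loss.backward()                       # an optimizer step would follow here
        return loss.item()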
In one embodiment, correcting the vehicle posture or performing path planning according to the recognition results includes:
determining the included angle formed between the tire and the vehicle body of the parked vehicle according to the tire recognition result, and triggering a prompt to correct the current vehicle posture when the included angle is judged to be larger than a first threshold; or,
determining the number and positions of road potholes and free parking spaces according to the parking space recognition result and the lane recognition result, and re-planning the vehicle driving path according to the number and positions of the road potholes and free parking spaces.
In this embodiment, the various recognition results together assist in adjusting the driving strategy. Whether the current parking posture of the vehicle is reasonable can be judged from the tire recognition result: when the angle between the tire and the vehicle body is too large, the vehicle posture can be corrected so that the body and all tires stay within the parking space, avoiding scratches and other safety problems. From the usage state of the parking spaces and the flatness of the lane surface, the path to be driven can be determined, including how to bypass potholes and how to reach the most convenient free space. Integrating several recognition results in this way allows the driving strategy to be formulated more accurately than any single result would, improving both the efficiency of road management and road traffic safety.
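For illustration, this decision logic can be sketched as follows (the threshold value, the result fields and the returned messages are assumptions made for the example, not values from the patent):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    ANGLE_THRESHOLD_DEG = 15.0  # assumed value for the "first threshold"

    @dataclass
    class RecognitionResult:
        tire_body_angle_deg: float  # from the tire recognition result
        free_spaces: List[Tuple[float, float]] = field(default_factory=list)
        potholes: List[Tuple[float, float]] = field(default_factory=list)

    def apply_driving_strategy(r: RecognitionResult) -> str:
        if r.tire_body_angle_deg > ANGLE_THRESHOLD_DEG:
            return "prompt: correct the current vehicle posture"
        if r.free_spaces:
            # Hypothetical planner: head for the first free space, noting the
            # potholes the re-planned path must bypass.
            return (f"re-plan path to space {r.free_spaces[0]} "
                    f"avoiding {len(r.potholes)} pothole(s)")
        return "no free space recognized; continue searching"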
Based on the same inventive concept as the above method, in another embodiment of the present invention, a vehicle vision detection device based on deep learning is also disclosed. Referring to fig. 9, a vehicle vision detection device based on deep learning according to an embodiment of the present invention includes:
an image acquisition unit 10 for acquiring a vehicle scene picture to be recognized;
a recognition unit 20 for inputting the vehicle scene picture to be recognized into the preset vehicle vision detection model and generating recognition results; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
and a planning unit 30 for correcting the vehicle posture or planning a path according to the recognition results.
In the apparatus disclosed in this embodiment, the specific implementation of each module corresponds to the description in the method embodiments above and is not repeated here for brevity.
Also disclosed in one embodiment of the invention is a computer-readable storage medium having instructions stored therein that, when executed on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above. The respective constituent modules of the above-described signal processing apparatus may be stored in the computer-readable storage medium if implemented in the form of software functional units and sold or used as independent products.
In the above embodiments, implementation may be wholly or partly in software, hardware, firmware or any combination thereof. When implemented in software, it may take the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in, or transmitted through, a computer-readable storage medium; they may be transmitted from one website, computer, server or data center to another by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVD) or semiconductor media (e.g., solid-state drives, SSD).
In summary, by implementing the embodiments of the present invention, a vehicle scene picture to be identified is acquired and input into a preset vehicle vision detection model to generate recognition results, including a parking space recognition result, a tire recognition result and a lane recognition result; the vehicle posture is then corrected or a path is planned according to the recognition results. The deep-learning models thus allow parking spaces, tires and lane defects to be recognized from a single scene picture, so that driving strategies can be formulated more accurately and both road management efficiency and road traffic safety are improved.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the flows of the method embodiments described above. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of the electronic device, apparatus and the like described above may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the scope of the invention; all equivalent structural changes made using the description and drawings of the present invention, and all direct or indirect applications in other related technical fields, are likewise included within the scope of protection of the invention.

Claims (10)

1. A vehicle vision detection method based on deep learning, the method comprising:
acquiring a vehicle scene picture to be identified;
inputting the vehicle scene picture to be identified into a preset vehicle vision detection model to generate an identification result; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
and correcting the vehicle posture or planning a path according to the identification result.
2. The vehicle vision inspection method based on deep learning according to claim 1, wherein the vehicle vision inspection model includes a parking space recognition model, a tire recognition model, and a lane recognition model; wherein,
the parking space recognition model and the tire recognition model are obtained by training based on a Yolov3 algorithm; and
the lane recognition model is obtained by training based on a Yolov5 algorithm.
3. The vehicle vision inspection method based on deep learning according to claim 2, further comprising, before the inputting the vehicle scene picture to be recognized into a preset vehicle vision inspection model:
training the parking space recognition model, including:
acquiring a first preset number of parking space scene pictures;
marking the idle parking spaces, the occupied parking spaces and the parked vehicles in the parking space scene pictures respectively to generate a first training set; wherein the parked vehicles include at least two parking directions; and
training the Yolov3 algorithm with the first training set based on a Darknet framework to generate the parking space recognition model.
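By way of example only, labels for such a training set could be written in the plain-text "class x_center y_center width height" line format (normalized to the picture size) commonly used with Darknet-style Yolov3 training; the class ids below are illustrative assumptions, not fixed by the claim.

```python
CLASSES = {"free_space": 0, "occupied_space": 1, "parked_vehicle": 2}

def to_yolo_line(cls_name, box, img_w, img_h):
    """box = (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w   # normalized box center x
    yc = (y_min + y_max) / 2 / img_h   # normalized box center y
    w = (x_max - x_min) / img_w        # normalized box width
    h = (y_max - y_min) / img_h        # normalized box height
    return f"{CLASSES[cls_name]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. one occupied space annotated on a 1920x1080 parking scene picture:
print(to_yolo_line("occupied_space", (600, 400, 900, 700), 1920, 1080))
```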
4. The vehicle vision inspection method based on deep learning according to claim 2, further comprising, before the inputting the vehicle scene picture to be recognized into a preset vehicle vision inspection model:
training the tire identification model, comprising:
acquiring a second preset number of parking vehicle pictures;
labeling the tires in the parking vehicle pictures to generate a second training set; wherein the tires form at least two included angles of different sizes with the vehicle body; and
and training a Yolov3 algorithm by using the second training set based on a Darknet frame to generate the tire identification model.
5. The vehicle vision inspection method based on deep learning according to claim 2, further comprising, before the inputting the vehicle scene picture to be recognized into a preset vehicle vision inspection model:
training the lane recognition model, comprising:
obtaining lane scene pictures of a third preset number;
marking the road pits in the lane scene picture to generate a third training set;
training a Yolov5 algorithm by using the third training set to generate the lane recognition model; the Yolov5 algorithm comprises a CSP-Darknet feature network.
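As a hedged illustration of preparing such a training run, a one-class dataset description in the YAML layout commonly used by Yolov5 implementations might be generated as follows; the paths, file name, and single "pothole" class are assumptions, and the exact training command varies by implementation and version.

```python
import pathlib
import textwrap

# Assumed dataset layout; the claim itself only fixes that road pits are
# the labeled class and that a Yolov5 algorithm is trained on them.
yaml_text = textwrap.dedent("""\
    train: datasets/lanes/images/train
    val: datasets/lanes/images/val
    nc: 1                 # a single class: road pit (pothole)
    names: ["pothole"]
""")
pathlib.Path("lane_potholes.yaml").write_text(yaml_text)
# Training would then be launched with the chosen Yolov5 implementation
# pointing at this file; the exact command differs between versions.
```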
6. The vehicle vision inspection method based on deep learning of claim 5, wherein training the Yolov5 algorithm with the third training set comprises:
training the Yolov5 algorithm by adopting mosaic training based on the third training set;
the mosaic training scales and rotates a plurality of randomly acquired marked lane scene pictures and pieces them together into new lane scene pictures with which the Yolov5 algorithm is trained.
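An illustrative implementation of this mosaic step, assuming Pillow is available and omitting the corresponding bounding-box remapping for brevity, might look like the following sketch.

```python
import random
from PIL import Image  # assumes Pillow is installed

def mosaic(paths, out_size=640):
    """Piece 4 randomly chosen pictures (len(paths) >= 4) into one image."""
    tile = out_size // 2
    canvas = Image.new("RGB", (out_size, out_size))
    for i, path in enumerate(random.sample(paths, 4)):
        img = Image.open(path).convert("RGB")
        img = img.rotate(random.uniform(-10.0, 10.0))  # small random rotation
        img = img.resize((tile, tile))                 # rescale to tile size
        canvas.paste(img, ((i % 2) * tile, (i // 2) * tile))
        # A real pipeline would also transform each picture's bounding
        # boxes into the canvas coordinates here.
    return canvas
```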
7. The vehicle vision inspection method based on deep learning of claim 5, wherein training the Yolov5 algorithm with the third training set further comprises:
training the Yolov5 algorithm with self-adversarial training based on the third training set; the self-adversarial training includes a first stage and a second stage;
the first stage is used for changing the size of the marked lane scene picture;
the second stage is used for training according to the lane scene pictures with the modified sizes.
8. The vehicle vision inspection method based on deep learning according to claim 1, wherein correcting the vehicle posture or performing path planning according to the recognition result comprises:
determining the degree of an included angle formed by the tire and the vehicle body of the parked vehicle according to the tire identification result, and triggering a prompt to correct the current vehicle posture when the degree of the included angle is judged to be larger than a first threshold value; or alternatively, the first and second heat exchangers may be,
determining the number and positions of the lane pits and the idle parking spaces according to the parking space recognition result and the lane recognition result, and re-planning the vehicle driving path according to the number and positions of the lane pits and the idle parking spaces.
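As a toy sketch of such re-planning, the recognition results could be rasterized into an occupancy grid where road pits block cells, and a breadth-first search could then return the shortest pit-free path to the nearest idle space; the grid representation is an assumption for illustration, not part of the claim.

```python
from collections import deque

def replan(grid, start, goals):
    """grid: 2D list, 1 = road pit (blocked); start: (r, c); goals: set of (r, c)."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) in goals:
            return path  # shortest pit-free path to the nearest idle space
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # no idle space reachable without crossing a pit
```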
9. A vehicle vision inspection device based on deep learning, the device comprising:
the image acquisition unit is used for acquiring a vehicle scene picture to be identified;
the recognition unit is used for inputting the vehicle scene picture to be recognized into a preset vehicle vision detection model and generating a recognition result; the recognition results comprise a parking space recognition result, a tire recognition result and a lane recognition result;
and the planning unit is used for correcting the vehicle posture or planning the path according to the identification result.
10. A computer storage medium storing at least one program executable by a computer, wherein the at least one program, when executed by the computer, causes the computer to perform the steps of the deep learning-based vehicular visual inspection method according to any one of claims 1 to 8.
CN202311185082.7A 2023-09-13 2023-09-13 Vehicle vision detection method and device based on deep learning and storage medium Pending CN117315604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311185082.7A CN117315604A (en) 2023-09-13 2023-09-13 Vehicle vision detection method and device based on deep learning and storage medium

Publications (1)

Publication Number Publication Date
CN117315604A true CN117315604A (en) 2023-12-29

Family

ID=89236359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311185082.7A Pending CN117315604A (en) 2023-09-13 2023-09-13 Vehicle vision detection method and device based on deep learning and storage medium

Country Status (1)

Country Link
CN (1) CN117315604A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination