CN116229379B - Road attribute identification method and device, electronic equipment and storage medium

Info

Publication number
CN116229379B
CN116229379B (application CN202310514624.4A)
Authority
CN
China
Prior art keywords
road
image
network
sample
road attribute
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202310514624.4A
Other languages
Chinese (zh)
Other versions
CN116229379A (en)
Inventor
段富治
余言勋
郝行猛
黄宇
王亚运
金恒
殷俊
Current Assignee (the listed assignees may be inaccurate)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202310514624.4A
Publication of CN116229379A
Application granted
Publication of CN116229379B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses a road attribute identification method and device, an electronic device, and a storage medium. After a first image to be processed is obtained from a road monitoring video, feature extraction is performed on the first image by a feature extraction sub-network in a road attribute identification model to obtain a road feature map, and by a background feature enhancement sub-network in the model to obtain a background feature map. The road feature map and the background feature map are then fused to obtain a road fusion feature map corresponding to the first image, and finally the road fusion feature map is identified to obtain the road attribute identification result of the first image. Because feature extraction on the first image combines the feature extraction sub-network with the background feature enhancement sub-network, noise is effectively suppressed, the background is enhanced, and the accuracy of road attribute identification is improved.

Description

Road attribute identification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of intelligent traffic technologies, and in particular, to a method and apparatus for identifying a road attribute, an electronic device, and a storage medium.
Background
With the growing scale of road infrastructure networks, the identification and maintenance of road attributes is becoming increasingly important. A road attribute is, for example, a road defect attribute such as a crack or a pothole, or an attribute such as a fixed barrier or a fixed marker in a road. Identifying road attributes entirely by manual means takes too much time and effort, so machine learning methods are now used to assist in automatic road attribute identification.
Existing machine-learning-based road attribute identification methods generally adopt means such as threshold segmentation and morphological operations, and can achieve automatic identification of road attributes. However, foreground objects whose shape resembles a road attribute are easily misidentified as one: for example, when a bamboo pole or other linear object is placed on a vehicle in an image, existing schemes easily identify it as a longitudinal crack, causing misidentification of road attributes and poor identification accuracy.
Disclosure of Invention
The embodiment of the application provides a road attribute identification method, a device, electronic equipment and a storage medium, which are used for solving the problem of poor accuracy of road attribute identification in the prior art.
In a first aspect, the present application provides a road attribute identifying method, the method including:
acquiring a first image to be processed in a road monitoring video, and inputting the first image into a trained road attribute identification model;
extracting features of the first image through a feature extraction sub-network in the road attribute identification model to obtain a road feature map; extracting the characteristics of the first image through a background characteristic enhancement sub-network in the road attribute identification model to obtain a background characteristic map;
carrying out fusion processing on the road feature map and the background feature map to obtain a road fusion feature map;
and identifying the road fusion feature map to obtain a road attribute identification result of the first image.
In a second aspect, the present application provides a road attribute identifying apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first image to be processed in the road monitoring video, and inputting the first image into the trained road attribute identification model;
the first determining module is used for extracting the characteristics of the first image through the characteristic extraction sub-network in the road attribute identification model to obtain a road characteristic diagram; extracting the characteristics of the first image through a background characteristic enhancement sub-network in the road attribute identification model to obtain a background characteristic map;
the second determining module is used for carrying out fusion processing on the road feature map and the background feature map to obtain a road fusion feature map;
and the identification module is used for identifying the road fusion feature map to obtain a road attribute identification result of the first image.
In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the method when executing the program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein, which, when executed by a processor, implements the steps of the above method.
The application provides a road attribute identification method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a first image to be processed in a road monitoring video, and inputting the first image into a trained road attribute identification model; extracting features of the first image through a feature extraction sub-network in the road attribute identification model to obtain a road feature map; extracting the characteristics of the first image through a background characteristic enhancement sub-network in the road attribute identification model to obtain a background characteristic map; carrying out fusion processing on the road feature map and the road background feature map to obtain a road fusion feature map; and identifying the road fusion feature map to obtain a road attribute identification result of the first image.
The technical scheme has the following advantages or beneficial effects:
in the application, the trained road attribute recognition model comprises a feature extraction sub-network and a background feature enhancement sub-network. After a first image to be processed is obtained from the road monitoring video, feature extraction is performed on the first image by the feature extraction sub-network in the road attribute identification model to obtain a road feature map, and by the background feature enhancement sub-network in the model to obtain a background feature map. The road feature map and the background feature map are then fused to obtain a road fusion feature map corresponding to the first image, and finally the road fusion feature map is identified to obtain the road attribute identification result of the first image. Because feature extraction on the first image combines the feature extraction sub-network with the background feature enhancement sub-network, noise is effectively suppressed and the background is enhanced, which improves the accuracy of road attribute identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a road attribute identification process provided in the present application;
Fig. 2 is a schematic diagram of a training process of the road attribute identification model provided in the present application;
Fig. 3 is an overall flow chart of road defect identification provided by the present application;
Fig. 4 is a schematic diagram of a training process of the semantic-segmentation-based road defect recognition model provided by the application;
Fig. 5 is a frame diagram of the road defect recognition model provided in the present application;
Fig. 6 is a schematic structural diagram of the road defect recognition device provided by the present application;
Fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
For purposes of clarity and implementation of the present application, exemplary implementations of the present application are described clearly and completely below with reference to the accompanying drawings, in which those exemplary implementations are illustrated. It is apparent that the described exemplary implementations are only some, not all, of the embodiments of the present application.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first," second, "" third and the like in the description and in the claims and in the above-described figures are used for distinguishing between similar or similar objects or entities and not necessarily for limiting a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
Fig. 1 is a schematic diagram of a road attribute identification process provided in the present application; the process includes the following steps:
s101: and acquiring a first image to be processed in the road monitoring video, and inputting the first image into a trained road attribute identification model.
S102: extracting features of the first image through a feature extraction sub-network in the road attribute identification model to obtain a road feature map; and extracting the characteristics of the first image through a background characteristic enhancement sub-network in the road attribute identification model to obtain a background characteristic map.
S103: and carrying out fusion processing on the road feature map and the road background feature map to obtain a road fusion feature map.
S104: and identifying the road fusion feature map to obtain a road attribute identification result of the first image.
The road attribute identification method is applied to an electronic device, which may be a PC, a tablet computer, or another device, or may be a server.
After acquiring the road monitoring video, the electronic device takes an image to be processed as the first image. The image to be processed may be any frame of the road monitoring video. It should be noted that the electronic device processes the video frames in order, taking each frame in turn as the first image for road attribute recognition.
The electronic device stores the trained road attribute recognition model, which comprises a feature extraction sub-network and a background feature enhancement sub-network. The first image is input into the trained road attribute recognition model, and feature extraction is performed on the first image by the feature extraction sub-network and by the background feature enhancement sub-network, yielding a road feature map and a background feature map respectively. The road feature map and the background feature map are fused to obtain a road fusion feature map, which is then identified to obtain the road attribute identification result of the first image. Because feature extraction on the first image combines the feature extraction sub-network with the background feature enhancement sub-network, noise is effectively suppressed, the background is enhanced, and the accuracy of road attribute identification is improved.
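As a concrete illustration, the inference path just described can be sketched as follows. This is a minimal sketch assuming a PyTorch implementation; the names feature_net, background_net, and head are assumptions for illustration, not names used by the patent.

```python
import torch

def identify_road_attributes(model, first_image):
    """Minimal inference sketch (all module names assumed).

    model.feature_net:     feature extraction sub-network
    model.background_net:  background feature enhancement sub-network
    model.head:            1x1 convolution + per-pixel classification layer
    """
    with torch.no_grad():
        road_feat = model.feature_net(first_image)           # road feature map
        background_feat = model.background_net(first_image)  # background feature map
        # Fusion: weighted average of the two maps (equal weights after training)
        fused = 0.5 * road_feat + 0.5 * background_feat
        logits = model.head(fused)         # per-pixel class scores
        return logits.argmax(dim=1)        # road attribute identification result
```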
In order to further improve accuracy of road attribute identification, in the present application, identifying the road fusion feature map, the obtaining a road attribute identification result of the first image includes:
identifying the road fusion feature map to obtain a first road attribute identification result of the first image;
acquiring a second road attribute identification result of a second image of the previous video frame;
and fusing the first road attribute identification result and the second road attribute identification result to obtain the road attribute identification result of the first image.
For example, suppose there is a vehicle in the first image of the current frame with a bamboo pole or other linear object placed on it; based only on the recognition result of the current frame, the object is easily misrecognized as a crack. This application combines the road defect recognition results of previous video frames: because a bamboo pole or other linear object does not appear at the same position in every previous frame, fusing the recognition result of the current frame with the previous recognition results to jointly determine the current frame's result effectively resolves such misrecognition and improves the accuracy of road defect recognition. Similarly, multi-frame moving-average inference, which combines the recognition results from before the current frame, effectively improves the recognition accuracy for road defects occluded by vehicles.
Fig. 2 is a schematic diagram of a road attribute recognition model training process provided in the present application, where the process includes the following steps:
s201: for each scene in a sample set, selecting a first sample image of the scene to be input into a feature extraction sub-network in a road attribute identification model, and selecting a second sample image of the scene to be input into a background feature enhancement sub-network in the road attribute identification model.
S202: performing feature extraction on the first sample image based on the feature extraction sub-network to obtain a first sample feature map, and performing feature extraction on the second sample image based on the background feature enhancement sub-network to obtain a second sample feature map; performing fusion processing on the first sample feature map and the second sample feature map to obtain a third sample feature map; and determining a sample prediction result based on the third sample feature map, and training a feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model according to the sample prediction result and the label information corresponding to the first sample image.
The electronic device stores a sample set for training the road attribute recognition model; the sample set contains consecutive video frame images of each scene. The road attribute identification model includes a feature extraction sub-network and a background feature enhancement sub-network, so training the model means training these two sub-networks. During training, any two sample images of each scene are first paired as a sample image group, so a large number of sample image groups can be obtained. The two sample images in each group serve as the first sample image and the second sample image, respectively.
For each sample image group, the first sample image is input into the feature extraction sub-network of the road attribute identification model, and the second sample image is input into the background feature enhancement sub-network. Feature extraction is performed on the first sample image by the feature extraction sub-network to obtain a first sample feature map, and on the second sample image by the background feature enhancement sub-network to obtain a second sample feature map. The first and second sample feature maps are then fused to obtain a third sample feature map: fusion weight values are assigned to the first and second sample feature maps respectively, and the weighted average of the two maps gives the third sample feature map. The third sample feature map is passed through a 1×1 convolution layer and a classification layer of the road attribute identification model to obtain a sample prediction result. The sample prediction result is the identified category of each pixel: road attribute pixels, foreground pixels such as cars and pedestrians, and background pixels such as the ground and flower beds. Taking road attribute pixels that are road defect pixels as an example, the road defect pixels are further divided into categories such as transverse cracks, longitudinal cracks, construction joint cracks, mesh cracks, and potholes. The electronic device stores label information for the first sample image, namely the true category of each pixel in the first sample image. A loss value is determined from the sample prediction result and the label information corresponding to the first sample image; the training process is repeated, and training of the road attribute identification model is finished when the loss value meets the requirement, completing the training of the model parameters of the feature extraction sub-network, the background feature enhancement sub-network, and each network structure layer of the road attribute identification model.
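The training step just described can be sketched as follows. This is a minimal sketch assuming PyTorch; the names feature_net, background_net, and head, and the way labels are passed, are assumptions rather than the patent's own code.

```python
import torch.nn.functional as F

def training_step(model, img1, labels1, img2, optimizer, alpha):
    """One iteration on a sample image group (sketch, names assumed).

    img1, labels1: first sample image and its per-pixel label map
    img2:          second sample image from the same scene
    alpha:         dynamic fusion weight of the feature extraction branch
    """
    f1 = model.feature_net(img1)          # first sample feature map
    f2 = model.background_net(img2)       # second sample feature map
    f3 = alpha * f1 + (1 - alpha) * f2    # third (fused) sample feature map
    logits = model.head(f3)               # 1x1 convolution + classification layer
    # The loss is computed against the labels of the first sample image only
    loss = F.cross_entropy(logits, labels1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```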
In the application stage of road attribute identification, after the first image to be identified of the current video frame is acquired, the first image is input into the feature extraction sub-network and the background feature enhancement sub-network of the trained road attribute identification model. The two sub-networks perform feature extraction on the first image to obtain two feature maps, the two feature maps are fused into one fused feature map, and the fused feature map is passed through the 1×1 convolution layer and the classification layer to obtain the road attribute identification result of the first image. The two feature maps have the same scale, and feature map fusion can be a weighted fusion of the feature values at corresponding positions.
Because the road attribute recognition model is trained with two different sample images, the two branches, the feature extraction sub-network and the background feature enhancement sub-network, learn road defect (attribute) features and background features respectively. At application time, the single current image is input into both the feature extraction sub-network and the background feature enhancement sub-network, so the road defect (attribute) features and the background features of the current image can be extracted simultaneously and then fused, making the road attribute recognition result more accurate.
In this application, training of the feature extraction sub-network and the background feature enhancement sub-network is completed based on a first sample image and a second sample image of the same scene. Specifically, feature extraction is performed on the first sample image by the feature extraction sub-network and on the second sample image by the background feature enhancement sub-network; the extraction results are fused to determine the sample prediction result, and the two sub-networks are trained according to the sample prediction result and the label information corresponding to the first sample image. Because two images of the same scene are used, the background feature enhancement branch acts as noise for the feature extraction branch, and training against the sample prediction result and the labels of the first sample image effectively suppresses that noise and enhances the background, giving the feature extraction sub-network and the background feature enhancement sub-network good accuracy and strong robustness. In practical application, the road defects (attributes) in the first image of the current video frame can therefore be identified accurately.
In this method, the final loss value is computed against the label information corresponding to the first sample image, which is processed by the feature extraction sub-network. The output of the road attribute identification model is therefore expected to be led entirely by the feature extraction sub-network at first, with the background feature enhancement sub-network gradually taking on a supplementary, background-enhancing role as training proceeds, until the weight values of the two sub-networks are finally equal; this enhances the background and improves the accuracy of road attribute identification. Based on this consideration, fusing the first sample feature map and the second sample feature map to obtain a third sample feature map includes:
acquiring the current training iteration, and determining a first weight value of the feature extraction sub-network and a second weight value of the background feature enhancement sub-network according to the current training iteration and a preset total number of training iterations, wherein the smaller the current training iteration, the larger the first weight value;
and carrying out fusion processing on the first sample feature map and the second sample feature map according to the first weight value and the second weight value to obtain a third sample feature map.
The electronic device stores a preset total number of training iterations for the road attribute recognition model, for example 1000 or 2000. In the iterative training process of the road attribute identification model, the first weight value of the feature extraction sub-network and the second weight value of the background feature enhancement sub-network determined in each iteration differ. Specifically, the current training iteration is obtained, for example the 20th iteration, and the first and second weight values are determined from the current iteration and the preset total number of iterations; the smaller the current iteration, the larger the first weight value. To prevent the second weight value of the background feature enhancement sub-network from becoming too large, the first weight value is never smaller than 0.5 in this application.
In this application, determining the first weight value of the feature extraction sub-network and the second weight value of the background feature enhancement sub-network according to the current training iteration and the preset total number of training iterations includes:
substituting the current training iteration t and the preset total number of training iterations T into the formula α = 1 - t/(2T) to obtain the first weight value α of the feature extraction sub-network and the second weight value 1 - α of the background feature enhancement sub-network; wherein t is the current training iteration and T is the preset total number of training iterations.
For example, if the preset total number of training iterations is 1000 and the current iteration is the 100th, the first weight value is α = 1 - 100/(2×1000) = 0.95 and the second weight value is 1 - α = 0.05.
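A minimal sketch of this weight schedule; the function name and usage are illustrative:

```python
def fusion_weight(t: int, total: int) -> float:
    """Dynamic weight alpha = 1 - t/(2T): decays from 1.0 toward 0.5."""
    return 1.0 - t / (2.0 * total)

# With total = 1000 training iterations:
# fusion_weight(0, 1000)    -> 1.00  (output fully led by the feature extraction branch)
# fusion_weight(100, 1000)  -> 0.95
# fusion_weight(1000, 1000) -> 0.50  (both branches weighted equally)
```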
This application expects the output of the initial model to be dominated entirely by the feature extraction sub-network, with the background feature enhancement sub-network gradually playing a supplementary role as training proceeds, until the two sub-networks finally carry equal weight. Model learning proceeds from easy to hard: the feature extraction sub-network (N1) learns from its input image (Image1) and label (Label1), while the background feature enhancement sub-network (N2) learns from its image (Image2). Initially, back-propagating through Label1 and Image1 is simple, so good network parameters can be learned. The subsequent learning is harder: through Image2, the model must learn to suppress irrelevant information and enhance relevant information. Following this easy-first idea, N1 takes the lead.
In this application, the purpose of road defect identification is to identify road defects between the two outermost lane lines; outside these lane lines there is more interference, which brings a large amount of noise into road defect identification. Based on this consideration, in the present application, training the feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model according to the sample prediction result and the label information corresponding to the first sample image includes:
Determining semantic segmentation loss according to sample category information of each pixel point in the sample prediction result and real category information of each pixel point in label information corresponding to the first sample image;
determining the lane line distance loss according to the road attribute pixel points in the sample prediction result and the lane line information in the label information corresponding to the first sample image;
determining a total loss according to the semantic segmentation loss and the lane line distance loss; and when the total loss reaches a preset loss threshold value, determining that the feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model are trained.
In the application, firstly, determining semantic segmentation loss according to sample category information of each pixel point in a sample prediction result and real category information of each pixel point in label information corresponding to a first sample image; then, determining the lane line distance loss according to the lane line information in the label information corresponding to the pixel points of each road attribute in the sample prediction result and the first sample image; and finally, determining total loss according to semantic segmentation loss and lane line distance loss. Wherein, the sum of the semantic segmentation loss and the lane line distance loss can be taken as the total loss. Or respectively distributing the weight values corresponding to the semantic segmentation loss and the lane line distance loss, and then carrying out weighted summation on the semantic segmentation loss and the lane line distance loss and the weight values corresponding to the semantic segmentation loss and the lane line distance loss to obtain the total loss.
The lane line distance loss is calculated as follows:

Loss1 = Σ_p [ max(x_min - x_p, 0) + max(x_p - x_max, 0) ]

where x_min is the minimum lane line coordinate, x_max is the maximum lane line coordinate, and x_p is the coordinate of a pixel p predicted as a road defect. The loss measures the distance between misrecognized road defects on the two sides of the outermost lane lines and the lane lines; for road defects between the lane lines, the distance is 0. When road damage is predicted, the positional relation between the predicted point and the lane lines in the label is determined, and the lane line loss is then calculated.
The total loss is calculated as follows:

Loss = α·Loss1 + β·Loss2

where Loss1 is the lane line distance loss, Loss2 is the semantic segmentation loss, α is the weight of the lane line distance loss, β is the weight of the semantic segmentation loss with α + β = 1, and Loss is the total loss (here α and β denote the loss weights, not the feature fusion weight).
The electronic device stores a preset loss threshold; when the total loss reaches the preset loss threshold, training of the road defect extraction sub-network and the background feature enhancement sub-network is determined to be complete.
Optionally, the loss function Loss is used for back propagation; after each training period, the accuracy on the verification set is evaluated, and the model with the highest verification-set accuracy is saved as the optimal model for subsequent model inference.
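A sketch of the joint loss, following the reconstruction above. Lane line geometry is reduced to scalar bounds x_min and x_max, and all function and parameter names are assumptions:

```python
import torch
import torch.nn.functional as F

def lane_line_distance_loss(defect_xs, x_min, x_max):
    """Distance from predicted road defect pixels to the outermost lane lines.

    defect_xs:     x-coordinates of pixels predicted as road defects
    x_min, x_max:  minimum/maximum lane line coordinates from the labels
    Pixels between the lane lines contribute 0.
    """
    left = torch.clamp(x_min - defect_xs, min=0.0)
    right = torch.clamp(defect_xs - x_max, min=0.0)
    return (left + right).sum()

def total_loss(logits, labels, defect_xs, x_min, x_max, a=0.5, b=0.5):
    """Loss = a * Loss1 + b * Loss2 with a + b = 1 (weights illustrative)."""
    loss1 = lane_line_distance_loss(defect_xs, x_min, x_max)
    loss2 = F.cross_entropy(logits, labels)  # semantic segmentation (cross entropy)
    return a * loss1 + b * loss2
```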
In order to improve the recognition accuracy for road defects occluded by vehicles, in the present application, inputting the first image into the feature extraction sub-network and the background feature enhancement sub-network of the trained road attribute recognition model to obtain the road attribute identification result of the first image includes:
Respectively inputting the first image into a feature extraction sub-network and a background feature enhancement sub-network in a trained road attribute recognition model to obtain a first road attribute recognition result of the first image;
acquiring a second road attribute identification result of a second image of the previous video frame;
and carrying out fusion processing according to the first road attribute identification result and the second road attribute identification result to obtain a road attribute identification result of the first image.
In the application, a first image of a current video frame is respectively input into a feature extraction sub-network and a background feature enhancement sub-network in a trained road attribute recognition model, and a road attribute recognition result of the first image obtained based on the road attribute recognition model is used as a first road attribute recognition result. And acquiring a second road attribute identification result of a second image of the previous video frame of the first image. And then, carrying out fusion processing on the first road attribute identification result and the second road attribute identification result to obtain a final road attribute identification result of the first image.
For the first frame image, the first frame image is input into a feature extraction sub-network and a background feature enhancement sub-network in the trained road attribute recognition model, and a road attribute recognition result obtained based on the road attribute recognition model is used as a road attribute recognition result of the first frame image. And inputting the second frame image into a feature extraction sub-network and a background feature enhancement sub-network in the trained road attribute identification model, taking the road attribute identification result obtained based on the road attribute identification model as a first road identification result of the second frame image, taking the road attribute identification result of the first frame image as a second road attribute identification result at the moment, and then carrying out fusion processing on the first road attribute identification result and the second road attribute identification result to obtain the road attribute identification result of the second frame image. And by analogy, when determining the road attribute recognition result of the first image which is currently processed, taking the road attribute recognition result of the first image which is obtained based on the road attribute recognition model as a first road attribute recognition result, and carrying out fusion processing on the first road attribute recognition result and a second road attribute recognition result of the previous video frame to obtain the road attribute recognition result of the first image.
Specifically, the first road attribute identification result R1 and the second road attribute identification result R2 are substituted into the formula R = β·R1 + (1 - β)·R2 to obtain the road attribute identification result R of the first image, where β is the weight value corresponding to the first road attribute identification result and 1 - β is the weight value corresponding to the second road attribute identification result. β is a small value such as 0.05, 0.1, or 0.15.
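This running fusion can be sketched as an exponential moving average over per-pixel score maps; variable names are illustrative:

```python
def fuse_with_history(current_scores, history_scores, beta=0.1):
    """R = beta * R1 + (1 - beta) * R2 (sketch).

    current_scores:  model output for the current frame (first result, R1)
    history_scores:  fused result carried over from the previous frame (R2)
    beta:            small weight for the current frame, e.g. 0.05 to 0.15
    """
    if history_scores is None:  # first frame: no history yet
        return current_scores
    return beta * current_scores + (1 - beta) * history_scores
```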
For example, suppose there is a vehicle in the first image of the current frame with a bamboo pole or other linear object placed on it; based only on the recognition result of the current frame, the object is easily misrecognized as a crack. This application combines the road attribute identification results of previous video frames: because a bamboo pole or other linear object does not appear at the same position in every previous frame, jointly determining the current frame's result from the identification results of the current and previous frames effectively resolves such misidentification and improves the accuracy of road attribute identification. Similarly, multi-frame moving-average inference, which combines the identification results from before the current frame, effectively improves the recognition accuracy for road attributes occluded by vehicles.
In this application, the method further includes:
obtaining road attribute pixel points of each category in the road attribute identification result of the first image;
and fitting the road attribute pixel points of each category aiming at the road attribute of each category, and determining the size information and the shape information of the road attribute of the category.
In this application, each road attribute pixel in the first image, and the road attribute category of each such pixel, can be determined from the road attribute identification result of the first image. For each category of road attribute, fitting is performed on the pixels of that category to determine the size information and shape information of that category of road attribute. The fitting process includes, but is not limited to, closing operations and linear fitting. For example, for road defect pixels of the transverse crack category, fitting is performed on those pixels to determine the size information and shape information of the transverse crack.
By fitting the road defect pixels, the true size information and shape information of the road defect are determined; this information accurately reflects the degree of road damage.
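A post-processing sketch assuming OpenCV. The split between closing for potholes and line fitting for cracks follows the description above; kernel size and function names are illustrative:

```python
import cv2
import numpy as np

def measure_defect(mask, kind):
    """Fit a per-category binary mask (uint8, 0/255) into size and shape info."""
    if kind == "pothole":
        # Closing merges discrete points into a solid region, then measure area
        kernel = np.ones((5, 5), np.uint8)
        closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return {"area": sum(cv2.contourArea(c) for c in contours)}
    # Cracks: fit a line through the defect pixels and report its extent
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(np.float32)
    vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
    proj = (pts - [x0, y0]) @ np.array([vx, vy])
    return {"length": float(proj.max() - proj.min()),
            "direction": (float(vx), float(vy))}
```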
In this application, the method further includes:
acquiring a road attribute identification result of a third image positioned before a first image to be processed in the road monitoring video;
and determining the change rate of the road attribute identification result of the third image and the road attribute identification result of the first image, judging whether the change rate is smaller than a set change rate threshold, if not, carrying out road attribute identification on the next video frame image of the first image, and if so, stopping carrying out road attribute identification on the next video frame image.
In the present application, the electronic device acquires a third image located before the first image; the third image may be any image before the first image. The road attribute identification result of the third image is acquired, and the change rate between the road attribute identification result of the third image and that of the first image is determined. In determining the change rate, for example, the ratio of the number of pixels whose category differs between the two results to the total number of pixels in the first image is taken as the change rate. The electronic device stores a set change rate threshold, for example 1% or 2%. If the change rate between the two results is smaller than the set threshold, the road attributes are considered stable, with no large change occurring; road attribute identification of the video frames after the first image can then be stopped, saving computation and reducing power consumption. If the change rate is not smaller than the set threshold, the road attributes are considered to be changing, and road attribute identification continues with the next video frame after the current one.
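A sketch of this stopping criterion over per-pixel label maps; the comparison rule and names are assumptions consistent with the description:

```python
import numpy as np

def should_stop(prev_labels, curr_labels, threshold=0.01):
    """Stop when the fraction of pixels whose category changed is below threshold."""
    changed = np.count_nonzero(prev_labels != curr_labels)
    change_rate = changed / curr_labels.size
    return change_rate < threshold  # True: the scene is stable, stop identifying
```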
In the following, road attribute identification is explained taking road defect identification as the example.
The application provides a road defect identification method based on semantic segmentation. The method is divided into a training phase and an application phase.
For training, consecutive n-frame (n > 1) images of multiple scenes are needed. Two frames are selected at random and sent to the feature extraction sub-network and the background feature enhancement sub-network of the road defect recognition model respectively; the two sub-networks output prediction results F1 (feature extraction sub-network) and F2 (background feature enhancement sub-network). By dynamically adjusting the weight of F2, the model gradually learns the background information in the images, making road defect recognition more accurate. Meanwhile, the relation between road defects and lane lines is used as prior knowledge in the design of the lane line distance loss: when predicted road defects fall on the two sides of the outermost lane lines, the loss of those predicted points is increased, reducing the interference of objects outside the lane lines on road defect identification.
When in use, the current video frame I_t is sent to the feature extraction sub-network and the background feature enhancement sub-network respectively to obtain feature maps F1 and F2, which are directly added or weighted-averaged to obtain the prediction result P_t of the current video frame. The prediction result P_t of the current frame is combined with the fusion result S_(t-1) of the previous frame by a moving average to obtain the fusion result S_t of the current frame, and multi-frame moving-average inference proceeds frame by frame. If the identification result changes little over a number of frames of the current scene, the algorithm terminates within a certain time. Finally, post-processing such as fitting is performed, and the size information of the road defects, including length, shape, and area, is output.
Fig. 3 is the overall flow chart of road defect identification provided by the application, comprising road defect data set production, training of the semantic-segmentation-based road defect recognition model, and road defect inference and post-fitting based on the model.
The road defect data set is produced as follows:
The road defect data set comprises two parts: image data and annotation data.
The image data requires consecutive n-frame (n > 1) images of multiple scenes; among the collected images, there is no requirement on whether the road defects are partially occluded.
The annotation data marks road defects such as transverse cracks, longitudinal cracks, construction joint cracks, mesh cracks, and potholes, and also marks the lane lines.
Finally, the training set and the verification set are divided in a reasonable proportion; data of the same scene must belong to the same set.
The training process of the semantic-segmentation-based road defect recognition model is as follows:
Fig. 4 is a schematic diagram of the training process of the semantic-segmentation-based road defect recognition model provided by the application, comprising training data input, forward propagation of the semantic segmentation model, back propagation of the joint loss function, verification-set accuracy evaluation, and model saving.
In a specific implementation, two images of one scene are randomly selected as the input of the road defect identification model. In forward propagation, the two images are sent to the feature extraction sub-network and the background feature enhancement sub-network respectively to obtain feature maps F1 and F2, and the two feature maps are then fused by weighting. In back propagation, the lane line distance loss is added alongside the semantic segmentation loss, and parameters are updated with the two loss functions jointly. After each iteration period, the model is saved according to the accuracy result on the verification set.
Fig. 5 is a frame diagram of the road defect recognition model provided in the present application. First, two images of one scene are randomly selected and sent to the feature extraction sub-network and the background feature enhancement sub-network respectively, obtaining two feature maps F1 and F2. Both sub-networks contain a feature extraction network, including but not limited to models such as FCN, UNet, and DeepLab-v3, and a normalization module, including but not limited to a softmax function or a learned attention module. The ratio of F1 to F2 is then balanced by a dynamic weight, and the final output feature map can be expressed as F3 = α·F1 + (1 - α)·F2. Because the final loss is determined by the label information corresponding to the video frame input to the feature extraction sub-network, the output of the initial model is expected to be led entirely by the feature extraction sub-network, with the background feature enhancement sub-network gradually playing a supplementary role as training proceeds, until the two sub-networks finally carry equal weight. The weight α designed in this application therefore decays from 1 to 0.5 as the training period increases. The specific formula is as follows:

α = 1 - t/(2T)

where t is the current training iteration and T is the preset total number of training iterations.
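A skeleton of this two-branch architecture; the module layout below is an illustrative reading of Fig. 5 assuming PyTorch, not the patent's own code, and the backbones are passed in as any FCN/UNet/DeepLab-style networks:

```python
import torch.nn as nn

class RoadDefectModel(nn.Module):
    """Two-branch segmentation model with dynamic weighted fusion (sketch)."""

    def __init__(self, backbone_a: nn.Module, backbone_b: nn.Module,
                 channels: int, num_classes: int):
        super().__init__()
        self.feature_net = backbone_a      # e.g. an FCN / UNet / DeepLab-v3 trunk
        self.background_net = backbone_b   # background feature enhancement branch
        self.norm = nn.Softmax(dim=1)      # normalization module
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)  # 1x1 conv

    def forward(self, x1, x2, alpha: float):
        f1 = self.norm(self.feature_net(x1))     # feature map F1
        f2 = self.norm(self.background_net(x2))  # feature map F2
        f3 = alpha * f1 + (1.0 - alpha) * f2     # F3 = a*F1 + (1-a)*F2
        return self.head(f3)
```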
It should be noted that, following the easy-first idea, the feature extraction sub-network and the background feature enhancement sub-network may share weights early in training, so that the background feature enhancement sub-network initially behaves like the feature extraction sub-network, while later in training the two sub-networks no longer share weights, so as to learn their differing features. Alternatively, most of the network layers may share weights and only a small portion not share weights, so that a small portion of the layers learns the differing features.
After the above steps, a semantic representation of the road defects is obtained, and the loss of the output is then calculated with a joint loss function. The loss function includes two parts: the lane line distance loss Loss1 and the semantic segmentation (cross entropy) loss Loss2. The lane line distance loss is calculated as follows:

Loss1 = Σ_p [ max(x_min - x_p, 0) + max(x_p - x_max, 0) ]

where x_min is the minimum lane line coordinate, x_max is the maximum lane line coordinate, and x_p is the coordinate of a pixel p predicted as a road defect; the loss measures the distance between misrecognized road defects on the two sides of the outermost lane lines and the lane lines, and the distance between the lane lines is 0.

The total loss is recorded as Loss; Loss1, Loss2, and Loss are related by:

Loss = α·Loss1 + β·Loss2, with α + β = 1

where α and β here denote the loss weights, not the feature fusion weight. When Loss reaches the preset target value, training is complete.
The loss function is used for back propagation; after each training period is completed, the accuracy on the verification set is evaluated, and the model with the highest verification-set accuracy is saved as the optimal model for subsequent model inference.
The semantic-segmentation-based road defect model inference process is as follows:
In the actual application process, the current video frame I_t is sent to the feature extraction sub-network and the background feature enhancement sub-network respectively to obtain feature maps F1 and F2, which are directly added and averaged to obtain the prediction result P_t of the current video frame. The prediction result of the current frame is then combined with the fusion result S_(t-1) of the previous frame by a moving average to obtain the fusion result S_t of the current frame. The moving average is as follows:

S_t = β·P_t + (1 - β)·S_(t-1)

Multi-frame moving-average inference then proceeds frame by frame, effectively improving the recognition accuracy of road defects occluded by vehicles. Meanwhile, during operation, the fusion results of the previous and current frames are saved and the change rate between the two is compared; if, after a number of frames of moving-average inference in the current scene, the identification result changes little, the algorithm terminates within a certain time. When the change rate is smaller than the set threshold, the road defect recognition model is stopped, which saves computation.
The post-processing process is as follows:
The semantic segmentation result may contain discrete points and similar artifacts. Post-processing such as linear fitting of road defects such as cracks allows their length information to be output; likewise, post-processing such as the closing operation on road defects such as potholes allows their area information to be output, which is convenient for quantifying the degree of road damage.
Fig. 6 is a schematic structural diagram of the road defect recognition device provided in the present application, including:
the obtaining module 61 is configured to obtain a first image to be processed in the road monitoring video, and input the first image into a trained road attribute recognition model;
a first determining module 62, configured to perform feature extraction on the first image through a feature extraction sub-network in the road attribute identification model, so as to obtain a road feature map; extracting the characteristics of the first image through a background characteristic enhancement sub-network in the road attribute identification model to obtain a background characteristic map;
a second determining module 63, configured to perform fusion processing on the road feature map and the background feature map, so as to obtain a road fusion feature map;
and the recognition module 64 is configured to recognize the road fusion feature map, and obtain a road attribute recognition result of the first image.
The identifying module 64 is specifically configured to identify the road fusion feature map to obtain a first road attribute identification result of the first image; acquire a second road attribute identification result of a second image of the previous video frame; and fuse the first road attribute identification result and the second road attribute identification result to obtain the road attribute identification result of the first image.
The apparatus further comprises:
the training module 65 is configured to select, for each scene in the sample set, a first sample image of the scene to be input to a feature extraction sub-network in the road attribute recognition model, and a second sample image of the scene to be input to a background feature enhancement sub-network in the road attribute recognition model; performing feature extraction on the first sample image based on the feature extraction sub-network to obtain a first sample feature map, and performing feature extraction on the second sample image based on the background feature enhancement sub-network to obtain a second sample feature map; performing fusion processing on the first sample feature map and the second sample feature map to obtain a third sample feature map; and determining a sample prediction result based on the third sample feature map, and training a feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model according to the sample prediction result and the label information corresponding to the first sample image.
The training module 65 is specifically configured to acquire the current training times and determine a first weight value for the feature extraction sub-network and a second weight value for the background feature enhancement sub-network according to the current training times and a preset total training times, where the smaller the current training times, the larger the first weight value; and to perform fusion processing on the first sample feature map and the second sample feature map according to the first weight value and the second weight value to obtain a third sample feature map.
The training module 65 is specifically configured to substitute the current training times t and the preset total training times T into the formula α = 1 - t/(2T), so as to obtain a first weight value α for the feature extraction sub-network and a second weight value 1-α for the background feature enhancement sub-network; wherein t is the current training times and T is the preset total training times.
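In code, the weight schedule and the fusion it drives could be sketched as follows; the linear-combination form of the fusion is an assumption, since the patent states only that the two feature maps are fused according to the two weight values:

def fusion_weights(t, T):
    """alpha = 1 - t / (2 * T): at t = 0 the feature extraction branch
    has weight 1.0; by t = T both branches share weight 0.5."""
    alpha = 1.0 - t / (2.0 * T)
    return alpha, 1.0 - alpha

# Assumed linear fusion of the two sample feature maps:
# third_map = alpha * first_sample_map + (1 - alpha) * second_sample_map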
The training module 65 is specifically configured to determine a semantic segmentation loss according to the sample category information of each pixel point in the sample prediction result and the real category information of each pixel point in the label information corresponding to the first sample image; determine a lane line distance loss according to the pixel points of each road attribute in the sample prediction result and the lane line information in the label information corresponding to the first sample image; determine a total loss according to the semantic segmentation loss and the lane line distance loss; and, when the total loss reaches a preset loss threshold, determine that the feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model have been trained.
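A sketch of the combined objective follows; the patent does not specify the exact form of the lane line distance loss, so the nearest-lane-point distance and the weighting factor lam used here are assumptions:

import torch
import torch.nn.functional as F

def total_loss(logits, target, pred_attr_coords, lane_coords, lam=0.1):
    """Semantic segmentation loss (per-pixel cross-entropy) plus a lane
    line distance term: the mean distance from each predicted road
    attribute pixel to its nearest annotated lane line point.
    logits: (N, C, H, W); target: (N, H, W) class indices;
    pred_attr_coords: (P, 2) and lane_coords: (Q, 2) pixel coordinates."""
    seg_loss = F.cross_entropy(logits, target)
    dists = torch.cdist(pred_attr_coords, lane_coords)  # (P, Q) distances
    lane_loss = dists.min(dim=1).values.mean()
    return seg_loss + lam * lane_loss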
The identifying module 64 is further configured to acquire the road attribute pixel points of each category in the road attribute identification result of the first image; and, for each category of road attribute, fit that category's road attribute pixel points to determine the size information and shape information of the road attribute of that category.
The identifying module 64 is further configured to acquire the road attribute identification result of a third image located before the first image to be processed in the road monitoring video; determine the rate of change between the road attribute identification result of the third image and that of the first image; and judge whether the change rate is smaller than a set change rate threshold: if not, perform road attribute identification on the next video frame after the first image; if so, stop performing road attribute identification on subsequent video frames.
The present application also provides an electronic device, as shown in Fig. 7, including: a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704;
the memory 703 has stored therein a computer program which, when executed by the processor 701, causes the processor 701 to perform any of the above method steps.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 702 is used for communication between the electronic device and other devices described above.
The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The present application also provides a computer-readable storage medium having stored thereon a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform any of the above method steps.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (9)

1. A method of identifying a road attribute, the method comprising:
acquiring a first image to be processed in a road monitoring video, and inputting the first image into a trained road attribute identification model;
extracting features of the first image through a feature extraction sub-network in the road attribute identification model to obtain a road feature map; extracting features of the first image through a background feature enhancement sub-network in the road attribute identification model to obtain a background feature map;
performing fusion processing on the road feature map and the background feature map to obtain a road fusion feature map;
identifying the road fusion feature map to obtain a road attribute identification result of the first image;
the road attribute identification model is obtained by training in the following way:
selecting, for each scene in a sample set, a first sample image of the scene to be input into a feature extraction sub-network in the road attribute identification model, and a second sample image of the scene to be input into a background feature enhancement sub-network in the road attribute identification model;
performing feature extraction on the first sample image based on the feature extraction sub-network to obtain a first sample feature map, and performing feature extraction on the second sample image based on the background feature enhancement sub-network to obtain a second sample feature map; performing fusion processing on the first sample feature map and the second sample feature map to obtain a third sample feature map; determining a sample prediction result based on the third sample feature map, and training a feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model according to the sample prediction result and label information corresponding to the first sample image;
wherein the performing fusion processing on the first sample feature map and the second sample feature map to obtain a third sample feature map comprises:
acquiring current training times, and determining a first weight value of the feature extraction sub-network and a second weight value of the background feature enhancement sub-network according to the current training times and a preset total training times; the smaller the current training times are, the larger the first weight value is;
and carrying out fusion processing on the first sample feature map and the second sample feature map according to the first weight value and the second weight value to obtain a third sample feature map.
2. The method of claim 1, wherein identifying the road fusion feature map to obtain a road attribute identification result of the first image comprises:
identifying the road fusion feature map to obtain a first road attribute identification result of the first image;
acquiring a second road attribute identification result of a second image of the previous video frame;
and performing fusion processing on the first road attribute identification result and the second road attribute identification result to obtain the road attribute identification result of the first image.
3. The method of claim 1, wherein determining the first weight value of the feature extraction sub-network and the second weight value of the background feature enhancement sub-network according to the current training times and the preset total training times comprises:
substituting the current training times t and the preset total training times T into the formula α = 1 - t/(2T) to obtain a first weight value α of the feature extraction sub-network and a second weight value 1-α of the background feature enhancement sub-network; wherein t is the current training times, and T is the preset total training times.
4. The method of claim 1, wherein training the feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model based on the sample prediction result and the label information corresponding to the first sample image comprises:
determining semantic segmentation loss according to sample category information of each pixel point in the sample prediction result and real category information of each pixel point in label information corresponding to the first sample image;
determining a lane line distance loss according to the pixel points of each road attribute in the sample prediction result and the lane line information in the label information corresponding to the first sample image;
determining a total loss according to the semantic segmentation loss and the lane line distance loss; and, when the total loss reaches a preset loss threshold, determining that the feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model have been trained.
5. The method of claim 1, wherein the method further comprises:
obtaining road attribute pixel points of each category in the road attribute identification result of the first image;
and, for each category of road attribute, fitting that category's road attribute pixel points to determine the size information and shape information of the road attribute of that category.
6. The method of claim 1, wherein the method further comprises:
acquiring a road attribute identification result of a third image positioned before a first image to be processed in the road monitoring video;
and determining the rate of change between the road attribute identification result of the third image and the road attribute identification result of the first image, and judging whether the change rate is smaller than a set change rate threshold: if not, performing road attribute identification on the next video frame after the first image; if so, stopping performing road attribute identification on subsequent video frames.
7. A road attribute identification device, wherein the device comprises:
the acquisition module is used for acquiring a first image to be processed in the road monitoring video, and inputting the first image into the trained road attribute identification model;
the first determining module, used for performing feature extraction on the first image through the feature extraction sub-network in the road attribute identification model to obtain a road feature map, and performing feature extraction on the first image through the background feature enhancement sub-network in the road attribute identification model to obtain a background feature map;
the second determining module, used for performing fusion processing on the road feature map and the background feature map to obtain a road fusion feature map;
the identification module is used for identifying the road fusion feature map to obtain a road attribute identification result of the first image;
the apparatus further comprises:
the training module is used for selecting a first sample image of each scene in the sample set to be input into a feature extraction sub-network in the road attribute identification model, and selecting a second sample image of each scene to be input into a background feature enhancement sub-network in the road attribute identification model; performing feature extraction on the first sample image based on the feature extraction sub-network to obtain a first sample feature map, and performing feature extraction on the second sample image based on the background feature enhancement sub-network to obtain a second sample feature map; performing fusion processing on the first sample feature map and the second sample feature map to obtain a third sample feature map; determining a sample prediction result based on the third sample feature map, and training a feature extraction sub-network and the background feature enhancement sub-network in the road attribute identification model according to the sample prediction result and label information corresponding to the first sample image;
The training module is specifically used for acquiring the current training times, and determining a first weight value of the feature extraction sub-network and a second weight value of the background feature enhancement sub-network according to the current training times and the preset total training times; the smaller the current training times are, the larger the first weight value is; and carrying out fusion processing on the first sample feature map and the second sample feature map according to the first weight value and the second weight value to obtain a third sample feature map.
8. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing a program stored on a memory.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
CN202310514624.4A 2023-05-06 2023-05-06 Road attribute identification method and device, electronic equipment and storage medium Active CN116229379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310514624.4A CN116229379B (en) 2023-05-06 2023-05-06 Road attribute identification method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116229379A (en) 2023-06-06
CN116229379B (en) 2024-02-02


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163077A (en) * 2019-03-11 2019-08-23 重庆邮电大学 A kind of lane recognition method based on full convolutional neural networks
CN112241646A (en) * 2019-07-16 2021-01-19 长沙智能驾驶研究院有限公司 Lane line recognition method and device, computer equipment and storage medium
CN112330664A (en) * 2020-11-25 2021-02-05 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN112699855A (en) * 2021-03-23 2021-04-23 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment
CN112785610A (en) * 2021-01-14 2021-05-11 华南理工大学 Lane line semantic segmentation method fusing low-level features
CN112861619A (en) * 2020-12-31 2021-05-28 浙江大华技术股份有限公司 Model training method, lane line detection method, equipment and device
CN113706550A (en) * 2021-03-09 2021-11-26 腾讯科技(深圳)有限公司 Image scene recognition and model training method and device and computer equipment
WO2022126377A1 (en) * 2020-12-15 2022-06-23 中国科学院深圳先进技术研究院 Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN115294420A (en) * 2022-07-08 2022-11-04 泰康保险集团股份有限公司 Training method, re-recognition method and device for feature extraction model
CN115482425A (en) * 2021-06-15 2022-12-16 中移(成都)信息通信科技有限公司 Key point identification method, model training method, device and storage medium
CN116012815A (en) * 2023-01-17 2023-04-25 沈阳美行科技股份有限公司 Traffic element identification method, multi-task network model, training method and training device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140320B (en) * 2021-12-09 2023-09-01 北京百度网讯科技有限公司 Image migration method and training method and device of image migration model



Similar Documents

Publication Publication Date Title
CN108492319B (en) Moving target detection method based on deep full convolution neural network
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN109086785B (en) Training method and device for image calibration model
CN113205176B (en) Method, device and equipment for training defect classification detection model and storage medium
CN110363201B (en) Weak supervision semantic segmentation method and system based on collaborative learning
CN107016413B (en) A kind of online stage division of tobacco leaf based on deep learning algorithm
CN110533950A (en) Detection method, device, electronic equipment and the storage medium of parking stall behaviour in service
CN110516514B (en) Modeling method and device of target detection model
CN109543691A (en) Ponding recognition methods, device and storage medium
CN112651404A (en) Green fruit efficient segmentation method and system based on anchor-frame-free detector
CN111161265A (en) Animal counting and image processing method and device
CN112529931B (en) Method and system for foreground segmentation
CN113688947A (en) Infrared image fault identification method and system for power distribution equipment
CN111652231B (en) Casting defect semantic segmentation method based on feature self-adaptive selection
CN112101114A (en) Video target detection method, device, equipment and storage medium
CN110852318A (en) Drainage pipeline defect accurate positioning method and system
CN116229379B (en) Road attribute identification method and device, electronic equipment and storage medium
CN110969173B (en) Target classification method and device
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN113870143A (en) Distribution line inspection image enhancement method and system
CN114511788A (en) Slope crack identification method, system, equipment and storage medium
CN114220087A (en) License plate detection method, license plate detector and related equipment
CN111178181B (en) Traffic scene segmentation method and related device
CN111339834B (en) Method for identifying vehicle driving direction, computer device and storage medium
CN116993987A (en) Image semantic segmentation method and system based on lightweight neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant