CN111178181A - Traffic scene segmentation method and related device - Google Patents


Info

Publication number: CN111178181A (application CN201911295968.0A; granted as CN111178181B)
Authority: CN (China)
Prior art keywords: feature map, image, shape, map, network
Legal status: Granted
Other languages: Chinese (zh)
Inventors: 施欣欣, 禹世杰, 范艳
Current Assignee: SHENZHEN HARZONE TECHNOLOGY CO LTD
Original Assignee: SHENZHEN HARZONE TECHNOLOGY CO LTD
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD; priority to CN201911295968.0A
Publication of CN111178181A; application granted; publication of CN111178181B
Legal status: Active

Classifications

    • G06V20/54 Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06F18/253 Fusion techniques of extracted features
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V2201/08 Detecting or categorising vehicles
    • Y02T10/40 Engine management systems (Y02T: climate change mitigation technologies related to transportation)

Abstract

The embodiment of the application discloses a traffic scene segmentation method and a related device. The method comprises the following steps: inputting an original image into a backbone network to obtain a fused feature map; inputting the fused feature map into a segmentation network to obtain a segmentation feature map; inputting the fused feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map to obtain a second shape feature map; merging the segmentation feature map and the second shape feature map to obtain a merged feature network; performing a first convolution operation on the merged feature network to obtain an intermediate operation result; decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map to obtain a second saliency feature map; determining a directional suggestion box according to the second saliency feature map; performing a second convolution operation and an ASPP operation on the directional suggestion box to obtain a score feature map; and optimizing the score feature map to obtain a segmentation result image. By adopting the embodiment of the application, the traffic scene segmentation precision can be improved.

Description

Traffic scene segmentation method and related device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a traffic scene segmentation method and a related device.
Background
With the development of society, the number of urban automobiles keeps increasing, and the traffic and environmental problems it causes grow accordingly. To cope with these problems, intelligent transportation systems have become a key research subject in urban development. Within an intelligent transportation system, vehicle retrieval has proved to be a key technology, with unique advantages in handling traffic problems such as fake-licensed vehicles, deliberately occluded license plates and the tracking of hit-and-run vehicles, and it is of great significance to the construction of intelligent transportation systems. The problem of how to improve the segmentation precision of traffic scenes therefore urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a traffic scene segmentation method and a related device, which can improve the segmentation precision of traffic scenes.
In a first aspect, an embodiment of the present application provides a traffic scene segmentation method, which is applied to an electronic device, and the method includes:
acquiring an original image, a shape mark image and a point category mark image of the original image, wherein the original image is an image comprising a target, the shape mark image is an edge contour image of the target, and the point category mark image is a region image of the target;
inputting the original image into a backbone network to obtain a fusion characteristic diagram;
inputting the fusion feature map into a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape tag map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
determining a directional suggestion box according to the second saliency feature map;
performing second convolution operation and ASPP operation on the directional suggestion frame to obtain a score feature map;
and optimizing the score characteristic graph according to the point category label graph to obtain a segmentation result image.
In a second aspect, an embodiment of the present application provides a traffic scene segmentation apparatus, which is applied to an electronic device, and the apparatus includes: an acquisition unit, an input unit, a merging unit, an arithmetic unit, a decoding unit, a determination unit and an optimization unit, wherein,
the acquiring unit is used for acquiring an original image, a shape mark map and a point category mark map of the original image, wherein the original image is an image comprising a target, the shape mark map is an edge contour image of the target, and the point category mark map is an area image of the target;
the input unit is used for inputting the original image into a backbone network to obtain a fusion characteristic diagram; inputting the fusion feature map into a segmentation network to obtain a segmentation feature map; inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape tag map to obtain a second shape feature map;
the merging unit is configured to merge the segmented feature map and the second shape feature map to obtain a merged feature network;
the operation unit is used for performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit is configured to perform decoding operation on the intermediate operation result to obtain a first significant feature map, and perform optimization processing on the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
the determining unit is used for determining a directional suggestion box according to the second saliency feature map;
the operation unit is further specifically configured to perform a second convolution operation and an ASPP operation on the directional suggestion frame to obtain a score feature map;
and the optimization unit is used for optimizing the score feature map according to the point category label map to obtain a segmentation result image.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that the traffic scene segmentation method and the related device described in the embodiments of the present application are applied to an electronic device. An original image, and a shape label map and a point category label map of the original image, are acquired, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fused feature map; the fused feature map is input into a segmentation network to obtain a segmentation feature map; the fused feature map is input into a shape network to obtain a first shape feature map, which is optimized through the shape label map to obtain a second shape feature map; the segmentation feature map and the second shape feature map are merged to obtain a merged feature network; a first convolution operation is performed on the merged feature network to obtain an intermediate operation result; the intermediate operation result is decoded to obtain a first saliency feature map, which is optimized through its corresponding binary map to obtain a second saliency feature map; a directional suggestion box is determined according to the second saliency feature map; a second convolution operation and an ASPP operation are performed on the directional suggestion box to obtain a score feature map; and the score feature map is optimized according to the point category label map to obtain a segmentation result image. In this way, the shape label map and the point category label map are fully used to guide network learning, and the directional suggestion box improves feature saliency and target positioning, so that the segmentation precision of traffic scenes can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1A is a schematic flow chart of a traffic scene segmentation method according to an embodiment of the present application;
FIG. 1B is a schematic diagram illustrating an embodiment of the present application for implementing target enhancement based on temporal context target information of a backbone network;
FIG. 1C is a schematic diagram illustrating an implementation of target enhancement based on spatial context target information of a backbone network according to an embodiment of the present application;
fig. 1D is a schematic diagram illustrating a network optimization function according to an embodiment of the present application;
fig. 1E is a schematic flow chart of another traffic scene segmentation method provided in the embodiment of the present application;
fig. 2 is a schematic flow chart of another traffic scene segmentation method provided in the embodiment of the present application;
fig. 3 is a schematic structural diagram of another electronic device provided in an embodiment of the present application;
fig. 4 is a functional unit composition block diagram of a traffic scene segmentation apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device described in the embodiments of the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows phone), a tablet computer, a palm computer, a vehicle data recorder, a traffic guidance platform, a server, a notebook computer, a mobile Internet device (MID), or a wearable device (e.g., a smart watch or a Bluetooth headset). These are merely examples rather than an exhaustive list; the electronic device may also be, for example, a video matrix, which is not limited herein.
The following describes embodiments of the present application in detail.
In a traffic scene there are various road markings, such as white lines, yellow lines, straight arrows and left-turn arrows, as well as targets such as signal lights, vehicles seen from multiple angles and pedestrians. The shapes of the road markings and signal lights are very well defined, so a labelled shape map can be used to optimize the segmentation.
In the related art, segmentation algorithms may produce incomplete target shapes and broken, discontinuous road markings, because the network learns these shapes without supervision. To solve this problem, a shape map is introduced to supervise and guide the learning network and optimize the target segmentation algorithm.
Based on this, the embodiment of the present application provides a traffic scene segmentation method, which is applied to an electronic device, and includes the following steps:
acquiring an original image, a shape mark image and a point category mark image of the original image, wherein the original image is an image comprising a target, the shape mark image is an edge contour image of the target, and the point category mark image is a region image of the target;
inputting the original image into a backbone network to obtain a fusion characteristic diagram;
inputting the fusion feature map into a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape tag map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
determining a directional suggestion box according to the second saliency feature map;
performing second convolution operation and ASPP operation on the directional suggestion frame to obtain a score feature map;
and optimizing the score characteristic graph according to the point category label graph to obtain a segmentation result image.
Here, each original image has a ground-truth label in two parts: a point category label map, which records the category of each point together with the target bounding boxes, and a shape label map, i.e. a target edge label map. The original image is input into the backbone network, and the resulting feature map is fed into both a semantic segmentation branch layer and a shape branch layer; the shape branch layer supervises and guides the network to actively learn shape features. The shape branch layer outputs a shape feature map through a series of convolution layers, batch normalization layers and up-sampling layers; the loss between the shape feature map and the target edge label map is calculated, and optimizing this loss supervises and guides the network to actively learn shape features. The feature maps obtained by the two branch networks are fused together through a concat layer and input into the decoding layer, which learns to obtain the saliency feature map. A binary map is generated from the target bounding boxes and used in a supervised-learning manner to guide the network learning process: the loss between the binary map and the saliency feature map is calculated to intensively guide the network to learn target point features.
The following describes embodiments of the present application in detail.
Referring to fig. 1A, fig. 1A is a schematic flow chart of a traffic scene segmentation method provided in an embodiment of the present application, and is applied to an electronic device, where as shown in the figure, the traffic scene segmentation method includes:
101. the method comprises the steps of obtaining an original image, a shape mark map and a point category mark map of the original image, wherein the original image is an image comprising a target, the shape mark map is an edge contour image of the target, and the point category mark map is a region image of the target.
The original image may be any traffic scene image, or it may be an image containing only a target, where the target may be a pedestrian or a vehicle; for example, the original image may be a pedestrian image or a vehicle image. The shape label map is an edge contour image of the target, and the point category label map can be understood as an image of the target region.
In one possible example, when the original image is a vehicle image, the step 101 of acquiring the original image includes the following steps:
11. acquiring target environment parameters;
12. determining target shooting parameters corresponding to the target environment parameters according to a mapping relation between preset environment parameters and the shooting parameters;
13. shooting a target vehicle according to the target shooting parameters to obtain a first image;
14. and carrying out image segmentation on the first image to obtain the original image.
In this embodiment, the environmental parameter may be at least one of the following: ambient light brightness, weather, temperature, humidity, geographical location, magnetic field interference intensity and the like, which are not limited herein. The shooting parameter may be at least one of the following: sensitivity (ISO), exposure time, white balance parameters, shooting mode, color temperature and the like, which are not limited herein. The environmental parameters may be collected by an environmental sensor, which may be at least one of the following: an ambient light sensor, a weather sensor, a temperature sensor, a humidity sensor, a positioning sensor, a magnetic field detection sensor and the like, without limitation. The electronic device may store the mapping relationship between preset environmental parameters and shooting parameters in advance.
In the specific implementation, the electronic device may obtain the target environment parameters, determine the target shooting parameters corresponding to the target environment parameters according to a mapping relationship between preset environment parameters and the shooting parameters, further, shoot the target vehicle according to the target shooting parameters to obtain a first image, and perform image segmentation on the first image to obtain an image of the target vehicle.
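For illustration only, a minimal sketch of such a preset mapping is given below; the patent does not disclose code, so the bucket boundaries, parameter names and values in this Python snippet are assumptions rather than the claimed implementation.

```python
# Illustrative sketch only: the mapping table, bucket boundaries and parameter
# values below are assumptions, not values disclosed in the patent.
PRESET_MAPPING = {
    # (ambient-light bucket, weather) -> target shooting parameters
    ("low", "night"):  {"iso": 1600, "exposure_ms": 33, "white_balance": "auto"},
    ("mid", "cloudy"): {"iso": 400,  "exposure_ms": 16, "white_balance": "daylight"},
    ("high", "sunny"): {"iso": 100,  "exposure_ms": 8,  "white_balance": "daylight"},
}

def select_shooting_params(ambient_light_lux: float, weather: str) -> dict:
    """Map measured environment parameters to target shooting parameters."""
    if ambient_light_lux < 50:
        bucket = "low"
    elif ambient_light_lux < 5000:
        bucket = "mid"
    else:
        bucket = "high"
    # Fall back to neutral defaults when the pair is not in the preset table.
    return PRESET_MAPPING.get(
        (bucket, weather),
        {"iso": 200, "exposure_ms": 16, "white_balance": "auto"},
    )
```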
Between the above step 13 and step 14, the following steps may be further included:
a1, determining the image quality evaluation value of the first image;
a2, when the image quality evaluation value is lower than a preset threshold value, performing image enhancement processing on the first image;
step 14, performing image segmentation on the first image to obtain the target vehicle image, specifically:
and carrying out image segmentation on the first image subjected to image enhancement processing to obtain a target vehicle area, and taking an image corresponding to the target vehicle area as the original image.
In a specific implementation, at least one image quality evaluation index may be used to perform image quality evaluation on an image, where the image quality evaluation index may be at least one of the following: average brightness, sharpness, entropy, etc., without limitation. The image enhancement algorithm may be at least one of: wavelet transformation, image sharpening, gray stretching, histogram equalization, and the like, which are not limited herein.
In a specific implementation, the electronic device may determine the image quality evaluation value of the first image. When the image quality evaluation value is lower than a preset threshold value, it performs image enhancement processing on the first image and then performs image segmentation on the enhanced first image to obtain the target vehicle image; conversely, when the image quality evaluation value is greater than or equal to the preset threshold value, it directly performs image segmentation on the first image to obtain the target vehicle image.
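As a hedged sketch of this quality-gated enhancement (the library choice, index weights, normalisation constants and threshold are assumptions; the patent only names the evaluation indices and enhancement algorithms):

```python
import cv2
import numpy as np

def quality_score(img_bgr: np.ndarray) -> float:
    """Combine average brightness and sharpness (variance of the Laplacian) into one
    score. The weights and normalisation constants are assumptions."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    brightness = gray.mean() / 255.0
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return 0.5 * brightness + 0.5 * min(sharpness / 500.0, 1.0)

def maybe_enhance(img_bgr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Enhance (here: histogram equalisation on the luma channel) only when the
    quality evaluation value is below the preset threshold."""
    if quality_score(img_bgr) >= threshold:
        return img_bgr
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[..., 0] = cv2.equalizeHist(ycrcb[..., 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```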
Further, in a possible example, the step a2, performing the image enhancement processing on the first image, may include the following steps:
a21, dividing the first image into a plurality of areas;
a22, determining the definition value of each area in the plurality of areas to obtain a plurality of definition values;
a23, selecting a definition value lower than a preset definition value from the definition values, and acquiring a corresponding area to obtain at least one target area;
a24, determining the distribution density of the characteristic points corresponding to each area in the at least one target area to obtain at least one distribution density of the characteristic points;
a25, determining a characteristic point distribution density grade corresponding to the at least one characteristic point distribution density to obtain at least one characteristic point density distribution grade;
a26, determining a target image enhancement algorithm corresponding to the at least one characteristic point density distribution grade according to a mapping relation between preset characteristic point distribution density grades and image enhancement algorithms;
and A27, performing image enhancement processing on the corresponding target area according to a target image enhancement algorithm corresponding to the density distribution grade of the at least one characteristic point to obtain the first image after the image enhancement processing.
The preset definition value can be set by a user or defaulted by a system. The electronic device may pre-store a mapping relationship between a preset feature point distribution density level and an image enhancement algorithm, where the image enhancement algorithm may be at least one of: wavelet transformation, image sharpening, gray stretching, histogram equalization, and the like, which are not limited herein.
In a specific implementation, the electronic device may divide the first image into a plurality of regions of equal or different areas. It may then determine the sharpness value of each of the regions to obtain a plurality of sharpness values, select the sharpness values lower than a preset sharpness value, and take the corresponding regions as at least one target region. For each target region, a feature point distribution density is determined, giving at least one feature point distribution density, where the feature point distribution density of a region is the total number of feature points in the region divided by the region area. The electronic device may further pre-store a mapping relationship between feature point distribution density and feature point distribution density level, and determine, according to this mapping relationship, the feature point distribution density level corresponding to each of the at least one feature point distribution density, thereby obtaining at least one feature point density distribution level.
Further, the electronic device may determine, according to the mapping relationship between preset feature point distribution density levels and image enhancement algorithms, the target image enhancement algorithm corresponding to each of the at least one feature point density distribution level, and perform image enhancement processing on the corresponding target region with that algorithm to obtain the first image after image enhancement processing. In this way, regions that already have good image quality are not over-enhanced, regions of different quality are treated differently, and image enhancement is performed in a targeted manner, which is more beneficial to improving image quality.
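The region-wise, density-driven enhancement described above might look roughly as follows. This is a sketch under assumptions: the grid size, the ORB keypoint detector, the clarity and density thresholds and the choice of enhancement per density level are not specified in the patent.

```python
import cv2
import numpy as np

def feature_point_density(region_bgr: np.ndarray) -> float:
    """Feature point distribution density = number of detected keypoints / region area.
    ORB is an assumed detector; any keypoint detector would do."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    keypoints = cv2.ORB_create().detect(gray, None)
    return len(keypoints) / float(gray.shape[0] * gray.shape[1])

def enhance_low_clarity_regions(img_bgr, grid=(4, 4), clarity_thresh=100.0, density_thresh=1e-3):
    """Split the image into a grid, enhance only the low-clarity cells, and pick the
    enhancement method from each cell's feature point density level."""
    h, w = img_bgr.shape[:2]
    out = img_bgr.copy()
    for r in range(grid[0]):
        for c in range(grid[1]):
            y0, y1 = r * h // grid[0], (r + 1) * h // grid[0]
            x0, x1 = c * w // grid[1], (c + 1) * w // grid[1]
            cell = out[y0:y1, x0:x1]
            gray = cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)
            if cv2.Laplacian(gray, cv2.CV_64F).var() >= clarity_thresh:
                continue  # sharp enough: skip, so good regions are not over-enhanced
            if feature_point_density(cell) > density_thresh:
                cell[:] = cv2.detailEnhance(cell)  # sharpening-style enhancement
            else:
                ycrcb = cv2.cvtColor(cell, cv2.COLOR_BGR2YCrCb)
                ycrcb[..., 0] = cv2.equalizeHist(ycrcb[..., 0])
                cell[:] = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return out
```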
102. And inputting the original image into a backbone network to obtain a fusion characteristic diagram.
The backbone network may be a ResNet, a DenseNet, a MobileNet, or another backbone network. In a specific implementation, the electronic device may input the original image into the backbone network to obtain the fused feature map.
In one possible example, the step 102 of inputting the original image into the backbone network to obtain the fused feature map may include the following steps:
21. acquiring at least one shot image adjacent to the shooting time of the original image;
22. inputting the original image and the at least one shot image into the backbone network to obtain a first fusion characteristic;
23. determining at least one scale transformation image corresponding to the original image, wherein scales of different scale transformation images are different;
24. inputting the original image and the at least one scale image into the backbone network to obtain a second fusion characteristic;
25. and fusing the first fusion characteristic and the second fusion characteristic to obtain the fusion characteristic graph.
The electronic device may acquire at least one captured image adjacent in capturing time to the original image, for example three consecutive frames of a video, and then input the original image and the at least one captured image into the backbone network to obtain the first fused feature. As shown in fig. 1B, the original image and its two adjacent frames are input into the backbone network to obtain a plurality of target area feature maps, one per input image, and the first fused feature map is formed from these target area feature maps.
Further, the electronic device may determine at least one scale-transformed image corresponding to the original image, where different scale-transformed images have different scales; specifically, the scale-transformed images may be obtained by multi-scale decomposition, scale scaling or down-sampling. The original image and the at least one scale-transformed image are then input into the backbone network to obtain the second fused feature. As shown in fig. 1C, for example, the electronic device may perform scale transformation on the original image to obtain at least one scale-transformed image; both the original image and the scale-transformed images are input into the backbone network, where the original image yields the target feature box and the scale-transformed images yield the features around the target box; these features are then fused to obtain the second fused image, i.e. the Gaussian fusion feature. The spatial context information reflects the fact that, in a traffic scene, different targets bear certain relationships to one another and to the surrounding environment. During training, the target bounding box may be expanded outwards by certain scales to form a group of target boxes; the target features and the regional features of the expanded target boxes are extracted and combined together in a Gaussian-filtering manner.
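A minimal sketch of combining the features of a group of expanded target boxes in a Gaussian-weighted manner is shown below; the weighting over the expansion index and the sigma value are assumptions, since the patent only states that the features are combined in a Gaussian-filtering manner.

```python
import math
import torch

def gaussian_fuse(box_features, sigma=1.0):
    """
    Fuse the features extracted from a group of progressively expanded target boxes.
    box_features[0] comes from the original target box, box_features[k] from the k-th
    expansion; the weights fall off as a Gaussian over the expansion index (sigma is
    an assumption). Each element is a (C, H, W) tensor of the same size.
    """
    weights = torch.tensor(
        [math.exp(-(k ** 2) / (2.0 * sigma ** 2)) for k in range(len(box_features))]
    )
    weights = weights / weights.sum()
    stacked = torch.stack(box_features, dim=0)               # (K, C, H, W)
    return (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # Gaussian-weighted sum
```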
Further, the electronic device may fuse the first fusion feature and the second fusion feature to obtain a fusion feature map.
In one possible example, in the step 25, the fusing the first fused feature and the second fused feature to obtain the fused feature map may include the following steps:
251. connecting the first fusion image and the second fusion image in series to obtain a series image;
252. and carrying out convolution operation on the serial images to obtain the fusion characteristic diagram.
In specific implementation, the electronic device may connect the first fusion image and the second fusion image in series to obtain a series image, and may further perform convolution operation on the series image to obtain a fusion feature map. That is, 2 sets of feature maps are connected in series, and the number of channels is reduced by one convolution layer to obtain a fused feature map.
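For clarity, a sketch of the series connection followed by a channel-reducing convolution is given below. PyTorch is an assumed framework, and the 1×1 kernel, channel sizes and the added BatchNorm/ReLU are assumptions; the patent only states that the two feature sets are concatenated and reduced by one convolution layer.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Connect the two fused features in series (channel concatenation) and reduce
    the number of channels with a single convolution layer."""
    def __init__(self, channels_a: int, channels_b: int, out_channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(channels_a + channels_b, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return self.reduce(torch.cat([feat_a, feat_b], dim=1))

# Usage sketch (the channel sizes are assumptions):
# fusion = FeatureFusion(256, 256, 256)
# fused_feature_map = fusion(first_fused_feature, second_fused_feature)
```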
103. And inputting the fusion feature map into a segmentation network to obtain a segmentation feature map.
The segmentation network is a network capable of realizing image segmentation, and for example, the segmentation network may be a semantic segmentation network.
104. And inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape marking map to obtain a second shape feature map.
In specific implementation, the electronic device may input the fusion feature map into the shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape label map to obtain a second shape feature map, so that the edge segmentation accuracy may be improved.
In one possible example, the step 104 of optimizing the first shape feature map by the shape label map to obtain the second shape feature map may include the following steps:
41. determining a shape loss between the shape marker map and the first shape feature map;
42. optimizing model parameters of the shape network by the shape loss;
43. and calculating the first shape feature diagram through the optimized shape network to obtain the second shape feature diagram.
In the embodiment of the present application, the model parameter may be at least one of the following: weight, bias, convolution kernel, number of layers, activation function type, metric, weight optimization algorithm, batch_size, etc., without limitation. In a specific implementation, the electronic device may determine the shape loss between the shape label map and the first shape feature map, i.e. the difference between the two, then optimize the model parameters of the shape network through the shape loss, and operate on the first shape feature map through the optimized shape network to obtain the second shape feature map. As shown in fig. 1D, the electronic device may operate on the input data with the original model parameters of the shape network, determine the shape loss between the operation result and the shape label map, input the shape loss into the loss function to obtain updated parameters, and optimize the model parameters of the shape network using the updated parameters.
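The following sketch illustrates a shape branch supervised by the shape label map. The layer sizes, the choice of binary cross-entropy and the assumption that the label map is binarised to {0, 1} are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeBranch(nn.Module):
    """Shape branch: convolution and batch-normalization layers followed by
    up-sampling, producing a one-channel shape feature map."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, fused_feature: torch.Tensor, out_size) -> torch.Tensor:
        logits = self.conv(fused_feature)
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

def shape_loss(shape_logits: torch.Tensor, shape_label_map: torch.Tensor) -> torch.Tensor:
    """Loss between the predicted shape map and the shape label map (edge contours,
    assumed binarised to {0, 1}); back-propagating it optimizes the shape network."""
    return F.binary_cross_entropy_with_logits(shape_logits, shape_label_map.float())
```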
105. And merging the segmentation feature map and the second shape feature map to obtain a merged feature network.
In a specific implementation, the electronic device may merge the segmented feature map and the second shape feature map through a connection layer concat to obtain a merged feature network.
106. And carrying out first convolution operation on the merged feature network to obtain an intermediate operation result.
In specific implementation, the electronic device can perform a first convolution operation on the merged feature network, so that the significance of the merged feature network can be improved, and an intermediate operation result is obtained.
107. And decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map.
The electronic device may input the intermediate operation result to the decoding layer to perform decoding operation, so as to obtain a first saliency feature map, and perform optimization processing on the first saliency feature map through a binary map corresponding to the first saliency feature map, so as to obtain a second saliency feature map. The electronic device may obtain a binary image by defining the target in the first saliency feature map as 255 and defining the background as 0, and perform optimization processing on the first saliency feature map by using the binary image and a loss function corresponding to the first saliency feature map to obtain a second saliency feature map.
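A minimal sketch of generating the supervision binary map from the target bounding boxes (targets set to 255, background to 0) is given below; the (x0, y0, x1, y1) box format is an assumption.

```python
import numpy as np

def binary_map_from_boxes(height: int, width: int, boxes) -> np.ndarray:
    """Build the supervision binary map: pixels inside any target bounding box are set
    to 255, background pixels to 0. Boxes are (x0, y0, x1, y1) in pixel coordinates."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 255
    return mask
```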
In a possible example, in step 107, performing optimization processing on the first significant feature map through the binary map corresponding to the first significant feature map to obtain a second significant feature map, the method may include the following steps:
71. determining a target amount of loss between the first saliency map and the binary map;
72. optimizing model parameters of the middle layer according to the target loss amount;
73. and calculating the first significant characteristic diagram through the optimized intermediate layer to obtain the second significant characteristic diagram.
In the embodiment of the present application, the model parameter may be at least one of: weight, bias, convolution kernel, number of layers, activation function type, metric, weight optimization algorithm, batch _ size, etc., without limitation. In specific implementation, the electronic device determines a target loss amount between the first saliency feature map and the binary map, optimizes a model parameter of the intermediate layer through the target loss amount, and calculates the first saliency feature map through the optimized intermediate layer to obtain a second saliency feature map, and a specific principle may refer to fig. 1D.
108. And determining a directional suggestion frame according to the second significant feature map and the point class mark map.
In a specific implementation, the electronic device may determine the directional suggestion box according to the second saliency feature map and the point category label map. The second saliency feature map corresponds to a prediction box, and the point category label map corresponds to a target box. As shown in fig. 1E, the middle layer has two branches, one pointing to the saliency map and the other to the box branch; the last layer of the box branch is a convolution layer whose output has (number of channels × 4) values, the 4 values describing a directional suggestion box.
In one possible example, the step 108 of determining the band direction suggestion box according to the second saliency feature map may include the following steps:
81. determining the second significant feature map corresponding to the position of a target, wherein the target is any one of the point class label maps;
82. judging whether overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any frame corresponding to the target, and the rectangle j is any rectangle in a second significance characteristic diagram;
83. when the rectangle i and the rectangle j are overlapped, determining the number of points in the overlapped area;
84. determining a ratio between the points and the total points of the second significant feature map, and determining a sample attribute of the overlapping area according to the ratio;
85. and training the overlapping area to obtain the suggested frame with the direction.
In a specific implementation, the sample attribute may be positive sample or negative sample, and targets correspond one-to-one with saliency feature maps. The electronic device may determine the second saliency feature map corresponding to the position of a target, where the target is any target in the point category label map, and then judge whether there is an overlap between a rectangle i and a rectangle j, where rectangle i is any box corresponding to the target and rectangle j is any rectangle in the second saliency feature map. When rectangle i and rectangle j overlap, the number of points in the overlapping area is determined, the ratio between this number and the total number of points in the second saliency feature map is computed, the sample attribute of the overlapping area is determined according to the ratio, and the overlapping area is trained to obtain the directional suggestion box.
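An illustrative sketch of the overlap test and the ratio-based sample attribute is given below; the positive/negative threshold and the treatment of the saliency map as a binary point mask are assumptions.

```python
def overlap_rect(rect_i, rect_j):
    """Return the overlap of two axis-aligned boxes (x0, y0, x1, y1), or None if disjoint."""
    x0, y0 = max(rect_i[0], rect_j[0]), max(rect_i[1], rect_j[1])
    x1, y1 = min(rect_i[2], rect_j[2]), min(rect_i[3], rect_j[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def sample_attribute(saliency_map, rect_i, rect_j, pos_ratio=0.5):
    """
    Count the salient points inside the overlapping area and compare them with the total
    number of salient points in the second saliency map (here a 2-D array where non-zero
    means salient). Returns "positive", "negative", or None when there is no overlap;
    the pos_ratio threshold is an assumption.
    """
    ov = overlap_rect(rect_i, rect_j)
    if ov is None:
        return None
    x0, y0, x1, y1 = ov
    total = (saliency_map > 0).sum()
    if total == 0:
        return "negative"
    inside = (saliency_map[y0:y1, x0:x1] > 0).sum()
    return "positive" if inside / float(total) >= pos_ratio else "negative"
```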
109. And carrying out second convolution operation and ASPP operation on the directional suggestion frame to obtain a score feature map.
Here, ASPP refers to atrous spatial pyramid pooling, a spatial pyramid structure built from dilated (atrous) convolutions. The electronic device may perform the second convolution operation and the ASPP operation on the directional suggestion box to obtain the score feature map, which can improve feature saliency.
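A compact sketch of an ASPP module built from parallel dilated convolutions is shown below; PyTorch is an assumed framework, and the dilation rates and channel sizes are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated (atrous) convolutions over the
    same input, concatenated and projected back to out_channels."""
    def __init__(self, in_channels: int, out_channels: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_channels * len(rates), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))
```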
110. And optimizing the score characteristic graph according to the point category label graph to obtain a segmentation result image.
Wherein the segmentation result image may include at least one of: white lines, yellow lines, straight arrows, left-turn arrows, vehicles, pedestrians, or others. The electronic equipment can optimize the score characteristic graph according to the point type label graph to obtain a segmentation result image.
In an exemplary embodiment of the present application, the loss function may be at least one of: hinge loss function, cross entropy loss function, exponential loss function, and the like, without limitation.
In one possible example, the step 110 of optimizing the scoring profile according to the point category label graph may include the following steps:
1101. determining a point loss between the scored feature map and the point category label map;
1102. optimizing model parameters of the ASPP by the point loss;
1103. and calculating the score feature map through the optimized ASPP to obtain the image segmentation result.
In the embodiment of the present application, the model parameter may be at least one of: weight, bias, convolution kernel, number of layers, activation function type, metric, weight optimization algorithm, batch _ size, etc., without limitation. In specific implementation, the electronic device may determine a point loss between the score feature map and the point classification label map, optimize a model parameter of the ASPP through the point loss, and perform an operation on the score feature map through the optimized ASPP to obtain an image segmentation result, and the specific principle may refer to fig. 1D.
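A hedged sketch of the point loss between the score feature map and the point category label map is given below; the use of per-pixel cross-entropy and the tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def point_loss(score_map: torch.Tensor, point_label_map: torch.Tensor) -> torch.Tensor:
    """
    Per-pixel cross-entropy between the score feature map and the point category label map.
    score_map: (N, num_classes, H, W) logits; point_label_map: (N, H, W) integer class ids.
    Back-propagating this loss optimizes the ASPP parameters; the final segmentation result
    is then the per-pixel argmax over classes.
    """
    return F.cross_entropy(score_map, point_label_map.long())

# segmentation_result = score_map.argmax(dim=1)  # per-pixel class ids
```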
Specifically, as shown in fig. 1E, the original image is input to the backbone network, the obtained feature maps are respectively input to the semantic segmentation branch layer and the shape branch layer, and the shape branch layer supervises and guides the network to actively learn the shape features. The shape branch layer outputs a shape characteristic diagram through a series of convolution layers, batch normalization layers and up-sampling layers, the loss of the shape characteristic diagram and the loss of a target edge label diagram are calculated, and the loss is optimized to supervise and guide a network to actively learn the shape characteristics. The feature maps obtained by the two branch networks are fused together through the concat layer and input into the decoding layer, and the decoding layer learns to obtain the significant feature map. The binary image is generated by the target enclosure frame, and a supervised learning method is adopted to guide the network learning process. And calculating the loss of the binary image and the significant characteristic image, and intensively guiding the network to learn the target point characteristics.
It can be seen that the traffic scene segmentation method described in the embodiments of the present application is applied to an electronic device. An original image, and a shape label map and a point category label map of the original image, are acquired, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fused feature map; the fused feature map is input into a segmentation network to obtain a segmentation feature map; the fused feature map is input into a shape network to obtain a first shape feature map, which is optimized through the shape label map to obtain a second shape feature map; the segmentation feature map and the second shape feature map are merged to obtain a merged feature network; a first convolution operation is performed on the merged feature network to obtain an intermediate operation result; the intermediate operation result is decoded to obtain a first saliency feature map, which is optimized through its corresponding binary map to obtain a second saliency feature map; a directional suggestion box is determined according to the second saliency feature map; a second convolution operation and an ASPP operation are performed on the directional suggestion box to obtain a score feature map; and the score feature map is optimized according to the point category label map to obtain a segmentation result image. On the one hand, the shape label map and the point category label map of the original image are fully used to guide network learning; on the other hand, combining the directional suggestion box improves feature saliency and enables accurate target positioning, which is beneficial to improving the image segmentation precision.
Referring to fig. 2, fig. 2 is a schematic flow chart of a traffic scene segmentation method provided in an embodiment of the present application, and the traffic scene segmentation method is applied to an electronic device, as shown in the figure, the traffic scene segmentation method includes:
201. and acquiring an image to be processed.
202. And carrying out image segmentation on the image to be processed to obtain a target area image, and taking the image with the preset size including the target area image as an original image.
203. And acquiring a shape mark map and a point category mark map of the original image, wherein the shape mark map is an edge contour image of the target, and the point category mark map is a region image of the target.
204. And inputting the original image into a backbone network to obtain a fusion characteristic diagram.
205. And inputting the fusion feature map into a segmentation network to obtain a segmentation feature map.
206. And inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape marking map to obtain a second shape feature map.
207. And merging the segmentation feature map and the second shape feature map to obtain a merged feature network.
208. And carrying out first convolution operation on the merged feature network to obtain an intermediate operation result.
209. And decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map.
210. And determining a directional suggestion box according to the second saliency feature map.
211. And carrying out second convolution operation and ASPP operation on the directional suggestion frame to obtain a score feature map.
212. And optimizing the score characteristic graph according to the point category label graph to obtain a segmentation result image.
The preset size can be set by the user or default by the system.
For the detailed description of the steps 201 to 212, reference may be made to the corresponding steps of the traffic scene segmentation method described in the foregoing fig. 1A, and details are not repeated here.
On the one hand, the traffic scene segmentation method described in the embodiment of the application can make full use of the shape label map and the point category label map of the original image to guide network learning; on the other hand, by combining the directional suggestion box, it can improve feature saliency and achieve accurate target positioning, which is beneficial to improving the image segmentation precision.
In accordance with the foregoing embodiments, please refer to fig. 3, where fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, which are applied to the electronic device, the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring an original image, a shape mark image and a point category mark image of the original image, wherein the original image is an image comprising a target, the shape mark image is an edge contour image of the target, and the point category mark image is a region image of the target;
inputting the original image into a backbone network to obtain a fusion characteristic diagram;
inputting the fusion feature map into a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape tag map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
determining a directional suggestion box according to the second saliency feature map;
performing second convolution operation and ASPP operation on the directional suggestion frame to obtain a score feature map;
and optimizing the score characteristic graph according to the point category label graph to obtain a segmentation result image.
It can be seen that the electronic device described in this embodiment of the present application acquires an original image, and a shape label map and a point category label map of the original image, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fused feature map; the fused feature map is input into a segmentation network to obtain a segmentation feature map; the fused feature map is input into a shape network to obtain a first shape feature map, which is optimized through the shape label map to obtain a second shape feature map; the segmentation feature map and the second shape feature map are merged to obtain a merged feature network; a first convolution operation is performed on the merged feature network to obtain an intermediate operation result; the intermediate operation result is decoded to obtain a first saliency feature map, which is optimized through its corresponding binary map to obtain a second saliency feature map; a directional suggestion box is determined according to the second saliency feature map; a second convolution operation and an ASPP operation are performed on the directional suggestion box to obtain a score feature map; and the score feature map is optimized according to the point category label map to obtain a segmentation result image. In this way, the shape label map and the point category label map are fully used to guide network learning, and the directional suggestion box improves feature saliency and target positioning, which is beneficial to improving the segmentation precision of traffic scenes.
In one possible example, in the aspect of determining a directional suggestion box according to the second saliency feature map, the above program includes instructions for performing the following steps:
determining the second significant feature map corresponding to the position of a target, wherein the target is any one of the point class label maps;
judging whether overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any frame corresponding to the target, and the rectangle j is any rectangle in a second significance characteristic diagram;
when the rectangle i and the rectangle j are overlapped, determining the number of points in the overlapped area;
determining a ratio between the points and the total points of the second significant feature map, and determining a sample attribute of the overlapping area according to the ratio;
and training the overlapping area to obtain the suggested frame with the direction.
In one possible example, in the aspect of inputting the original image into the backbone network to obtain the fused feature map, the program includes instructions for performing the following steps:
acquiring at least one shot image adjacent to the shooting time of the original image;
inputting the original image and the at least one shot image into the backbone network to obtain a first fusion characteristic;
determining at least one scale transformation image corresponding to the original image, wherein scales of different scale transformation images are different;
inputting the original image and the at least one scale conversion image into the backbone network to obtain a second fusion characteristic;
and fusing the first fusion characteristic and the second fusion characteristic to obtain the fusion characteristic graph.
In one possible example, in the fusing the first fused feature and the second fused feature to obtain the fused feature map, the program includes instructions for:
connecting the first fusion image and the second fusion image in series to obtain a series image;
and carrying out convolution operation on the serial images to obtain the fusion characteristic diagram.
In one possible example, in the optimizing the first shape feature map by the shape marker map to obtain a second shape feature map, the program includes instructions for:
determining a shape loss between the shape marker map and the first shape feature map;
optimizing model parameters of the shape network by the shape loss;
and calculating the first shape feature diagram through the optimized shape network to obtain the second shape feature diagram.
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions in order to realize the above-mentioned functions. Those of skill in the art will readily appreciate that the present application is capable of hardware or a combination of hardware and computer software implementing the various illustrative elements and algorithm steps described in connection with the embodiments provided herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 4 is a block diagram of functional units of a traffic scene segmentation apparatus 400 according to an embodiment of the present application. The traffic scene segmentation apparatus 400 is applied to an electronic device, and the apparatus includes: an acquisition unit 401, an input unit 402, a merging unit 403, an operation unit 404, a decoding unit 405, a determination unit 406, and an optimization unit 407, wherein,
the acquiring unit 401 is configured to acquire an original image, a shape label map and a point category label map of the original image, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target;
the input unit 402 is configured to input the original image into a backbone network to obtain a fused feature map; input the fused feature map into a segmentation network to obtain a segmentation feature map; input the fused feature map into a shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape label map to obtain a second shape feature map;
the merging unit 403 is configured to merge the segmentation feature map and the second shape feature map to obtain a merged feature network;
the operation unit 404 is configured to perform a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit 405 is configured to perform a decoding operation on the intermediate operation result to obtain a first significant feature map, and perform optimization processing on the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
the determining unit 406 is configured to determine a directional suggestion box according to the second significant feature map;
the operation unit 404 is further specifically configured to perform a second convolution operation and an ASPP (atrous spatial pyramid pooling) operation on the directional suggestion box to obtain a score feature map;
the optimizing unit 407 is configured to optimize the score feature map according to the point category label map to obtain a segmentation result image.
It can be seen that the traffic scene segmentation apparatus described in the embodiments of the present application is applied to an electronic device. An original image, a shape label map and a point category label map of the original image are acquired, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fused feature map; the fused feature map is input into a segmentation network to obtain a segmentation feature map; the fused feature map is input into a shape network to obtain a first shape feature map, and the first shape feature map is optimized through the shape label map to obtain a second shape feature map. The segmentation feature map and the second shape feature map are merged to obtain a merged feature network, a first convolution operation is performed on the merged feature network to obtain an intermediate operation result, and a decoding operation is performed on the intermediate operation result to obtain a first significant feature map. The first significant feature map is optimized through a binary map corresponding to the first significant feature map to obtain a second significant feature map, a directional suggestion box is determined according to the second significant feature map, a second convolution operation and an ASPP operation are performed on the directional suggestion box to obtain a score feature map, and the score feature map is optimized according to the point category label map to obtain a segmentation result image. An illustrative sketch of the ASPP operation is given below.
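For readers unfamiliar with ASPP, the following PyTorch sketch shows a generic atrous spatial pyramid pooling block of the kind the operation unit could apply to the features of the directional suggestion box; the dilation rates (1, 6, 12, 18) and channel sizes are assumptions borrowed from common DeepLab-style implementations, not values given in the patent.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel atrous (dilated) convolutions see the same input at
        # different receptive fields without changing spatial resolution.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the branch outputs and project them back to out_ch channels.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

proposal_features = torch.randn(1, 256, 32, 32)   # features of a directional suggestion box
score_feature_map = ASPP()(proposal_features)     # shape (1, 256, 32, 32)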
In one possible example, in the aspect of determining a directional suggestion box according to the second significant feature map, the determining unit 406 is specifically configured to:
determine the region of the second significant feature map corresponding to the position of a target, where the target is any target in the point category label map;
judge whether a rectangle i and a rectangle j overlap, where the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second significant feature map;
when the rectangle i and the rectangle j overlap, determine the number of points in the overlapping area;
determine a ratio between the number of points and the total number of points of the second significant feature map, and determine a sample attribute of the overlapping area according to the ratio;
and train on the overlapping area to obtain the directional suggestion box.
In a possible example, in the aspect of inputting the original image into the backbone network to obtain the fused feature map, the input unit 402 is specifically configured to:
acquire at least one captured image whose capture time is adjacent to that of the original image;
input the original image and the at least one captured image into the backbone network to obtain a first fused feature;
determine at least one scale-transformed image corresponding to the original image, where different scale-transformed images have different scales;
input the original image and the at least one scale-transformed image into the backbone network to obtain a second fused feature;
and fuse the first fused feature and the second fused feature to obtain the fused feature map.
In a possible example, in the aspect of fusing the first fused feature and the second fused feature to obtain the fused feature map, the merging unit 403 is specifically configured to:
concatenate the first fused feature and the second fused feature to obtain a concatenated feature;
and perform a convolution operation on the concatenated feature to obtain the fused feature map.
In one possible example, in the aspect of optimizing the first shape feature map through the shape label map to obtain a second shape feature map, the input unit 402 is specifically configured to:
determine a shape loss between the shape label map and the first shape feature map;
optimize model parameters of the shape network through the shape loss;
and process the first shape feature map through the optimized shape network to obtain the second shape feature map.
It can be understood that the functions of each program module of the traffic scene segmentation apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the units is only one type of logical function division, and other division manners may be used in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program codes, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A traffic scene segmentation method, applied to an electronic device, the method comprising the following steps:
acquiring an original image, a shape label map and a point category label map of the original image, wherein the original image is an image comprising a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target;
inputting the original image into a backbone network to obtain a fused feature map;
inputting the fused feature map into a segmentation network to obtain a segmentation feature map;
inputting the fused feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape label map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first significant feature map, and optimizing the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
determining a directional suggestion box according to the second significant feature map;
performing a second convolution operation and an ASPP operation on the directional suggestion box to obtain a score feature map;
and optimizing the score feature map according to the point category label map to obtain a segmentation result image.
2. The method according to claim 1, wherein the determining a directional suggestion box according to the second significant feature map comprises:
determining the region of the second significant feature map corresponding to the position of a target, wherein the target is any target in the point category label map;
judging whether a rectangle i and a rectangle j overlap, wherein the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second significant feature map;
when the rectangle i and the rectangle j overlap, determining the number of points in the overlapping area;
determining a ratio between the number of points and the total number of points of the second significant feature map, and determining a sample attribute of the overlapping area according to the ratio;
and training on the overlapping area to obtain the directional suggestion box.
3. The method according to claim 1 or 2, wherein the inputting the original image into a backbone network to obtain a fused feature map comprises:
acquiring at least one captured image whose capture time is adjacent to that of the original image;
inputting the original image and the at least one captured image into the backbone network to obtain a first fused feature;
determining at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
inputting the original image and the at least one scale-transformed image into the backbone network to obtain a second fused feature;
and fusing the first fused feature and the second fused feature to obtain the fused feature map.
4. The method according to claim 3, wherein said fusing the first fused feature and the second fused feature to obtain the fused feature map comprises:
concatenating the first fused feature and the second fused feature to obtain a concatenated feature;
and performing a convolution operation on the concatenated feature to obtain the fused feature map.
5. The method according to any one of claims 1-4, wherein the optimizing the first shape feature map through the shape label map to obtain a second shape feature map comprises:
determining a shape loss between the shape label map and the first shape feature map;
optimizing model parameters of the shape network through the shape loss;
and processing the first shape feature map through the optimized shape network to obtain the second shape feature map.
6. A traffic scene segmentation apparatus, applied to an electronic device, the apparatus comprising: an acquiring unit, an input unit, a merging unit, an operation unit, a decoding unit, a determining unit and an optimizing unit, wherein,
the acquiring unit is configured to acquire an original image, a shape label map and a point category label map of the original image, wherein the original image is an image comprising a target, the shape label map is an edge contour image of the target, and the point category label map is a region image of the target;
the input unit is configured to input the original image into a backbone network to obtain a fused feature map; input the fused feature map into a segmentation network to obtain a segmentation feature map; input the fused feature map into a shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape label map to obtain a second shape feature map;
the merging unit is configured to merge the segmentation feature map and the second shape feature map to obtain a merged feature network;
the operation unit is configured to perform a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit is configured to perform a decoding operation on the intermediate operation result to obtain a first significant feature map, and perform optimization processing on the first significant feature map through a binary map corresponding to the first significant feature map to obtain a second significant feature map;
the determining unit is configured to determine a directional suggestion box according to the second significant feature map;
the operation unit is further specifically configured to perform a second convolution operation and an ASPP operation on the directional suggestion box to obtain a score feature map;
and the optimizing unit is configured to optimize the score feature map according to the point category label map to obtain a segmentation result image.
7. The apparatus according to claim 6, wherein, in the aspect of determining a directional suggestion box according to the second significant feature map, the determining unit is specifically configured to:
determine the region of the second significant feature map corresponding to the position of a target, wherein the target is any target in the point category label map;
judge whether a rectangle i and a rectangle j overlap, wherein the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second significant feature map;
when the rectangle i and the rectangle j overlap, determine the number of points in the overlapping area;
determine a ratio between the number of points and the total number of points of the second significant feature map, and determine a sample attribute of the overlapping area according to the ratio;
and train on the overlapping area to obtain the directional suggestion box.
8. The apparatus according to claim 6 or 7, wherein in the aspect of inputting the original image into the backbone network to obtain the fused feature map, the input unit is specifically configured to:
acquire at least one captured image whose capture time is adjacent to that of the original image;
input the original image and the at least one captured image into the backbone network to obtain a first fused feature;
determine at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
input the original image and the at least one scale-transformed image into the backbone network to obtain a second fused feature;
and fuse the first fused feature and the second fused feature to obtain the fused feature map.
9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store one or more programs configured to be executed by the processor, and the one or more programs comprise instructions for performing the steps of the method according to any one of claims 1-5.
10. A computer-readable storage medium, storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-5.
CN201911295968.0A 2019-12-16 2019-12-16 Traffic scene segmentation method and related device Active CN111178181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911295968.0A CN111178181B (en) 2019-12-16 2019-12-16 Traffic scene segmentation method and related device

Publications (2)

Publication Number Publication Date
CN111178181A true CN111178181A (en) 2020-05-19
CN111178181B CN111178181B (en) 2023-06-09

Family

ID=70656576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911295968.0A Active CN111178181B (en) 2019-12-16 2019-12-16 Traffic scene segmentation method and related device

Country Status (1)

Country Link
CN (1) CN111178181B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470048A (en) * 2021-07-06 2021-10-01 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN113570610A (en) * 2021-07-26 2021-10-29 北京百度网讯科技有限公司 Method and device for performing target segmentation on video by adopting semantic segmentation model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network
CN107610146A (en) * 2017-09-29 2018-01-19 北京奇虎科技有限公司 Image scene segmentation method, apparatus, computing device and computer-readable storage medium
CN108985250A (en) * 2018-07-27 2018-12-11 大连理工大学 A kind of traffic scene analytic method based on multitask network
US10402977B1 (en) * 2019-01-25 2019-09-03 StradVision, Inc. Learning method and learning device for improving segmentation performance in road obstacle detection required to satisfy level 4 and level 5 of autonomous vehicles using laplacian pyramid network and testing method and testing device using the same

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470048A (en) * 2021-07-06 2021-10-01 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN113570610A (en) * 2021-07-26 2021-10-29 北京百度网讯科技有限公司 Method and device for performing target segmentation on video by adopting semantic segmentation model
CN113570610B (en) * 2021-07-26 2022-05-13 北京百度网讯科技有限公司 Method and device for performing target segmentation on video by adopting semantic segmentation model

Also Published As

Publication number Publication date
CN111178181B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN110119148B (en) Six-degree-of-freedom attitude estimation method and device and computer readable storage medium
CN111491093B (en) Method and device for adjusting field angle of camera
CN107944351A (en) Image-recognizing method, device and computer-readable recording medium
CN111931683B (en) Image recognition method, device and computer readable storage medium
KR101908481B1 (en) Device and method for pedestraian detection
CN112926461B (en) Neural network training and driving control method and device
CN111931764A (en) Target detection method, target detection framework and related equipment
CN111091023A (en) Vehicle detection method and device and electronic equipment
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN112001378B (en) Lane line processing method and device based on feature space, vehicle-mounted terminal and medium
CN111098850A (en) Automatic parking auxiliary system and automatic parking method
CN111178181A (en) Traffic scene segmentation method and related device
CN111738036A (en) Image processing method, device, equipment and storage medium
CN111783654B (en) Vehicle weight identification method and device and electronic equipment
CN114332702A (en) Target area detection method and device, storage medium and electronic equipment
CN114841910A (en) Vehicle-mounted lens shielding identification method and device
CN115546742A (en) Rail foreign matter identification method and system based on monocular thermal infrared camera
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN111178370A (en) Vehicle retrieval method and related device
CN111127503A (en) Method, device and storage medium for detecting the pattern of a vehicle tyre
CN113902047B (en) Image element matching method, device, equipment and storage medium
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
CN114359233A (en) Image segmentation model training method and device, electronic equipment and readable storage medium
CN116541715B (en) Target detection method, training method of model, target detection system and device
CN115063594B (en) Feature extraction method and device based on automatic driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant