CN111178181B - Traffic scene segmentation method and related device - Google Patents

Traffic scene segmentation method and related device

Info

Publication number
CN111178181B
CN111178181B (application CN201911295968.0A)
Authority
CN
China
Prior art keywords
feature map
shape
image
map
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911295968.0A
Other languages
Chinese (zh)
Other versions
CN111178181A (en)
Inventor
施欣欣
禹世杰
范艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN HARZONE TECHNOLOGY CO LTD
Original Assignee
SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD filed Critical SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority to CN201911295968.0A, granted as CN111178181B
Publication of CN111178181A
Application granted
Publication of CN111178181B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54: Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08: Detecting or categorising vehicles
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a traffic scene segmentation method and a related device, wherein the method comprises the following steps: inputting an original image into a backbone network to obtain a fusion feature map; inputting the fusion feature map into a segmentation network to obtain a segmentation feature map; inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map to obtain a second shape feature map; merging the segmentation feature map and the second shape feature map to obtain a merged feature network; performing a first convolution operation on the merged feature network to obtain an intermediate operation result; performing a decoding operation to obtain a first saliency feature map, and performing optimization processing to obtain a second saliency feature map; determining an oriented proposal box according to the second saliency feature map; performing a second convolution operation and an ASPP operation on the basis of the oriented proposal box to obtain a score feature map; and optimizing the score feature map to obtain a segmentation result image. By adopting the embodiment of the application, traffic scene segmentation precision can be improved.

Description

Traffic scene segmentation method and related device
Technical Field
The application relates to the technical field of image processing, in particular to a traffic scene segmentation method and a related device.
Background
With the development of society, the number of urban automobiles keeps increasing, and the traffic and environmental problems this causes are also growing. To address these issues, intelligent transportation systems have become a subject of intense research in urban development. Within such systems, vehicle retrieval has proved to be a key technology: it offers unique advantages in handling traffic problems such as fake-plate vehicles, deliberately occluded license plates, and tracking hit-and-run vehicles, and it is of great significance to the construction of intelligent transportation systems. The question of how to improve the segmentation accuracy of traffic scenes therefore needs to be solved urgently.
Disclosure of Invention
The embodiment of the application provides a traffic scene segmentation method and a related device, which can improve the segmentation precision of a traffic scene.
In a first aspect, an embodiment of the present application provides a traffic scene segmentation method, applied to an electronic device, where the method includes:
acquiring an original image, and a shape marker graph and a point class marker graph of the original image, wherein the original image is an image comprising a target, the shape marker graph is an edge contour image of the target, and the point class marker graph is a region image of the target;
Inputting the original image into a backbone network to obtain a fusion feature map;
inputting the fusion feature map to a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
determining an oriented proposal box according to the second saliency feature map;
performing a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map;
and optimizing the score feature map according to the point category label map to obtain a segmentation result image.
In a second aspect, an embodiment of the present application provides a traffic scene segmentation apparatus, applied to an electronic device, where the apparatus includes: an acquisition unit, an input unit, a merging unit, an operation unit, a decoding unit, a determination unit and an optimization unit, wherein,
The acquisition unit is used for acquiring an original image, and a shape mark graph and a point category mark graph of the original image, wherein the original image is an image comprising a target, the shape mark graph is an edge contour image of the target, and the point category mark graph is a region image of the target;
the input unit is used for inputting the original image into a backbone network to obtain a fusion feature map; inputting the fusion feature map to a segmentation network to obtain a segmentation feature map; inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map;
the merging unit is used for merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
the operation unit is used for performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit is used for decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
the determining unit is used for determining an oriented proposal box according to the second saliency feature map;
the operation unit is further specifically configured to perform a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map;
and the optimizing unit is used for optimizing the score feature map according to the point category label map to obtain a segmentation result image.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
By implementing the embodiment of the application, the following beneficial effects are achieved:
it can be seen that the traffic scene segmentation method and the related device described in the embodiments of the present application are applied to an electronic device. An original image and its shape marker map and point category label map are acquired, where the original image is an image including a target, the shape marker map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fusion feature map; the fusion feature map is input into a segmentation network to obtain a segmentation feature map; the fusion feature map is also input into a shape network to obtain a first shape feature map, which is optimized through the shape marker map to obtain a second shape feature map. The segmentation feature map and the second shape feature map are merged to obtain a merged feature network, a first convolution operation is performed on the merged feature network to obtain an intermediate operation result, and the intermediate operation result is decoded to obtain a first saliency feature map, which is optimized through its corresponding binary map to obtain a second saliency feature map. An oriented proposal box is determined according to the second saliency feature map, a second convolution operation and an ASPP operation are performed on the basis of the oriented proposal box to obtain a score feature map, and the score feature map is optimized according to the point category label map to obtain a segmentation result image. On the one hand, the shape marker map and the point category label map of the original image can be fully used to guide network learning; on the other hand, feature saliency is improved and accurate target localization is achieved by combining the oriented proposal box, so that image segmentation precision can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1A is a schematic flow chart of a traffic scene segmentation method according to an embodiment of the present application;
fig. 1B is a schematic diagram for demonstrating implementation of target enhancement based on time context target information of a backbone network according to an embodiment of the present application;
fig. 1C is a schematic diagram for demonstrating target enhancement based on spatial context target information of a backbone network according to an embodiment of the present application;
fig. 1D is a schematic diagram of implementing a network optimization function according to an embodiment of the present application;
fig. 1E is a flow chart of another traffic scene segmentation method according to an embodiment of the present application;
fig. 2 is a flow chart of another traffic scene segmentation method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of another electronic device according to an embodiment of the present application;
Fig. 4 is a functional unit composition block diagram of a traffic scene segmentation apparatus according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The electronic device described in the embodiments of the present application may include a smartphone (such as an Android phone, an iOS phone or a Windows Phone device), a tablet computer, a palmtop computer, a vehicle event recorder, a traffic guidance platform, a server, a notebook computer, a mobile internet device (MID) or a wearable device (such as a smart watch or a Bluetooth headset). These are merely examples and not limiting; the electronic device may also be, for example, a video matrix, and is not limited thereto.
The embodiments of the present application are described in detail below.
Traffic scenes contain various road markings, such as white lines, yellow lines, straight-ahead arrows and left-turn arrows, as well as targets such as traffic lights, vehicles seen from multiple angles, and pedestrians. The shapes of road markings and traffic lights are very distinctive, and annotated shapes can therefore be used to optimize segmentation.
In the related art, segmentation algorithms can produce incomplete shapes and broken markings, because the network learns these shapes without supervision. To solve this problem, a shape map is used to supervise and guide the learning of the network, thereby optimizing the target segmentation algorithm.
Based on this, the embodiment of the application provides a traffic scene segmentation method, which is applied to electronic equipment and comprises the following steps:
acquiring an original image, and a shape marker graph and a point class marker graph of the original image, wherein the original image is an image comprising a target, the shape marker graph is an edge contour image of the target, and the point class marker graph is a region image of the target;
inputting the original image into a backbone network to obtain a fusion feature map;
inputting the fusion feature map to a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
determining an oriented proposal box according to the second saliency feature map;
performing a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map;
and optimizing the score feature map according to the point category label map to obtain a segmentation result image.
Training uses the original images together with their label maps (ground truth). Each original image has two label maps: a point category label map with its target bounding box, and a shape label map, i.e. a target edge label map. The original image is input into a backbone network, and the resulting feature maps are fed respectively into a semantic segmentation branch layer and a shape branch layer; the shape branch supervises and guides the network to actively learn shape features. The shape branch outputs a shape feature map through a series of convolution layers, batch normalization layers and up-sampling layers; the loss between the shape feature map and the target edge label map is calculated, and minimizing this loss supervises and guides the network to actively learn shape features. The feature maps obtained by the two branches are fused through a concat layer and input into a decoding layer, which learns to produce a saliency feature map. A binary map is generated from the target bounding box, and this stage is trained with a supervised learning method: the loss between the binary map and the saliency feature map is calculated to reinforce and guide the network in learning the target point features.
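For illustration only, the two-branch layout described above can be sketched in PyTorch-style Python as follows; the module structure, channel counts and image sizes are assumptions made for the sketch and are not the specific configuration of the embodiment:

```python
# Minimal sketch of the two-branch layout described above (assumed shapes and channels).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSegNet(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Backbone producing the fusion feature map (a real ResNet/DenseNet would go here).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
        )
        # Semantic segmentation branch.
        self.seg_branch = nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        # Shape branch: convolution and batch-normalization layers supervised by the edge label map.
        self.shape_branch = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 1),
        )
        # Decoder producing the saliency feature map from the concatenated branches.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 1),
        )

    def forward(self, x):
        fused = self.backbone(x)                         # fusion feature map
        seg = self.seg_branch(fused)                     # segmentation feature map
        shape = torch.sigmoid(self.shape_branch(fused))  # shape feature map (edge probability)
        merged = torch.cat([seg, shape], dim=1)          # concat layer -> merged feature network
        saliency = self.decoder(merged)                  # saliency feature map
        return seg, shape, saliency

# Shape supervision: loss between the shape feature map and the target edge label map.
net = TwoBranchSegNet()
img = torch.randn(1, 3, 256, 256)
edge_gt = torch.rand(1, 1, 64, 64)                       # edge (shape) label map, assumed pre-resized
_, shape_pred, _ = net(img)
shape_loss = F.binary_cross_entropy(shape_pred, edge_gt)
```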
The embodiments of the present application are described in detail below.
Referring to fig. 1A, fig. 1A is a flow chart of a traffic scene segmentation method provided in an embodiment of the present application, which is applied to an electronic device, as shown in the drawing, and the traffic scene segmentation method includes:
101. the method comprises the steps of obtaining an original image, and a shape mark graph and a point category mark graph of the original image, wherein the original image is an image comprising a target, the shape mark graph is an edge contour image of the target, and the point category mark graph is an area image of the target.
The original image may be any traffic scene image, or the original image may be an image including only a target, which may be a pedestrian or a vehicle. For example, the original image may be a pedestrian image and a vehicle image. The shape mark graph is an edge contour image of the target, and the point category mark graph can be understood as an image of the target area.
In one possible example, when the original image is a vehicle image, the step 101 acquires the original image, including the steps of:
11. acquiring a target environment parameter;
12. determining a target shooting parameter corresponding to the target environmental parameter according to a mapping relation between a preset environmental parameter and the shooting parameter;
13. Shooting a target vehicle according to the target shooting parameters to obtain a first image;
14. and carrying out image segmentation on the first image to obtain the original image.
In this embodiment of the present application, the environmental parameter may be at least one of the following: ambient light, weather, temperature, humidity, geographical location, magnetic field disturbance intensity, etc., without limitation, the shooting parameters may be at least one of the following: the sensitivity ISO, exposure time, white balance parameter, photographing mode, color temperature, and the like are not limited herein. Wherein the environmental parameter may be collected by an environmental sensor, which may be at least one of: ambient light sensors, weather sensors, temperature sensors, humidity sensors, positioning sensors, magnetic field detection sensors, and the like, are not limited herein. The mapping relation between the preset environmental parameters and the shooting parameters can be stored in the electronic equipment in advance.
In a specific implementation, the electronic device may acquire a target environment parameter, and determine a target shooting parameter corresponding to the target environment parameter according to a mapping relationship between the preset environment parameter and the shooting parameter, further, may shoot the target vehicle according to the target shooting parameter to obtain a first image, and perform image segmentation on the first image to obtain a target vehicle image, so that not only a shooting image suitable for the environment may be obtained, but also an image only including the target vehicle may be extracted based on the shooting image to obtain an original image.
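As a purely illustrative sketch (the environment readings, parameter names and values below are hypothetical, not values from the embodiment), such a pre-stored mapping can be implemented as a simple lookup:

```python
# Hypothetical mapping from an environment reading to target shooting parameters.
PRESET_MAPPING = {
    # (ambient light level, weather) -> shooting parameters
    ("low", "rain"):   {"iso": 800, "exposure_ms": 30, "white_balance": "cloudy"},
    ("low", "clear"):  {"iso": 400, "exposure_ms": 20, "white_balance": "auto"},
    ("high", "clear"): {"iso": 100, "exposure_ms": 5,  "white_balance": "daylight"},
}

def select_shooting_params(ambient_light: str, weather: str) -> dict:
    """Return the target shooting parameters for the measured environment, with a default fallback."""
    return PRESET_MAPPING.get((ambient_light, weather),
                              {"iso": 200, "exposure_ms": 10, "white_balance": "auto"})

print(select_shooting_params("low", "rain"))
```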
Between the above steps 13 to 14, the method may further include the following steps:
a1, determining an image quality evaluation value of the first image;
a2, performing image enhancement processing on the first image when the image quality evaluation value is lower than a preset threshold value;
in the step 14, the image segmentation is performed on the first image to obtain the target vehicle image, specifically:
and carrying out image segmentation on the first image after the image enhancement processing to obtain a target vehicle region, and taking an image corresponding to the target vehicle region as the original image.
In a specific implementation, at least one image quality evaluation index may be used to perform image quality evaluation on the image, where the image quality evaluation index may be at least one of the following: average luminance, sharpness, entropy, etc., are not limited herein. The image enhancement algorithm may be at least one of: wavelet transformation, image sharpening, gray stretching, histogram equalization, etc., are not limited herein.
In a specific implementation, the electronic device may determine an image quality evaluation value of the first image, and when the image quality evaluation value is lower than a preset threshold, perform image enhancement processing on the first image, and perform image segmentation on the first image after the image enhancement processing to obtain a target vehicle image, otherwise, when the image quality evaluation value is greater than or equal to the preset threshold, directly perform image segmentation on the first image to obtain the target vehicle image, so that image segmentation accuracy can be improved, and subsequent detection is facilitated.
Further, in one possible example, the step A2 of performing image enhancement processing on the first image may include the following steps:
a21, dividing the first image into a plurality of areas;
a22, determining a sharpness value of each of the plurality of regions to obtain a plurality of sharpness values;
a23, selecting sharpness values lower than a preset sharpness value from the plurality of sharpness values, and acquiring the corresponding regions to obtain at least one target region;
a24, determining the feature point distribution density corresponding to each region in the at least one target region to obtain at least one feature point distribution density;
a25, determining a feature point distribution density level corresponding to the at least one feature point distribution density to obtain at least one feature point distribution density level;
a26, determining a target image enhancement algorithm corresponding to the at least one feature point distribution density level according to a mapping relation between preset feature point distribution density levels and image enhancement algorithms;
and A27, performing image enhancement processing on the corresponding target area according to the target image enhancement algorithm corresponding to the at least one feature point distribution density level to obtain the first image after the image enhancement processing.
The preset sharpness value can be set by the user or by system default. The mapping relation between the preset feature point distribution density levels and the image enhancement algorithms can be stored in the electronic device in advance, and the image enhancement algorithm can be at least one of the following: wavelet transformation, image sharpening, gray stretching, histogram equalization, etc., which are not limited herein.
In a specific implementation, the electronic device may divide the first image into a plurality of regions, where each region has the same or different area, and may further determine a sharpness value of each region in the plurality of regions to obtain a plurality of sharpness values, select a sharpness value lower than a preset sharpness value from the plurality of sharpness values, and obtain a region corresponding to the sharpness value to obtain at least one target region, and further determine a feature point distribution density corresponding to each region in the at least one target region to obtain at least one feature point distribution density, where each region corresponds to one feature point distribution density, and feature point distribution density=feature point total number/region area of one region. The electronic device may further store a mapping relationship between the feature point distribution density and the feature point distribution density level in advance, and further determine a feature point distribution density level corresponding to each feature point distribution density in the at least one feature point distribution density according to the mapping relationship, so as to obtain the at least one feature point distribution density level.
Further, the electronic device may determine the target image enhancement algorithm corresponding to each of the at least one feature point distribution density level according to the mapping relationship between preset feature point distribution density levels and image enhancement algorithms, and perform image enhancement processing on the corresponding target area with that algorithm to obtain the first image after image enhancement processing. In this way, over-enhancement of areas that already have good image quality is prevented, and because different areas may have different image quality, enhancement can be applied in a targeted manner, further improving image quality.
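A minimal sketch of this region-wise selection is shown below, assuming OpenCV is available; the grid size, sharpness threshold, density levels and the density-to-algorithm mapping are illustrative assumptions:

```python
# Sketch of region-wise enhancement selection (thresholds and the level->algorithm mapping are assumed).
import numpy as np
import cv2

def region_sharpness(gray_region: np.ndarray) -> float:
    # Variance of the Laplacian is a common sharpness proxy.
    return float(cv2.Laplacian(gray_region, cv2.CV_64F).var())

def density_level(density: float) -> str:
    return "high" if density > 0.01 else "low"      # assumed level boundary

ENHANCE_BY_LEVEL = {
    "high": lambda r: cv2.equalizeHist(r),                              # histogram equalization
    "low":  lambda r: cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX),  # gray stretching
}

def enhance_low_quality_regions(gray: np.ndarray, grid=(4, 4), sharp_thresh=50.0) -> np.ndarray:
    """gray: single-channel uint8 image; regions below the sharpness threshold are enhanced."""
    out = gray.copy()
    h, w = gray.shape
    gh, gw = h // grid[0], w // grid[1]
    orb = cv2.ORB_create()
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = out[i*gh:(i+1)*gh, j*gw:(j+1)*gw]
            if region_sharpness(region) >= sharp_thresh:
                continue                                   # region is sharp enough, skip
            kps = orb.detect(region, None)                 # feature points in this region
            density = len(kps) / float(region.size)        # total feature points / region area
            region[:] = ENHANCE_BY_LEVEL[density_level(density)](region)
    return out

enhanced = enhance_low_quality_regions(np.full((128, 128), 120, dtype=np.uint8))
```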
102. And inputting the original image into a backbone network to obtain a fusion feature map.
The backbone network may be a ResNet, a DenseNet, a MobileNet, or another backbone network. In a specific implementation, the electronic device may input the original image into the backbone network to obtain the fusion feature map.
In one possible example, the step 102 of inputting the original image into a backbone network to obtain a fused feature map may include the following steps:
21. acquiring at least one captured image adjacent to the capturing time of the original image;
22. inputting the original image and the at least one captured image into the backbone network to obtain a first fusion feature;
23. determining at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
24. inputting the original image and the at least one scale-transformed image into the backbone network to obtain a second fusion feature;
25. and fusing the first fusion feature and the second fusion feature to obtain the fusion feature map.
The electronic device may acquire at least one captured image adjacent to the capturing moment of the original image, for example three consecutive frames of a video, and may then input the original image and the at least one captured image into the backbone network to obtain the first fusion feature. As shown in fig. 1B, an original image and two frames adjacent to it are acquired and input into the backbone network to obtain a plurality of target area feature maps, with each input image corresponding to one target area feature map; the first fusion feature is formed from these target area feature maps.
Further, the electronic device may determine at least one scale-transformed image corresponding to the original image, where different scale-transformed images have different scales. Specifically, the at least one scale-transformed image may be obtained by multi-scale decomposition, scaling or downsampling. The original image and the at least one scale-transformed image may then be input into the backbone network to obtain a second fusion feature, as shown in fig. 1C. For example, the electronic device may scale-transform the original image to obtain the at least one scale-transformed image and input both the original image and the scale-transformed images into the backbone network; the original image yields a target feature box, the scale-transformed images yield features around the target box, and these features are fused to obtain the second fusion feature, i.e. a Gaussian fusion feature. In a traffic scene, different targets are related to one another, and a target is also related to its surrounding environment. During training, a group of target boxes can be formed by expanding the target bounding box outward by certain scales; the target features and the features of the expanded target box regions are extracted, and these features are combined together by Gaussian filtering.
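The Gaussian-style combination of the target-box features with the expanded-box (context) features can be sketched roughly as follows; the expansion count, feature sizes and the exact weighting scheme are assumptions, since the embodiment only states that the features are combined by Gaussian filtering:

```python
# Rough sketch of Gaussian-weighted fusion of target-box and expanded-box (context) features.
import torch

def gaussian_weights(num: int, sigma: float = 1.0) -> torch.Tensor:
    # Weights that decay with the expansion index, normalized to sum to 1 (assumed scheme).
    idx = torch.arange(num, dtype=torch.float32)
    w = torch.exp(-(idx ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def fuse_context_features(box_feat: torch.Tensor, context_feats: list) -> torch.Tensor:
    """box_feat: (C, H, W) feature of the target box;
    context_feats: features of progressively expanded boxes, same shape."""
    feats = torch.stack([box_feat] + context_feats, dim=0)   # (N, C, H, W)
    w = gaussian_weights(feats.shape[0]).view(-1, 1, 1, 1)
    return (feats * w).sum(dim=0)                            # Gaussian-weighted fusion

box_feat = torch.randn(64, 7, 7)
context = [torch.randn(64, 7, 7) for _ in range(2)]          # two expanded target boxes
fused = fuse_context_features(box_feat, context)             # (64, 7, 7)
```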
Further, the electronic device may fuse the first fusion feature and the second fusion feature to obtain a fusion feature map.
In a possible example, the step 25 of fusing the first fusion feature and the second fusion feature to obtain the fusion feature map may include the following steps:
251. concatenating the first fusion feature and the second fusion feature to obtain a concatenated feature map;
252. and performing a convolution operation on the concatenated feature map to obtain the fusion feature map.
In a specific implementation, the electronic device may concatenate the first fusion feature and the second fusion feature to obtain a concatenated feature map, and may then perform a convolution operation on the concatenated feature map to obtain the fusion feature map. That is, the method is equivalent to connecting the two groups of feature maps in series and reducing the number of channels through one convolution layer to obtain the fusion feature map.
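A minimal sketch of this series connection and channel reduction, with assumed channel counts:

```python
# Sketch of steps 251-252: concatenate the two fusion features and reduce channels with one
# convolution layer (channel counts are assumed).
import torch
import torch.nn as nn

first_fusion = torch.randn(1, 64, 64, 64)     # temporal-context fusion feature
second_fusion = torch.randn(1, 64, 64, 64)    # scale/spatial-context fusion feature

concatenated = torch.cat([first_fusion, second_fusion], dim=1)   # series connection -> 128 channels
reduce = nn.Conv2d(128, 64, kernel_size=1)                       # one conv layer restores the channel count
fusion_feature_map = reduce(concatenated)                        # final fusion feature map
print(fusion_feature_map.shape)                                  # torch.Size([1, 64, 64, 64])
```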
103. And inputting the fusion feature map to a segmentation network to obtain a segmentation feature map.
The segmentation network is a network capable of realizing image segmentation, for example, the segmentation network may be a semantic segmentation network.
104. Inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map.
The shape network may be a network capable of implementing an edge contour extraction function, and in a specific implementation, the electronic device may input the fusion feature map into the shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape marker map to obtain a second shape feature map, so that edge segmentation accuracy may be improved.
In a possible example, the step 104 of optimizing the first shape feature map through the shape mark map to obtain a second shape feature map may include the following steps:
41. determining a shape loss between the shape marker graph and the first shape feature graph;
42. optimizing model parameters of the shape network by the shape loss;
43. and calculating the first shape characteristic diagram through the optimized shape network to obtain the second shape characteristic diagram.
In this embodiment of the present application, the model parameters may be at least one of the following: weights, offsets, convolution kernels, number of layers, activation function type, metrics, weight optimization algorithm, batch_size, etc., without limitation. In a specific implementation, the electronic device may determine the shape loss between the shape marker map and the first shape feature map, i.e. the difference between the two, may then optimize the model parameters of the shape network through the shape loss, and may finally operate on the first shape feature map through the optimized shape network to obtain the second shape feature map. As shown in fig. 1D, the electronic device may operate on the input data with the shape network under its original model parameters and optimize the result: the shape loss between the operation result and the shape marker map is determined, the shape loss is fed into the loss function to obtain updated parameters, and the updated parameters are used to optimize the model parameters of the shape network.
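A sketch of this supervision step under assumed layer sizes and optimizer settings (the real shape network, loss function and optimizer may differ):

```python
# Sketch of steps 41-43: compute the loss between the shape label map and the predicted shape
# feature map, update the shape-network parameters, then recompute with the optimized network.
import torch
import torch.nn as nn
import torch.nn.functional as F

shape_net = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True), nn.Conv2d(32, 1, 1))
optimizer = torch.optim.SGD(shape_net.parameters(), lr=1e-3)

fusion_feature_map = torch.randn(1, 64, 64, 64)
shape_label_map = torch.rand(1, 1, 64, 64)              # ground-truth edge contour (shape) label map

first_shape_map = torch.sigmoid(shape_net(fusion_feature_map))
shape_loss = F.binary_cross_entropy(first_shape_map, shape_label_map)   # step 41
optimizer.zero_grad()
shape_loss.backward()
optimizer.step()                                                        # step 42: update parameters

with torch.no_grad():                                                   # step 43: recompute with the
    second_shape_map = torch.sigmoid(shape_net(fusion_feature_map))     # optimized shape network
```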
105. And merging the segmentation feature map and the second shape feature map to obtain a merged feature network.
In a specific implementation, the electronic device may merge the segmentation feature map and the second shape feature map through a concat (concatenation) layer to obtain the merged feature network.
106. And performing a first convolution operation on the merged feature network to obtain an intermediate operation result.
In a specific implementation, the electronic device may perform the first convolution operation on the merged feature network, which improves the saliency of the merged features and yields an intermediate operation result.
107. And decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map.
The electronic device may input the intermediate operation result into the decoding layer to perform the decoding operation and obtain a first saliency feature map, and may then optimize the first saliency feature map through the binary map corresponding to it to obtain a second saliency feature map. The binary map may be obtained by setting the target in the first saliency feature map to 255 and the background to 0; the first saliency feature map is then optimized by using this binary map and the corresponding loss function to obtain the second saliency feature map.
In a possible example, optimizing the first saliency feature map through the binary map corresponding to the first saliency feature map to obtain the second saliency feature map in step 107 may include the following steps:
71. determining a target loss amount between the first saliency feature map and the binary map;
72. optimizing model parameters of the intermediate layer through the target loss amount;
73. and calculating the first saliency feature map through the optimized intermediate layer to obtain the second saliency feature map.
In this embodiment of the present application, the model parameter may be at least one of the following: weights, offsets, convolution kernels, number of layers, activation function type, metrics, weight optimization algorithm, batch_size, etc., without limitation. In a specific implementation, the electronic device determines a target loss amount between the first saliency feature map and the binary map, optimizes model parameters of the intermediate layer through the target loss amount, and calculates the first saliency feature map through the optimized intermediate layer to obtain a second saliency feature map, and the specific principle can refer to fig. 1D.
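A sketch of generating the binary map from a target bounding box and computing the target loss amount against the first saliency feature map; the box coordinates, map size and choice of loss are illustrative assumptions:

```python
# Sketch of steps 71-73: binary map from the target bounding box supervising the saliency map.
import torch
import torch.nn.functional as F

def binary_map_from_box(h: int, w: int, box) -> torch.Tensor:
    """box = (x1, y1, x2, y2); target region set to 1 (255 in image terms), background to 0."""
    m = torch.zeros(1, 1, h, w)
    x1, y1, x2, y2 = box
    m[:, :, y1:y2, x1:x2] = 1.0
    return m

first_saliency_map = torch.sigmoid(torch.randn(1, 1, 64, 64))   # decoder output
binary_map = binary_map_from_box(64, 64, (10, 12, 40, 50))      # from the target bounding box

target_loss = F.binary_cross_entropy(first_saliency_map, binary_map)   # step 71
# Step 72 would back-propagate target_loss to optimize the intermediate-layer parameters, and
# step 73 recomputes the saliency map with the optimized layers to obtain the second map.
```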
108. And determining an oriented proposal box according to the second saliency feature map and the point category label map.
In a specific implementation, the electronic device may determine the oriented proposal box according to the second saliency feature map and the point category label map. The second saliency feature map corresponds to a prediction box, and the point category label map corresponds to a target box. As shown in fig. 1E, the intermediate layer has two branches, one pointing to the saliency output and the other to the box output; the last layer of the box branch is a convolution layer whose output has 4 channels, i.e. 4 values that define the oriented proposal box.
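A rough sketch of such a box-branch head with a 4-channel convolutional output; reading the 4 channels as four box-defining values per location is an assumption, since the embodiment does not spell out the exact parameterization:

```python
# Sketch of the box branch: a final convolution layer with a 4-channel output.
import torch
import torch.nn as nn

proposal_head = nn.Conv2d(64, 4, kernel_size=1)       # last layer: 4 output channels
intermediate = torch.randn(1, 64, 64, 64)              # intermediate-layer feature map
box_map = proposal_head(intermediate)                  # (1, 4, 64, 64): 4 values per location
# Each spatial position thus predicts 4 numbers that, together with the saliency map,
# are taken here to define the oriented proposal box for the target at that position.
```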
In one possible example, the step 108 of determining the oriented proposal box according to the second saliency feature map may include the following steps:
81. determining the second saliency feature map corresponding to the position of a target, wherein the target is any target in the point category label map;
82. judging whether overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any frame corresponding to the target, and the rectangle j is any rectangle in the second saliency characteristic diagram;
83. when the rectangle i and the rectangle j are overlapped, determining the number of points in an overlapped area;
84. determining a ratio between the number of points and the total number of points of the second saliency feature map, and determining a sample attribute of the overlapping region according to the ratio;
85. and training on the overlapping region to obtain the oriented proposal box.
In a specific implementation, the sample attribute may be positive sample or negative sample, and targets correspond one-to-one with saliency feature maps. The electronic device may determine the second saliency feature map corresponding to the position of a target, where the target is any target in the point category label map. It is then determined whether a rectangle i and a rectangle j overlap, where rectangle i is any box corresponding to the target and rectangle j is any rectangle in the second saliency feature map. When rectangle i and rectangle j overlap, the number of points in the overlapping region is determined, the ratio between this number and the total number of points of the second saliency feature map is computed, the sample attribute of the overlapping region is determined according to the ratio, and training is performed on the overlapping region to obtain the oriented proposal box.
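A sketch of this overlap-and-ratio based sample assignment; the rectangle representation and the positive/negative ratio threshold are assumptions for illustration:

```python
# Sketch of steps 82-85: count saliency points inside the overlap of a target rectangle and a
# saliency rectangle, and label the region by the ratio (the 0.5 threshold is an assumed value).
def overlap_rect(r1, r2):
    """Rectangles as (x1, y1, x2, y2); returns the overlap rectangle or None."""
    x1, y1 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x2, y2 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def sample_attribute(target_rect, saliency_rect, saliency_points, ratio_thresh=0.5):
    ov = overlap_rect(target_rect, saliency_rect)
    if ov is None:
        return "negative"
    inside = [p for p in saliency_points if ov[0] <= p[0] < ov[2] and ov[1] <= p[1] < ov[3]]
    ratio = len(inside) / max(len(saliency_points), 1)      # points in overlap / total points
    return "positive" if ratio >= ratio_thresh else "negative"

pts = [(12, 14), (20, 22), (35, 40), (60, 61)]
print(sample_attribute((10, 10, 40, 45), (8, 8, 38, 42), pts))
```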
109. And performing a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map.
Here, ASPP (Atrous Spatial Pyramid Pooling) refers to a spatial pyramid pooling structure built with atrous (dilated) convolutions. Based on the oriented proposal box, the electronic device can perform the second convolution operation and the ASPP operation to obtain the score feature map, which improves feature saliency.
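A minimal ASPP sketch with parallel dilated convolutions; the dilation rates and channel counts below are common choices, not values taken from the embodiment:

```python
# Minimal ASPP sketch: parallel dilated (atrous) convolutions, concatenated and projected.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch=64, out_ch=64, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # same spatial size for every rate
        return self.project(torch.cat(feats, dim=1))

score_feature_map = ASPP()(torch.randn(1, 64, 32, 32))    # (1, 64, 32, 32)
```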
110. And optimizing the score feature map according to the point category label map to obtain a segmentation result image.
The segmentation result image may comprise at least one of the following: white lines, yellow lines, straight-ahead arrows, left-turn arrows, vehicles, pedestrians, or others. The electronic device can optimize the score feature map according to the point category label map to obtain the segmentation result image.
In this embodiment of the present application, the loss function may be at least one of the following: a hinge loss function, a cross-entropy loss function, an exponential loss function, and the like, without limitation.
In one possible example, the step 110 of optimizing the score feature map according to the point class label map may include the following steps:
1101. determining a point loss between the score feature map and the point category label map;
1102. optimizing model parameters of the ASPP through the point loss;
1103. and calculating the score feature map through the optimized ASPP to obtain the segmentation result image.
In this embodiment of the present application, the model parameter may be at least one of the following: weights, offsets, convolution kernels, number of layers, activation function type, metrics, weight optimization algorithm, batch_size, etc., without limitation. In a specific implementation, the electronic device may determine a point loss between the score feature map and the point class label map, optimize model parameters of the ASPP through the point loss, and calculate the score feature map through the optimized ASPP to obtain an image segmentation result, and a specific principle may refer to fig. 1D.
Specifically, as shown in fig. 1E, the original image is input into the backbone network, and the resulting feature maps are fed respectively into the semantic segmentation branch layer and the shape branch layer; the shape branch supervises and guides the network to actively learn shape features. The shape branch outputs a shape feature map through a series of convolution layers, batch normalization layers and up-sampling layers, the loss between the shape feature map and the target edge label map is calculated, and optimizing this loss supervises and guides the network to actively learn shape features. The feature maps obtained by the two branches are fused through a concat layer and input into a decoding layer, which learns to produce a saliency feature map. The binary map is generated from the target bounding box, and this stage is trained by supervised learning: the loss between the binary map and the saliency feature map is calculated to reinforce and guide the network in learning the target point features.
It can be seen that the traffic scene segmentation method described in the embodiment of the application is applied to an electronic device. An original image and its shape marker map and point category label map are acquired, where the original image is an image including a target, the shape marker map is an edge contour image of the target, and the point category label map is a region image of the target. The original image is input into a backbone network to obtain a fusion feature map, the fusion feature map is input into a segmentation network to obtain a segmentation feature map, and the fusion feature map is input into a shape network to obtain a first shape feature map, which is optimized through the shape marker map to obtain a second shape feature map. The segmentation feature map and the second shape feature map are merged to obtain a merged feature network, a first convolution operation is performed on the merged feature network to obtain an intermediate operation result, and the intermediate operation result is decoded to obtain a first saliency feature map, which is optimized through its corresponding binary map to obtain a second saliency feature map. An oriented proposal box is determined according to the second saliency feature map, a second convolution operation and an ASPP operation are performed on the basis of the oriented proposal box to obtain a score feature map, and the score feature map is optimized according to the point category label map to obtain a segmentation result image. On the one hand, the shape marker map and the point category label map of the original image are fully used to guide network learning; on the other hand, feature saliency is improved and accurate target localization is achieved by combining the oriented proposal box, so that image segmentation precision can be improved.
In accordance with the embodiment shown in fig. 1A, please refer to fig. 2, fig. 2 is a schematic flow chart of a traffic scene segmentation method provided in the embodiment of the present application, which is applied to an electronic device, as shown in the figure, and the traffic scene segmentation method includes:
201. and acquiring an image to be processed.
202. And carrying out image segmentation on the image to be processed to obtain a target area image, and taking an image with the preset size including the target area image as an original image.
203. And acquiring a shape mark graph and a point category mark graph of the original image, wherein the shape mark graph is an edge contour image of the target, and the point category mark graph is a region image of the target.
204. And inputting the original image into a backbone network to obtain a fusion feature map.
205. And inputting the fusion feature map to a segmentation network to obtain a segmentation feature map.
206. Inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map.
207. And merging the segmentation feature map and the second shape feature map to obtain a merged feature network.
208. And carrying out first convolution operation on the combined characteristic network to obtain an intermediate operation result.
209. And decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map.
210. And determining an oriented proposal box according to the second saliency feature map.
211. And performing a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map.
212. And optimizing the score feature map according to the point category label map to obtain a segmentation result image.
The preset size can be set by a user or default by the system.
The specific description of the steps 201 to 212 may refer to the corresponding steps of the traffic scene segmentation method described in fig. 1A, and are not repeated herein.
It can be seen that, with the traffic scene segmentation method described in the embodiment of the application, on the one hand the shape marker map and the point category label map of the original image can be fully utilized to guide network learning, and on the other hand feature saliency can be improved and accurate target localization achieved by combining the oriented proposal box, so that image segmentation precision can be improved.
In accordance with the above embodiment, referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device provided in the embodiment of the present application, as shown in the fig. 3, the electronic device includes a processor, a memory, a communication interface, and one or more programs applied to the electronic device, where the one or more programs are stored in the memory and configured to be executed by the processor, and in the embodiment of the present application, the programs include instructions for executing the following steps:
acquiring an original image, and a shape marker graph and a point class marker graph of the original image, wherein the original image is an image comprising a target, the shape marker graph is an edge contour image of the target, and the point class marker graph is a region image of the target;
inputting the original image into a backbone network to obtain a fusion feature map;
inputting the fusion feature map to a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape mark map to obtain a second shape feature map;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
determining an oriented proposal box according to the second saliency feature map;
performing a second convolution operation and an ASPP operation on the oriented proposal box to obtain a score feature map;
and optimizing the score feature map according to the point category label map to obtain a segmentation result image.
It can be seen that the electronic device described in the embodiment of the present application acquires an original image and its shape marker map and point category label map, where the original image is an image including a target, the shape marker map is an edge contour image of the target, and the point category label map is a region image of the target. The electronic device inputs the original image into a backbone network to obtain a fusion feature map, inputs the fusion feature map into a segmentation network to obtain a segmentation feature map, inputs the fusion feature map into a shape network to obtain a first shape feature map, and optimizes the first shape feature map through the shape marker map to obtain a second shape feature map. It then merges the segmentation feature map and the second shape feature map to obtain a merged feature network, performs a first convolution operation on the merged feature network to obtain an intermediate operation result, decodes the intermediate operation result to obtain a first saliency feature map, and optimizes the first saliency feature map through the corresponding binary map to obtain a second saliency feature map. An oriented proposal box is determined according to the second saliency feature map, a second convolution operation and an ASPP operation are performed on it to obtain a score feature map, and the score feature map is optimized according to the point category label map to obtain a segmentation result image. On the one hand, the shape marker map and the point category label map of the original image are fully used to guide network learning; on the other hand, feature saliency is improved and accurate target localization is achieved, so that image segmentation precision can be improved.
In one possible example, in said determining an oriented proposal box from said second saliency feature map, the above-mentioned program comprises instructions for performing the steps of:
determining the second saliency feature map corresponding to the position of a target, wherein the target is any target in the point category label map;
judging whether overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any frame corresponding to the target, and the rectangle j is any rectangle in the second saliency characteristic diagram;
when the rectangle i and the rectangle j are overlapped, determining the number of points in an overlapped area;
determining a ratio between the points and the total points of the second saliency feature map, and determining sample attributes of the overlapping region according to the ratio;
and training on the overlapping region to obtain the oriented proposal box.
In one possible example, in the inputting the original image into a backbone network to obtain a fusion feature map, the above-mentioned program comprises instructions for performing the following steps:
acquiring at least one captured image whose capture time is adjacent to that of the original image;
inputting the original image and the at least one captured image into the backbone network to obtain a first fusion feature;
determining at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
inputting the original image and the at least one scale-transformed image into the backbone network to obtain a second fusion feature;
and fusing the first fusion feature and the second fusion feature to obtain the fusion feature map.
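The two fusion branches could, for instance, be sketched as below in PyTorch; this is only an assumed arrangement (the present application does not specify how per-image features are merged within a branch), with `backbone` taken to be any module mapping an image tensor to a feature map and simple averaging used inside each branch.

```python
import torch
import torch.nn.functional as F

def two_branch_features(original, neighbors, scales, backbone):
    """Returns the first fusion feature (original + temporally adjacent captured
    images) and the second fusion feature (original + scale-transformed copies)."""
    # first fusion feature: features of the original frame and its adjacent frames
    first = torch.stack([backbone(x) for x in [original, *neighbors]]).mean(dim=0)

    # second fusion feature: features of the original and scale-transformed copies,
    # resized back to the reference resolution before merging
    ref = backbone(original)
    feats = [ref]
    for s in scales:
        scaled = F.interpolate(original, scale_factor=s, mode="bilinear", align_corners=False)
        f = backbone(scaled)
        feats.append(F.interpolate(f, size=ref.shape[-2:], mode="bilinear", align_corners=False))
    second = torch.stack(feats).mean(dim=0)
    return first, second
```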
In one possible example, in the fusing the first fusion feature and the second fusion feature to obtain the fusion feature map, the program includes instructions for performing the following steps:
concatenating (connecting in series) the first fusion feature and the second fusion feature to obtain a concatenated feature map;
and performing a convolution operation on the concatenated feature map to obtain the fusion feature map.
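A minimal sketch of this concatenate-then-convolve fusion, assuming both branch features share the same channel count, might look as follows; the module name and kernel size are illustrative.

```python
import torch
import torch.nn as nn

class ConcatFuse(nn.Module):
    """Concatenate the first and second fusion features along the channel axis
    (the 'series connection') and convolve to obtain the fusion feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, first: torch.Tensor, second: torch.Tensor) -> torch.Tensor:
        return self.conv(torch.cat([first, second], dim=1))
```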
In one possible example, in the optimizing the first shape feature map through the shape label map to obtain a second shape feature map, the program comprises instructions for performing the following steps:
determining a shape loss between the shape label map and the first shape feature map;
optimizing model parameters of the shape network through the shape loss;
and calculating the first shape feature map through the optimized shape network to obtain the second shape feature map.
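The shape-branch refinement can be pictured with the sketch below, which is an assumption-laden illustration rather than the method of the present application: the shape loss is taken to be binary cross-entropy between the first shape feature map and the shape label map, and a single SGD step stands in for the optimization of the shape network's parameters.

```python
import torch
import torch.nn as nn

def refine_shape_feature(shape_net, fusion_feat, shape_label, steps=1, lr=1e-3):
    """Optimize the shape network with a shape loss, then recompute the shape
    feature: shape_net maps the fusion feature map to a 1-channel edge-contour
    logit map, and shape_label is the binary shape label map (same shape)."""
    fusion_feat = fusion_feat.detach()                # treat the fusion feature as a fixed input
    criterion = nn.BCEWithLogitsLoss()                # assumed choice of shape loss
    opt = torch.optim.SGD(shape_net.parameters(), lr=lr)
    for _ in range(steps):
        first_shape = shape_net(fusion_feat)          # first shape feature map
        loss = criterion(first_shape, shape_label)    # shape loss vs. shape label map
        opt.zero_grad()
        loss.backward()
        opt.step()                                    # optimize shape-network parameters
    with torch.no_grad():
        return shape_net(fusion_feat)                 # second shape feature map
```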
The foregoing description of the embodiments of the present application has been presented primarily in terms of a method-side implementation. It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application may divide the functional units of the electronic device according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
Fig. 4 is a functional unit block diagram of a traffic scene segmentation apparatus 400 according to an embodiment of the present application. The traffic scene segmentation apparatus 400 is applied to an electronic device, and comprises: an acquisition unit 401, an input unit 402, a merging unit 403, an operation unit 404, a decoding unit 405, a determination unit 406, and an optimization unit 407, wherein,
the acquiring unit 401 is configured to acquire an original image and a shape label map and a point class label map of the original image, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point class label map is a region image of the target;
the input unit 402 is configured to input the original image into a backbone network to obtain a fusion feature map; input the fusion feature map into a segmentation network to obtain a segmentation feature map; and input the fusion feature map into a shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape label map to obtain a second shape feature map;
the merging unit 403 is configured to merge the segmentation feature map and the second shape feature map to obtain a merged feature network;
the operation unit 404 is configured to perform a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit 405 is configured to perform a decoding operation on the intermediate operation result to obtain a first saliency feature map, and to optimize the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
the determining unit 406 is configured to determine an oriented suggestion box according to the second saliency feature map;
the operation unit 404 is further configured to perform a second convolution operation and an ASPP operation on the oriented suggestion box to obtain a score feature map;
the optimizing unit 407 is configured to optimize the score feature map according to the point class label map to obtain a segmentation result image.
It can be seen that the traffic scene segmentation apparatus described in the embodiment of the present application, applied to an electronic device, acquires an original image and the shape label map and point class label map of the original image, where the original image is an image including a target, the shape label map is an edge contour image of the target, and the point class label map is a region image of the target; inputs the original image into a backbone network to obtain a fusion feature map; inputs the fusion feature map into a segmentation network to obtain a segmentation feature map; inputs the fusion feature map into a shape network to obtain a first shape feature map, and optimizes the first shape feature map through the shape label map to obtain a second shape feature map; merges the segmentation feature map and the second shape feature map to obtain a merged feature network; performs a first convolution operation on the merged feature network to obtain an intermediate operation result; performs a decoding operation on the intermediate operation result to obtain a first saliency feature map, and optimizes the first saliency feature map through the corresponding binary map to obtain a second saliency feature map; determines an oriented suggestion box according to the second saliency feature map; performs a second convolution operation and an ASPP operation on the oriented suggestion box to obtain a score feature map; and optimizes the score feature map according to the point class label map to obtain a segmentation result image. In this way, on the one hand, the shape (edge contour) information of the target is fully exploited during segmentation, and on the other hand, the accuracy of traffic scene segmentation can be improved.
In one possible example, in the determining an oriented suggestion box according to the second saliency feature map, the determining unit 406 is specifically configured to:
determine, in the second saliency feature map, the position corresponding to a target, wherein the target is any target in the point class label map;
judge whether an overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second saliency feature map;
when the rectangle i and the rectangle j overlap, determine the number of points in the overlapped area;
determine a ratio between the number of points and the total number of points of the second saliency feature map, and determine a sample attribute of the overlapped area according to the ratio;
and train with the overlapped area to obtain the oriented suggestion box.
In one possible example, in the inputting the original image into a backbone network to obtain a fusion feature map, the input unit 402 is specifically configured to:
acquire at least one captured image whose capture time is adjacent to that of the original image;
input the original image and the at least one captured image into the backbone network to obtain a first fusion feature;
determine at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
input the original image and the at least one scale-transformed image into the backbone network to obtain a second fusion feature;
and fuse the first fusion feature and the second fusion feature to obtain the fusion feature map.
In one possible example, in the fusing the first fusion feature and the second fusion feature to obtain the fusion feature map, the merging unit 403 is specifically configured to:
concatenate the first fusion feature and the second fusion feature to obtain a concatenated feature map;
and perform a convolution operation on the concatenated feature map to obtain the fusion feature map.
In one possible example, in the optimizing the first shape feature map through the shape label map to obtain a second shape feature map, the input unit 402 is specifically configured to:
determine a shape loss between the shape label map and the first shape feature map;
optimize model parameters of the shape network through the shape loss;
and calculate the first shape feature map through the optimized shape network to obtain the second shape feature map.
It may be understood that the functions of each program module of the traffic scene segmentation apparatus of the present embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the relevant description of the foregoing method embodiment, which is not repeated herein.
The embodiment of the application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, the computer including an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the methods described in the method embodiments above. The computer program product may be a software installation package, said computer comprising an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; for instance, the above division of units is merely a division of logic functions, and there may be other manners of division in actual implementation, such as combining multiple units or components or integrating them into another system, or omitting or not performing some features. In addition, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program instructing associated hardware; the program may be stored in a computer-readable memory, which may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In view of the above, the content of this specification should not be construed as limiting the present application.

Claims (8)

1. A traffic scene segmentation method, characterized by being applied to an electronic device, the method comprising:
acquiring an original image and a shape label map and a point class label map of the original image, wherein the original image is an image comprising a target, the shape label map is an edge contour image of the target, and the point class label map is a region image of the target;
inputting the original image into a backbone network to obtain a fusion feature map;
inputting the fusion feature map into a segmentation network to obtain a segmentation feature map;
inputting the fusion feature map into a shape network to obtain a first shape feature map, and optimizing the first shape feature map through the shape label map to obtain a second shape feature map, wherein the shape network comprises a network capable of realizing an edge contour extraction function;
merging the segmentation feature map and the second shape feature map to obtain a merged feature network;
performing a first convolution operation on the merged feature network to obtain an intermediate operation result;
decoding the intermediate operation result to obtain a first saliency feature map, and optimizing the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
determining an oriented suggestion box according to the second saliency feature map;
performing a second convolution operation and an ASPP operation on the oriented suggestion box to obtain a score feature map;
and optimizing the score feature map according to the point class label map to obtain a segmentation result image;
wherein the optimizing the first saliency feature map through the binary map corresponding to the first saliency feature map to obtain a second saliency feature map comprises:
determining a target loss between the first saliency feature map and the binary map;
optimizing model parameters of an intermediate layer through the target loss;
and calculating the first saliency feature map through the optimized intermediate layer to obtain the second saliency feature map;
wherein the determining an oriented suggestion box according to the second saliency feature map comprises:
determining, in the second saliency feature map, the position corresponding to a target, wherein the target is any target in the point class label map;
judging whether an overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second saliency feature map;
when the rectangle i and the rectangle j overlap, determining the number of points in the overlapped area;
determining a ratio between the number of points and the total number of points of the second saliency feature map, and determining a sample attribute of the overlapped area according to the ratio;
and training with the overlapped area to obtain the oriented suggestion box.
2. The method according to claim 1, wherein the inputting the original image into a backbone network to obtain a fusion feature map comprises:
acquiring at least one captured image whose capture time is adjacent to that of the original image;
inputting the original image and the at least one captured image into the backbone network to obtain a first fusion feature;
determining at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
inputting the original image and the at least one scale-transformed image into the backbone network to obtain a second fusion feature;
and fusing the first fusion feature and the second fusion feature to obtain the fusion feature map.
3. The method according to claim 2, wherein the fusing the first fusion feature and the second fusion feature to obtain the fusion feature map comprises:
concatenating the first fusion feature and the second fusion feature to obtain a concatenated feature map;
and performing a convolution operation on the concatenated feature map to obtain the fusion feature map.
4. The method according to any one of claims 1-3, wherein the optimizing the first shape feature map through the shape label map to obtain a second shape feature map comprises:
determining a shape loss between the shape label map and the first shape feature map;
optimizing model parameters of the shape network through the shape loss;
and calculating the first shape feature map through the optimized shape network to obtain the second shape feature map.
5. A traffic scene segmentation apparatus, characterized in that it is applied to an electronic device, the apparatus comprising: an acquisition unit, an input unit, a merging unit, an operation unit, a decoding unit, a determination unit and an optimization unit, wherein,
the acquisition unit is configured to acquire an original image and a shape label map and a point class label map of the original image, wherein the original image is an image comprising a target, the shape label map is an edge contour image of the target, and the point class label map is a region image of the target;
the input unit is configured to input the original image into a backbone network to obtain a fusion feature map; input the fusion feature map into a segmentation network to obtain a segmentation feature map; and input the fusion feature map into a shape network to obtain a first shape feature map, and optimize the first shape feature map through the shape label map to obtain a second shape feature map, wherein the shape network comprises a network capable of realizing an edge contour extraction function;
the merging unit is configured to merge the segmentation feature map and the second shape feature map to obtain a merged feature network;
the operation unit is configured to perform a first convolution operation on the merged feature network to obtain an intermediate operation result;
the decoding unit is configured to decode the intermediate operation result to obtain a first saliency feature map, and to optimize the first saliency feature map through a binary map corresponding to the first saliency feature map to obtain a second saliency feature map;
the determining unit is configured to determine an oriented suggestion box according to the second saliency feature map;
the operation unit is further configured to perform a second convolution operation and an ASPP operation on the oriented suggestion box to obtain a score feature map;
the optimizing unit is configured to optimize the score feature map according to the point class label map to obtain a segmentation result image;
wherein the optimizing the first saliency feature map through the binary map corresponding to the first saliency feature map to obtain a second saliency feature map comprises:
determining a target loss between the first saliency feature map and the binary map;
optimizing model parameters of an intermediate layer through the target loss;
and calculating the first saliency feature map through the optimized intermediate layer to obtain the second saliency feature map;
wherein, in the determining an oriented suggestion box according to the second saliency feature map, the determining unit is specifically configured to:
determine, in the second saliency feature map, the position corresponding to a target, wherein the target is any target in the point class label map;
judge whether an overlap exists between a rectangle i and a rectangle j, wherein the rectangle i is any box corresponding to the target, and the rectangle j is any rectangle in the second saliency feature map;
when the rectangle i and the rectangle j overlap, determine the number of points in the overlapped area;
determine a ratio between the number of points and the total number of points of the second saliency feature map, and determine a sample attribute of the overlapped area according to the ratio;
and train with the overlapped area to obtain the oriented suggestion box.
6. The apparatus according to claim 5, wherein, in the aspect of inputting the original image into a backbone network to obtain a fusion feature map, the input unit is specifically configured to:
acquire at least one captured image whose capture time is adjacent to that of the original image;
input the original image and the at least one captured image into the backbone network to obtain a first fusion feature;
determine at least one scale-transformed image corresponding to the original image, wherein different scale-transformed images have different scales;
input the original image and the at least one scale-transformed image into the backbone network to obtain a second fusion feature;
and fuse the first fusion feature and the second fusion feature to obtain the fusion feature map.
7. An electronic device comprising a processor, a memory for storing one or more programs and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-4.
8. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any of claims 1-4.
CN201911295968.0A 2019-12-16 2019-12-16 Traffic scene segmentation method and related device Active CN111178181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911295968.0A CN111178181B (en) 2019-12-16 2019-12-16 Traffic scene segmentation method and related device


Publications (2)

Publication Number Publication Date
CN111178181A CN111178181A (en) 2020-05-19
CN111178181B true CN111178181B (en) 2023-06-09

Family

ID=70656576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911295968.0A Active CN111178181B (en) 2019-12-16 2019-12-16 Traffic scene segmentation method and related device

Country Status (1)

Country Link
CN (1) CN111178181B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470048B (en) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN113570610B (en) * 2021-07-26 2022-05-13 北京百度网讯科技有限公司 Method and device for performing target segmentation on video by adopting semantic segmentation model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network
CN107610146A (en) * 2017-09-29 2018-01-19 北京奇虎科技有限公司 Image scene segmentation method, apparatus, computing device and computer-readable storage medium
CN108985250A (en) * 2018-07-27 2018-12-11 大连理工大学 A kind of traffic scene analytic method based on multitask network
US10402977B1 (en) * 2019-01-25 2019-09-03 StradVision, Inc. Learning method and learning device for improving segmentation performance in road obstacle detection required to satisfy level 4 and level 5 of autonomous vehicles using laplacian pyramid network and testing method and testing device using the same




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant