CN114782459B - Spliced image segmentation method, device and equipment based on semantic segmentation - Google Patents

Spliced image segmentation method, device and equipment based on semantic segmentation

Info

Publication number
CN114782459B
CN114782459B (application number CN202210701199.5A)
Authority
CN
China
Prior art keywords
image
segmentation
spliced
information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210701199.5A
Other languages
Chinese (zh)
Other versions
CN114782459A (en)
Inventor
张翡
高依铨
邓富城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jijian Technology Co ltd
Original Assignee
Shandong Jivisual Angle Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jivisual Angle Technology Co ltd filed Critical Shandong Jivisual Angle Technology Co ltd
Priority to CN202210701199.5A priority Critical patent/CN114782459B/en
Publication of CN114782459A publication Critical patent/CN114782459A/en
Application granted granted Critical
Publication of CN114782459B publication Critical patent/CN114782459B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a spliced image segmentation method, apparatus and device based on semantic segmentation, relates to the technical field of image data processing, and aims to improve the robustness and reliability of spliced image segmentation. The segmentation method comprises: inputting an obtained spliced image, which contains single images, into a pre-trained semantic segmentation model; determining segmentation labels of the spliced image through the semantic segmentation model; extracting contour information of the segmentation labels; outputting contour points of the segmentation labels according to the contour information, the contour of each segmentation label being composed of a plurality of contour points; calculating the minimum circumscribed rectangular frame of the contour points to obtain single-image target frames; determining position information and width and height information of the single-image target frames; inputting the position and width and height information into a pre-trained prediction model; determining the splicing mode of the spliced image through the prediction model; and segmenting the spliced image according to the splicing mode.

Description

Spliced image segmentation method, device and equipment based on semantic segmentation
Technical Field
The present application relates to the field of image data processing technologies, and in particular, to a method, an apparatus, and a device for segmenting a stitched image based on semantic segmentation.
Background
In the field of video surveillance in the traffic industry, a plurality of images need to be captured to judge whether a vehicle has violated regulations, and the captured images can be spliced into one large picture to facilitate manual review and judgment. However, with the continuous development of artificial intelligence, AI technology is gradually replacing manual review, and in a traffic violation auditing algorithm the first step is to segment the spliced composite picture into single pictures.
In the prior art, segmentation of the stitched image generally adopts a traditional image segmentation method, which tends to segment the stitched image based on changes in pixel values. For example, the edge information of each single image in the stitched image is determined by detecting pixels with severe light-dark changes, that is, pixels with large gradient changes, and the stitched image is then segmented according to this edge information.
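For illustration, a minimal OpenCV sketch of this gradient-based approach; the file path and the seam-scoring heuristic are assumptions, not part of the described prior art:

```python
import cv2
import numpy as np

# Gradient-based edge detection: the traditional route for locating sub-image seams.
stitched = cv2.imread("stitched.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path

# Sobel gradients; large magnitudes mark pixels with severe light-dark change
gx = cv2.Sobel(stitched, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(stitched, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# Rows/columns whose total gradient is unusually high are seam candidates;
# uneven illumination or noise easily corrupts this heuristic, hence the low robustness.
row_score = magnitude.sum(axis=1)
col_score = magnitude.sum(axis=0)
seam_rows = np.where(row_score > row_score.mean() + 3 * row_score.std())[0]
seam_cols = np.where(col_score > col_score.mean() + 3 * col_score.std())[0]
```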
When the image capture device shoots an image, environmental factors around the road surface may cause uneven illumination or noise in the captured image, which degrades the quality of the stitched image. Consequently, when stitched images are segmented with a traditional method, the edge information may be detected inaccurately, the final segmentation result may be wrong, and robustness is low.
Disclosure of Invention
In order to solve the technical problem, the application provides a method, a device and equipment for segmenting a spliced image based on semantic segmentation, which are used for improving the robustness and reliability of segmentation of the spliced image.
The first aspect of the application provides a spliced image segmentation method based on semantic segmentation, which comprises the following steps:
inputting an obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image;
determining segmentation labels of the spliced image through the semantic segmentation model;
extracting contour information of the segmentation labels;
outputting contour points of the segmentation labels according to the contour information, wherein the contours of the segmentation labels are composed of a plurality of contour points;
calculating a minimum circumscribed rectangle frame of the contour points of the segmentation labels, and obtaining a single-image target frame, wherein the single-image target frame is a frame of an area occupied by each single image in the spliced image;
determining position information and width and height information of the single-image target frame;
inputting the position information and the width and height information into a pre-trained prediction model;
determining a splicing mode of the spliced images through the prediction model;
and segmenting the spliced image according to the splicing mode.
Optionally, the semantic segmentation model is obtained by:
building an initial semantic segmentation model;
acquiring a first sample splicing map, wherein the first sample splicing map contains splicing image label information;
inputting the first sample mosaic into the initial semantic segmentation model;
extracting features in the first sample splicing image to obtain a first image feature image;
performing feature fusion on the first image feature map, and outputting a fused second image feature map;
processing the second image feature map to obtain a sample segmentation label;
performing first loss value calculation on the sample segmentation label to generate first loss value variation data, wherein the first loss value variation data is a first loss value data collection counted during each training of the initial semantic segmentation model;
and when the first loss value change data reach a preset condition, obtaining the semantic segmentation model.
Optionally, the initial semantic segmentation model includes a lightweight network, a feature pyramid network, a segmentation head, and a classification algorithm;
the lightweight network is used as an encoder to extract image features of the spliced image;
the characteristic pyramid network is used for extracting and fusing characteristics with different spatial resolutions in the image characteristics extracted by the lightweight network so as to extract more image characteristic information;
the segmentation head is used for determining final features from the plurality of image feature information extracted by the feature pyramid network, and the classification algorithm is used for determining the classification category of the spliced image; the segmentation head comprises a convolution layer, an upsampling layer and an activation function layer sigmoid, and the classification algorithm comprises a pooling layer, a fully connected layer and an activation function layer sigmoid.
Optionally, the performing feature fusion on the first image feature map, and outputting a fused second image feature map includes:
the feature pyramid network is used as a decoder and is used for fusing the first image feature map output by the lightweight network to output a second image feature map;
the processing the second image feature map to obtain a sample segmentation label includes:
and inputting the second image feature map into the convolutional layer, the upsampling layer and the activation function layer sigmoid to obtain the sample segmentation label, wherein the sample segmentation label is an area occupied by a single map in the second image feature map.
Optionally, the prediction model is obtained by:
building an initial prediction model;
acquiring a second sample mosaic, wherein the second sample mosaic comprises position information and width and height information of each single picture and a mosaic mode of the second sample mosaic;
inputting the second sample mosaic into the initial prediction model;
performing second loss value calculation on the second sample splicing graph according to a preset loss function to generate second loss value change data, wherein the second loss value change data is a loss value data collection counted during each time of training the initial prediction model;
and when the second loss value change data reaches convergence, obtaining the prediction model.
Optionally, when the prediction model is trained, the single-image position information is used as training data, and the stitching pattern is used as the label for training.
Optionally, the extracting the contour information of the segmentation label includes:
performing binarization processing on the segmentation labels to obtain binarized segmentation labels;
and performing contour extraction on the binarized segmentation labels by the hollowed-out interior point method to obtain the contour information.
The second aspect of the present application provides a mosaic image segmentation apparatus based on semantic segmentation, including:
the first input unit is used for inputting the obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image;
the first determining unit is used for determining the segmentation labels of the spliced image through the semantic segmentation model;
an extraction unit, configured to extract the contour information of the segmentation labels;
the output unit is used for outputting the contour points of the segmentation labels according to the contour information, and the contours of the segmentation labels are composed of a plurality of contour points;
the first calculation unit is used for calculating a minimum circumscribed rectangular frame of the contour points of the segmentation labels and obtaining a single-image target frame, wherein the single-image target frame is a frame of an area occupied by each single image in the spliced image;
the second determining unit is used for determining the position information and the width and height information of the single-image target frame;
the second input unit is used for inputting the position information and the width and height information into a pre-trained prediction model;
a third determining unit, configured to determine a stitching mode of the stitched image through the prediction model;
and the segmentation unit is used for segmenting the spliced image according to the splicing mode.
The third aspect of the present application provides a spliced image segmentation apparatus based on semantic segmentation, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform any of the aspects of the first aspect and alternatives thereof.
A fourth aspect of the present application provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to carry out the method of the first aspect and any one of the alternatives of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
according to the method, the segmentation labels of the input spliced images are determined through a semantic segmentation model, the outline information of the segmentation labels is extracted after the segmentation labels are determined, the minimum circumscribed rectangle frame of the segmentation labels is determined, the single-image target frame is obtained, the pre-trained prediction model is input after the position information and the width and height information of the single-image target frame are determined, the splicing mode of the spliced images can be determined through the prediction model, and then the spliced images are segmented according to the splicing mode. The semantic segmentation is to understand an image from a pixel level, and can classify pixels belonging to the same class into one class, so that the problem of inaccurate segmentation caused by uneven illumination or noise in a spliced image when the spliced image is segmented is avoided, and the robustness and reliability of the segmentation of the spliced image are improved.
Drawings
In order to more clearly illustrate the technical solutions in the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of an embodiment of a segmentation method for a stitched image based on semantic segmentation in the present application;
FIGS. 2-1 and 2-2 are schematic diagrams of another embodiment of a stitched image segmentation method based on semantic segmentation in the present application;
FIG. 3 is a schematic diagram of a semantic segmentation network component structure according to the present application;
FIG. 4 is a schematic diagram of a segmentation apparatus for semantic segmentation based stitched images according to the present application;
FIG. 5 is another schematic diagram of a segmentation apparatus for semantic segmentation based stitched images according to the present application;
fig. 6 is a schematic structural diagram of a stitched image segmentation apparatus based on semantic segmentation in the present application.
Detailed Description
The embodiment of the application provides a spliced image segmentation method, a spliced image segmentation device and spliced image segmentation equipment based on semantic segmentation, which are used for improving the robustness and reliability of spliced image segmentation.
In a traffic violation auditing algorithm, whether a vehicle exhibits a specific illegal behavior must be judged by the algorithm, but what is passed into the algorithm is usually a composite picture formed by splicing single pictures, so the first step of the algorithm is to divide the composite picture into single pictures. When segmenting a stitched image, conventional segmentation methods are generally used, for example detecting the edge information of each single image with traditional image processing techniques: the edge information is determined from changes in pixel values, and the stitched image is segmented accordingly. However, the captured images may suffer from uneven illumination or noise, so the detected position information of the single images can be offset or inaccurate, and the segmentation result of the stitched image is then wrong. In this application, the stitched image is segmented through a trained semantic segmentation model, which segments the stitched image quickly and accurately and has high robustness in traffic violation scenarios.
The following briefly describes a segmentation method of a stitched image based on semantic segmentation in the present application:
Referring to FIG. 1, FIG. 1 shows an embodiment of the stitched image segmentation method based on semantic segmentation in the present application. The method may be implemented on a device such as a mobile phone or a tablet; for convenience of description, it is described below as applied to a server, and includes:
101. the server inputs the obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image
In this embodiment, the server obtains a stitched image formed by stitching a plurality of captured single images; the stitched image may use any of several stitching modes, for example up 1 down 2, left 1 right 1, up 2 down 2, and so on, which are not limited here. After the stitched image is obtained, it is input into a semantic segmentation model that has been trained in advance. Optionally, before the stitched image is input into the pre-trained semantic segmentation model, it may be preprocessed, for example by pixel brightness conversion or gray level change, and the preprocessed stitched image is then input into the model.
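A minimal sketch of this optional preprocessing, assuming OpenCV; the brightness parameters, input size and file path are illustrative:

```python
import cv2

def preprocess(stitched_bgr, alpha=1.2, beta=10, size=(512, 512)):
    """Optional preprocessing before the semantic segmentation model:
    pixel brightness conversion followed by resizing to the model's input scale.
    alpha/beta and size are illustrative values, not prescribed by the method."""
    bright = cv2.convertScaleAbs(stitched_bgr, alpha=alpha, beta=beta)  # brightness conversion
    return cv2.resize(bright, size)

stitched = cv2.imread("stitched.jpg")   # illustrative path
model_input = preprocess(stitched)
```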
102. Server determines segmentation labels of spliced images through semantic segmentation model
In the image field, semantics refers to the content of an image and the understanding of its meaning, while segmentation means separating the different objects in a picture at the pixel level and labeling every pixel. For example, in an image containing a person and a background, semantic segmentation can label all pixels of the person as one class and all pixels of the background as another. In the stitched image, the single images can be labeled as one class and the stitched background as another. The semantic segmentation model yields the single-image regions, that is, the regions of the stitched image that need to be segmented, and each such region is a segmentation label. Since the stitched image is formed by stitching several single images, a stitched image composed of 3 single images, for example, has 3 regions to be segmented, and all 3 regions are segmentation labels of the stitched image.
103. Server extracts contour information of segmentation label
In this embodiment, a contour is a curve formed by a series of connected points; it represents the basic shape of an object and can be used for shape analysis.
104. The server outputs contour points of the segmentation labels according to the contour information, and the contours of the segmentation labels are composed of a plurality of contour points
In this embodiment, a contour is composed of a plurality of contour points and may also be regarded as the point set of those contour points. After the contour of a segmentation label is determined, the server determines each contour point of the label together with its coordinate information.
105. The server calculates the minimum circumscribed rectangle frame of the contour points of the segmentation labels and obtains a single-image target frame, wherein the single-image target frame is the frame of the area occupied by each single image in the spliced image
Because the stitched image is formed by stitching several single images, there are several regions to be segmented, and these regions are not connected, that is, the segmentation labels are disconnected from one another. The contour points of each region are therefore treated as one group, and the points within each group are connected. Since the coordinate information of every contour point has already been determined, the minimum circumscribed rectangular frame of each group can be computed with a minimum bounding rectangle algorithm. The minimum circumscribed rectangle is the maximum extent of a two-dimensional shape expressed in two-dimensional coordinates (such as a set of points, line segments or a polygon), that is, the rectangle whose boundary is determined by the maximum abscissa, minimum abscissa, maximum ordinate and minimum ordinate over the vertices of the given shape. The minimum circumscribed rectangular frame of each group of contour points is computed, and the resulting frames are taken as the single-image target frames, each being the frame of the area occupied by one single image in the stitched image. Optionally, minimum circumscribed rectangles that do not satisfy a preset rule may be filtered out: when determining the segmentation labels, other objects in the background may be misjudged as labels, so filtering by a preset rule is needed. The preset rule may be a preset scale or a preset position, and is not limited here.
106. The server determines the position information and the width and height information of the single-image target frame
In this embodiment, the server determines a central point of the minimum circumscribed rectangle frame, determines position information of the single-image target frame according to the central point, and determines width and height information of the single-image target frame according to the width and height of the minimum circumscribed rectangle frame.
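The contour and bounding-box steps (103 to 106) can be sketched with OpenCV as follows; cv2.findContours is used here as a common stand-in (the patent's own contour extraction, the hollowed-out interior point method, is detailed in the second embodiment), and the area-ratio filter is an assumed instance of the preset rule:

```python
import cv2
import numpy as np

def single_image_boxes(label_mask: np.ndarray, min_area_ratio: float = 0.01):
    """label_mask: 0/255 uint8 segmentation-label mask from the model.
    Returns (cx, cy, w, h) per single-image target frame."""
    contours, _ = cv2.findContours(label_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    img_h, img_w = label_mask.shape[:2]
    boxes = []
    for contour in contours:                    # each disconnected region is one group of contour points
        x, y, w, h = cv2.boundingRect(contour)  # rectangle bounded by the min/max abscissa and ordinate
        if w * h < min_area_ratio * img_w * img_h:
            continue                            # filter boxes breaking the preset rule (ratio is assumed)
        boxes.append((x + w / 2, y + h / 2, w, h))  # centre point plus width and height information
    return boxes
```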
107. The server inputs the position information and the width and height information into a pre-trained prediction model
In this embodiment, the server inputs the position information and the width and height information of the single image target frame into a pre-trained prediction model, the prediction model is a stitching mode prediction model, and a stitching mode of a stitched image can be output after the position information and the width and height information of the single image in the stitched image are obtained.
108. The server determines the splicing mode of the spliced images through the prediction model
In this embodiment, the prediction model finally determines a stitching mode of the stitched image, where the stitching mode includes a single image, top 1 and bottom 2, top 2 and bottom 1, left 1 and right 1, top 2 and bottom 2, top 1 and bottom 3, top 3 and bottom 1, and the like.
109. The server segments the spliced image according to the splicing mode
In this embodiment, after the splicing mode is obtained, the spliced image can be segmented according to the segmentation pattern corresponding to that mode, and the obtained single images are input into the traffic violation auditing algorithm to determine whether the vehicle has committed a violation.
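As an illustration, a sketch of the final splitting step, assuming equal-sized single images and illustrative mode names:

```python
def split_by_mode(stitched, mode):
    """Crop the stitched image according to the predicted stitching mode.
    Mode names and the equal-size assumption are illustrative."""
    h, w = stitched.shape[:2]
    if mode == "left1_right1":
        return [stitched[:, : w // 2], stitched[:, w // 2 :]]
    if mode == "up1_down2":
        return [stitched[: h // 2],
                stitched[h // 2 :, : w // 2], stitched[h // 2 :, w // 2 :]]
    if mode == "up2_down2":
        return [stitched[: h // 2, : w // 2], stitched[: h // 2, w // 2 :],
                stitched[h // 2 :, : w // 2], stitched[h // 2 :, w // 2 :]]
    raise ValueError(f"unknown stitching mode: {mode}")
```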
In this embodiment, the server inputs the acquired stitched image into the trained semantic segmentation model to obtain the segmentation labels, determines the minimum circumscribed rectangular frame of each label to obtain the single-image target frames, and, after determining their position and width and height information, inputs this information into the trained prediction model. The prediction model determines the stitching mode of the stitched image, and the stitched image is segmented according to that mode. The stitched image can thus be segmented quickly and accurately, with high robustness in traffic violation scenarios.
In this embodiment, an initial semantic segmentation model and an initial prediction model need to be built first, and the semantic segmentation model and the prediction model can be obtained after training, which will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 2-1, FIG. 2-2 and FIG. 3: FIGS. 2-1 and 2-2 show another embodiment of the stitched image segmentation method based on semantic segmentation in the present application, and FIG. 3 is a schematic structural diagram of the initial semantic segmentation model. For convenience of description, the method is described below as applied to a server, and includes:
201. server building initial semantic segmentation model
202. The server acquires a first sample splicing map, wherein the first sample splicing map contains label information of splicing images
203. The server inputs the first sample splicing picture into an initial semantic segmentation model
204. The server extracts the features in the first sample splicing picture to obtain a first image feature picture
205. The server performs feature fusion on the first image feature map and outputs a fused second image feature map
206. The server processes the second image feature map to obtain a sample segmentation label
207. The server calculates a first loss value of the sample segmentation label to generate first loss value change data, wherein the first loss value change data is a first loss value data collection set counted during each training of the initial semantic segmentation model
208. When the first loss value change data reach a preset condition, the server obtains a semantic segmentation model
In this embodiment, an initial semantic segmentation model needs to be built. As shown in FIG. 3, the initial semantic segmentation model includes a lightweight network (MobileNet_v2) 301, a feature pyramid network (Feature Pyramid Network) 302, a segmentation head (SegmentationHead) 303, and a classification algorithm (ClassificationHead) 304. The lightweight network serves as the encoder that extracts image features of the stitched image; the feature pyramid network extracts and fuses features of different spatial resolutions from the image features extracted by the lightweight network, so as to extract more image feature information; the segmentation head determines the final features from the multiple image feature maps extracted by the feature pyramid network, and the classification algorithm determines the classification category and confidence of the stitched image. The segmentation head comprises a convolution layer, an upsampling layer and an activation function layer sigmoid, and the classification algorithm comprises a pooling layer, a fully connected layer and an activation function layer sigmoid.
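A minimal PyTorch sketch of the two heads as described; channel counts, kernel size and the upsampling factor are illustrative assumptions:

```python
import torch.nn as nn

class SegmentationHead(nn.Sequential):
    """Convolution layer -> upsampling layer -> activation function layer sigmoid."""
    def __init__(self, in_channels: int, out_channels: int = 1, upsampling: int = 4):
        super().__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.UpsamplingBilinear2d(scale_factor=upsampling),  # back to the input scale
            nn.Sigmoid(),                                      # per-pixel probability of "single image"
        )

class ClassificationHead(nn.Sequential):
    """Pooling layer -> fully connected layer -> activation function layer sigmoid."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__(
            nn.AdaptiveAvgPool2d(1),              # pooling layer
            nn.Flatten(),
            nn.Linear(in_channels, num_classes),  # fully connected layer
            nn.Sigmoid(),                         # classification category confidence
        )
```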
When the initial semantic segmentation model is trained, a large number of samples need to be obtained, for example fifty thousand first sample mosaic images. Each first sample mosaic image carries label information; during training, the original image and its label information are input together. The label information is a binary map with only two values, 0 and 1: positions belonging to a single-image region have pixel value 1, and all other positions are background with pixel value 0. The lightweight network 301 serves mainly as the encoder that extracts image features of the first sample mosaic; after an image is input, it outputs first image feature maps of different scales through different convolution structures. These feature maps of different scales are fed into the feature pyramid network 302, which acts as the decoder: it extracts features of the image at each scale, fuses the feature maps of different scales output by the encoder, and outputs the fused second image feature map. The segmentation head 303 then applies convolution, upsampling and the activation function layer sigmoid to the decoder output to obtain a segmentation label at the same scale as the input sample image. Since only one class (the single image) needs to be segmented, one segmentation label is output. Dice Loss is adopted as the loss function when training the initial semantic segmentation model. The Dice coefficient is a set-similarity measure, generally used to compute the similarity of two samples, and it is computed through the loss function formula: the larger the Dice coefficient, the more similar the sets and the smaller the loss, and vice versa. When the Dice coefficient reaches a preset value, training of the semantic segmentation model is complete.
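The Dice Loss described above can be written in plain PyTorch as follows (the smoothing constant is an assumption):

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    """Dice Loss = 1 - Dice coefficient.
    pred: sigmoid probabilities; target: 0/1 label map.
    The larger the Dice coefficient (set similarity), the smaller the loss."""
    pred, target = pred.reshape(-1), target.reshape(-1)
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice
```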
Optionally, during training of the semantic segmentation model, images of multiple scales may be randomly input for training, such as images with resolutions 416x416 and 512x512, where the specific scale is not limited here, so as to improve robustness of the semantic segmentation model for image scale transformation, and make the model less susceptible to image scale transformation.
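A sketch of this multi-scale training trick; the scale set uses the resolutions mentioned above, and the per-batch resize is an assumed implementation detail:

```python
import random
import torch.nn.functional as F

SCALES = [416, 512]   # resolutions named above; the exact set is not limited

def random_rescale(images, masks):
    """Pick a random input scale per batch so the model tolerates scale changes.
    images: (N, C, H, W) float tensor; masks: (N, 1, H, W) label tensor."""
    size = random.choice(SCALES)
    images = F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
    masks = F.interpolate(masks, size=(size, size), mode="nearest")  # keep labels binary
    return images, masks
```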
209. The server inputs the obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image
210. The server determines the segmentation labels of the spliced images through the semantic segmentation model
Steps 209 to 210 in this embodiment are similar to steps 101 to 102 in the embodiment shown in fig. 1, and are not repeated here.
211. The server carries out binarization processing on the segmentation label to obtain a binarization segmentation label
212. The server extracts the contour of the binary segmentation label by a hollowed internal point method to obtain contour information
In this embodiment, the hollowed-out interior point method is generally used to extract the contour of a binary image; a non-binary image must first be binarized, so the segmentation label is binarized to obtain a binarized segmentation label. Image binarization sets the gray value of every pixel to either 0 or 255, giving the whole image an obvious black-and-white appearance. The hollowed-out interior point method traverses every pixel in the image: if the gray value of a pixel is 0, it is assigned 0 regardless of the gray values of its 8 neighboring pixels; if the gray value is 255 and the gray values of all 8 neighboring pixels are also 255, it is assigned 0; in every other case it is assigned 255. After this processing, the contour of the image is obtained. Alternatively, the contour of the segmentation label can be determined by a boundary tracing method: the non-binary image is again binarized first, a starting point is chosen, and the next boundary point is searched for according to a preset rule until the search returns to the starting point, at which time the contour has been found. Region growing, region splitting and merging, and other methods also exist; the specific contour extraction method is not limited here.
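The rule just described translates directly into code; a NumPy sketch (written as an explicit loop for clarity rather than speed):

```python
import numpy as np

def hollow_interior_points(binary: np.ndarray) -> np.ndarray:
    """Contour extraction by the hollowed-out interior point method.
    binary: 0/255 image. A 255 pixel whose 8 neighbours are all 255 is an
    interior point and is erased; every other 255 pixel is kept as contour."""
    padded = np.pad(binary, 1, mode="constant", constant_values=0)
    out = np.zeros_like(binary)
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            if binary[i, j] == 0:
                continue                           # gray value 0: assigned 0
            window = padded[i : i + 3, j : j + 3]  # the pixel and its 8 neighbours
            if window.min() == 255:
                continue                           # all neighbours 255: interior, assigned 0
            out[i, j] = 255                        # boundary point: assigned 255
    return out
```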
213. The server outputs contour points of the segmentation labels according to the contour information, and the contours of the segmentation labels are composed of a plurality of contour points
214. The server calculates the minimum circumscribed rectangle frame of the contour points of the segmentation labels and obtains a single-image target frame, wherein the single-image target frame is the frame of the area occupied by each single image in the spliced image
215. The server determines the position information and the width and height information of the single-image target frame
Steps 213 to 215 in this embodiment are similar to steps 104 to 106 in the embodiment shown in fig. 1, and are not repeated herein.
216. Server building initial prediction model
217. The server acquires a second sample mosaic which contains the position information and the width and height information of each single picture and the mosaic mode of the second sample mosaic
218. The server inputs the second sample splicing map into the initial prediction model
219. The server performs second loss value calculation on the second sample splicing diagram according to a preset loss function to generate second loss value change data, wherein the second loss value change data is a loss value data collection set counted during each time of training the initial prediction model
220. When the second loss value change data reaches convergence, the server obtains a prediction model
In this embodiment, a large number of samples are required to train the initial prediction model; for example, sixty thousand second sample mosaic images may be obtained. Each second sample mosaic image includes the position information of each single image, that is, the coordinates of the top-left corner of its single-image target frame, together with the width and height information of the target frame; these are the model inputs during training. Each second sample mosaic image also carries its own mosaic pattern. When the prediction model is trained, the position information of the single images is used as the training data and the mosaic pattern as the label, and a regression model is used for fitting:
h_\theta(x) = \theta^{\mathsf{T}} x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n

The loss function (least squares) is as follows; the smaller the loss, the closer h(x) is to y, i.e., the closer the fitted value is to the true value:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2
The regression coefficients (the weights θ) are unknown and must be adjusted continuously to minimize the loss value. During training, the weights θ are adjusted by gradient descent until convergence, yielding the prediction model.
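A NumPy sketch of this least-squares fit by gradient descent; the feature layout and the numeric coding of the stitching modes are assumptions:

```python
import numpy as np

def fit_stitch_mode_regressor(X: np.ndarray, y: np.ndarray,
                              lr: float = 0.01, epochs: int = 10000,
                              tol: float = 1e-6) -> np.ndarray:
    """Fit h(x) = X @ theta by minimising the least-squares loss with gradient descent.
    X: (m, n) rows built from each single image's position and width/height
    information, padded to a fixed length; y: (m,) numeric stitching-mode codes."""
    m, n = X.shape
    theta = np.zeros(n)                    # regression weights, adjusted until convergence
    for _ in range(epochs):
        residual = X @ theta - y           # h(x) - y: fitted value minus true value
        grad = X.T @ residual / m          # gradient of the least-squares loss
        theta -= lr * grad
        if np.linalg.norm(grad) < tol:     # gradient vanishing: converged
            break
    return theta
```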
221. The server inputs the position information and the width and height information into a pre-trained prediction model
222. The server determines the splicing mode of the spliced images through the prediction model
223. The server segments the spliced image according to the splicing mode
Steps 221 to 223 in this embodiment are similar to steps 107 to 109 in the embodiment shown in fig. 1, and are not described again here.
In this embodiment, the server trains the initial semantic segmentation model to obtain the semantic segmentation model and inputs the acquired stitched image into it to obtain the segmentation labels. The contour information of the labels is determined by the hollowed-out interior point method, and the minimum circumscribed rectangular frame of each label is determined from the contour information to obtain the single-image target frames. After determining the position and width and height information of the target frames, the server inputs this information into the trained prediction model, determines the stitching mode of the stitched image through the prediction model, and segments the stitched image according to that mode.
The above description is about the segmentation method of the stitched image based on semantic segmentation, and the following description is about the segmentation device of the stitched image based on semantic segmentation:
Referring to FIG. 4, a stitched image segmentation apparatus based on semantic segmentation in the present application includes:
a first input unit 401, configured to input an obtained stitched image into a pre-trained semantic segmentation model, where the stitched image includes a single image;
a first determining unit 402, configured to determine a segmentation label of the stitched image through a semantic segmentation model;
an extracting unit 403, configured to extract the contour information of the segmentation labels;
an output unit 404, configured to output contour points of a segmentation label according to the contour information, where the contour of the segmentation label is composed of a plurality of contour points;
the first calculation unit 405 is configured to calculate a minimum circumscribed rectangular frame of the contour points of the segmentation labels, and obtain a single-map target frame, where the single-map target frame is a frame of an area occupied by each single map in the stitched image;
a second determining unit 406, configured to determine position information and width and height information of the single-image target frame;
a second input unit 407, configured to input the position information and the width and height information into a pre-trained prediction model;
a third determining unit 408, configured to determine a stitching mode of the stitched image through the prediction model;
and a segmentation unit 409 for segmenting the stitched image according to the stitching mode.
In this embodiment, the first input unit 401 inputs the acquired stitched image into the trained semantic segmentation model, the first determining unit 402 determines the segmentation labels, and the first calculating unit 405 calculates the minimum circumscribed rectangular frame of the labels to obtain the single-image target frames. The second determining unit 406 determines the position and width and height information of the target frames, the second input unit 407 inputs this information into the trained prediction model, the third determining unit 408 determines the stitching mode of the stitched image through the prediction model, and the segmentation unit 409 segments the stitched image according to that mode. The stitched image can thus be segmented quickly and accurately, with high robustness in traffic violation scenarios.
Referring to FIG. 5, another stitched image segmentation apparatus based on semantic segmentation in the present application includes:
the first building unit 501 is used for building an initial semantic segmentation model;
a first obtaining unit 502, configured to obtain a first sample mosaic, where the first sample mosaic includes mosaic image label information;
a third input unit 503, configured to input the first sample mosaic into the initial semantic segmentation model;
a second extracting unit 504, configured to extract features in the first sample mosaic to obtain a first image feature map;
a fusion unit 505, configured to perform feature fusion on the first image feature map, and output a fused second image feature map;
the first processing unit 506 is configured to process the second image feature map to obtain a sample segmentation label;
a second calculating unit 507, configured to perform a first loss value calculation on the sample segmentation label to generate first loss value change data, where the first loss value change data is a first loss value data set counted during each training of the initial semantic segmentation model;
a first obtaining unit 508, configured to obtain a semantic segmentation model when the first loss value change data reaches a preset condition;
a first input unit 509, configured to input the obtained stitched image into a pre-trained semantic segmentation model, where the stitched image includes a single image;
a first determining unit 510, configured to determine a segmentation label of the stitched image through a semantic segmentation model;
the extraction unit 511 includes:
a second processing unit 5111, configured to perform binarization processing on the segmentation label to obtain a binarized segmentation label;
an extraction module 5112, configured to perform contour extraction on the binarized segmented label by using a hollowed-out interior point method to obtain contour information;
an output unit 512, configured to output contour points of the segmentation labels according to the contour information, where the contours of the segmentation labels are composed of a plurality of contour points;
a first calculating unit 513, configured to calculate a minimum circumscribed rectangular frame of the contour points of the segmentation labels, and obtain a single-map target frame, where the single-map target frame is a frame of an area occupied by each single map in the stitched image;
a second determining unit 514, configured to determine position information and width and height information of the single-map target frame;
a second building unit 515, configured to build an initial prediction model;
a second obtaining unit 516, configured to obtain a second sample mosaic, where the second sample mosaic includes position information and width and height information of each single map, and a mosaic pattern of the second sample mosaic;
a fourth input unit 517 for inputting the second sample mosaic into the initial prediction model;
a third calculating unit 518, configured to perform second loss value calculation on the second sample mosaic according to a preset loss function to generate second loss value change data, where the second loss value change data is a loss value data set counted during each training of the initial prediction model;
a second obtaining unit 519, configured to obtain a prediction model when the second loss value change data reaches convergence;
a second input unit 520, configured to input the position information and the width and height information into a pre-trained prediction model;
a third determining unit 521, configured to determine a stitching mode of the stitched image through the prediction model;
and a segmentation unit 522, configured to segment the stitched image according to the stitching mode.
In this embodiment, the initial semantic segmentation model is trained and the first obtaining unit 508 obtains the semantic segmentation model. The first input unit 509 inputs the acquired stitched image into the trained semantic segmentation model, and the first determining unit 510 obtains the segmentation labels. The extraction module 5112 determines the contour information of the segmentation labels by the hollowed-out interior point method, and the first calculating unit 513 determines the minimum circumscribed rectangular frame of each label from the contour information to obtain the single-image target frames. The second determining unit 514 determines the position and width and height information of the target frames, the second input unit 520 inputs this information into the trained prediction model, the third determining unit 521 determines the stitching mode of the stitched image through the prediction model, and the segmentation unit 522 segments the stitched image according to that mode. The stitched image can thus be segmented quickly and accurately, with high robustness in traffic violation scenarios.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a stitched image segmentation device based on semantic segmentation in the present application, including:
a central processing unit 602, a memory 601, an input/output interface 603, a wired or wireless network interface 604 and a power supply 605;
the memory 601 is a transient storage memory or a persistent storage memory;
the central processor 602 is configured to communicate with the memory 601 and execute the instructions in the memory 601 to perform the steps in any of the embodiments shown in FIG. 1, FIG. 2-1 and FIG. 2-2.
The application provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to execute the method corresponding to any one of the embodiments shown in FIG. 1, FIG. 2-1 and FIG. 2-2.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A spliced image segmentation method based on semantic segmentation is characterized by comprising the following steps:
inputting an obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image;
determining segmentation labels of the spliced image through the semantic segmentation model;
extracting contour information of the segmentation labels;
outputting contour points of the segmentation labels according to the contour information, wherein the contours of the segmentation labels are composed of a plurality of contour points;
calculating the minimum circumscribed rectangle frame of the contour points of the segmentation labels, and obtaining a single-image target frame, wherein the single-image target frame is a frame of an area occupied by each single image in the spliced image;
determining position information and width and height information of the single-image target frame;
inputting the position information and the width and height information into a pre-trained prediction model;
determining a splicing mode of the spliced images through the prediction model;
and segmenting the spliced image according to the splicing mode.
2. The stitched image segmentation method of claim 1, wherein the semantic segmentation model is obtained by:
building an initial semantic segmentation model;
acquiring a first sample splicing map, wherein the first sample splicing map contains label information of a splicing image;
inputting the first sample mosaic into the initial semantic segmentation model;
extracting features in the first sample splicing image to obtain a first image feature image;
performing feature fusion on the first image feature map, and outputting a fused second image feature map;
processing the second image feature map to obtain a sample segmentation label;
performing first loss value calculation on the sample segmentation label to generate first loss value variation data, wherein the first loss value variation data is a first loss value data collection counted during each training of the initial semantic segmentation model;
and when the first loss value change data reach a preset condition, obtaining the semantic segmentation model.
3. The stitched image segmentation method of claim 2, wherein the initial semantic segmentation model comprises a lightweight network, a feature pyramid network, a segmentation header, and a classification algorithm;
the lightweight network is used as an encoder for extracting image features of the spliced image;
the characteristic pyramid network is used for extracting and fusing characteristics with different spatial resolutions in the image characteristics extracted by the lightweight network so as to extract more image characteristic information;
the segmentation head is used for determining final features from a plurality of image feature information extracted from the feature pyramid network, the classification algorithm is used for determining classification categories of the spliced image, the segmentation head comprises a convolution layer, an upsampling layer and an activation function layer sigmoid, and the classification algorithm comprises a pooling layer, a fully connected layer and an activation function layer sigmoid.
4. The method for segmenting the stitched image according to claim 3, wherein the feature fusion of the first image feature map and the output of the fused second image feature map comprises:
the feature pyramid network is used as a decoder and is used for fusing the first image feature map output by the lightweight network to output a second image feature map;
the processing the second image feature map to obtain a sample segmentation label includes:
and inputting the second image feature map into the convolutional layer, the upsampling layer and the activation function layer sigmoid to obtain the sample segmentation label, wherein the sample segmentation label is an area occupied by a single map in the second image feature map.
5. The stitched image segmentation method of claim 1, wherein the prediction model is obtained by:
building an initial prediction model;
acquiring a second sample mosaic, wherein the second sample mosaic comprises position information and width and height information of each single image and a mosaic mode of the second sample mosaic;
inputting the second sample mosaic into the initial prediction model;
performing second loss value calculation on the second sample splicing diagram according to a preset loss function to generate second loss value change data, wherein the second loss value change data is a loss value data collection counted during each training of the initial prediction model;
and when the second loss value change data reaches convergence, obtaining the prediction model.
6. The stitched image segmentation method according to claim 5, wherein, when the prediction model is trained, the single-image position information is used as training data and the stitching pattern is used as the label.
7. The stitched image segmentation method of any one of claims 1 to 6, wherein the extracting the contour information of the segmentation labels comprises:
performing binarization processing on the segmentation labels to obtain binarized segmentation labels;
and performing contour extraction on the binarized segmentation labels by the hollowed-out interior point method to obtain the contour information.
8. A mosaic image segmentation device based on semantic segmentation is characterized by comprising:
the first input unit is used for inputting the obtained spliced image into a pre-trained semantic segmentation model, wherein the spliced image comprises a single image;
the first determining unit is used for determining the segmentation labels of the spliced image through the semantic segmentation model;
an extraction unit, configured to extract the contour information of the segmentation labels;
the output unit is used for outputting the contour points of the segmentation labels according to the contour information, and the contours of the segmentation labels are composed of a plurality of contour points;
the first calculation unit is used for calculating a minimum circumscribed rectangular frame of the contour points of the segmentation labels and obtaining a single-image target frame, wherein the single-image target frame is a frame of an area occupied by each single image in the spliced image;
the second determining unit is used for determining the position information and the width and height information of the single-image target frame;
the second input unit is used for inputting the position information and the width and height information into a pre-trained prediction model;
a third determining unit, configured to determine a stitching mode of the stitched image through the prediction model;
and the segmentation unit is used for segmenting the spliced image according to the splicing mode.
9. A spliced image segmentation device based on semantic segmentation is characterized by comprising:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the stitched image segmentation method of any one of claims 1 to 7.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the stitched image segmentation method of any one of claims 1 to 7.
CN202210701199.5A 2022-06-21 2022-06-21 Spliced image segmentation method, device and equipment based on semantic segmentation Active CN114782459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210701199.5A CN114782459B (en) 2022-06-21 2022-06-21 Spliced image segmentation method, device and equipment based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210701199.5A CN114782459B (en) 2022-06-21 2022-06-21 Spliced image segmentation method, device and equipment based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN114782459A (en) 2022-07-22
CN114782459B (en) 2022-08-30

Family

ID=82420253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210701199.5A Active CN114782459B (en) 2022-06-21 2022-06-21 Spliced image segmentation method, device and equipment based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN114782459B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100366B2 (en) * 2018-04-26 2021-08-24 Volvo Car Corporation Methods and systems for semi-automated image segmentation and annotation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340837A (en) * 2020-02-18 2020-06-26 上海眼控科技股份有限公司 Image processing method, device, equipment and storage medium
CN111311601A (en) * 2020-03-26 2020-06-19 深圳极视角科技有限公司 Segmentation method and device for spliced image
CN111462140A (en) * 2020-04-30 2020-07-28 同济大学 Real-time image instance segmentation method based on block splicing
CN113096016A (en) * 2021-04-12 2021-07-09 广东省智能机器人研究院 Low-altitude aerial image splicing method and system
CN113362394A (en) * 2021-06-11 2021-09-07 上海追势科技有限公司 Vehicle real-time positioning method based on visual semantic segmentation technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Adapting Semantic Segmentation Models for Changes in Illumination and Camera Perspective; Wei Zhou et al.; arXiv:1809.04730v1; 2018-09-13; full text *
Semantic segmentation of traffic surveillance video images and a stitching method; Liu Sichao et al.; Acta Geodaetica et Cartographica Sinica; 2020-04-15 (No. 04); full text *
A blind forensics algorithm for image splicing tampering based on detection and segmentation; Yang Chao et al.; Electronic Design Engineering; 2020-07-05 (No. 13); full text *

Also Published As

Publication number Publication date
CN114782459A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN109961049B (en) Cigarette brand identification method under complex scene
CN109615611B (en) Inspection image-based insulator self-explosion defect detection method
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
KR101403876B1 (en) Method and Apparatus for Vehicle License Plate Recognition
CN106683119B (en) Moving vehicle detection method based on aerial video image
CN110866871A (en) Text image correction method and device, computer equipment and storage medium
CN108921120B (en) Cigarette identification method suitable for wide retail scene
CN109840483B (en) Landslide crack detection and identification method and device
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN112734761B (en) Industrial product image boundary contour extraction method
CN110717886A (en) Pavement pool detection method based on machine vision in complex environment
Xing et al. Traffic sign recognition using guided image filtering
CN113240623B (en) Pavement disease detection method and device
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN112907626A (en) Moving object extraction method based on satellite time-exceeding phase data multi-source information
CN113688846A (en) Object size recognition method, readable storage medium, and object size recognition system
CN115661522A (en) Vehicle guiding method, system, equipment and medium based on visual semantic vector
CN115908774A (en) Quality detection method and device of deformed material based on machine vision
CN108022245A (en) Photovoltaic panel template automatic generation method based on upper thread primitive correlation model
Giri Text information extraction and analysis from images using digital image processing techniques
Harianto et al. Data augmentation and faster rcnn improve vehicle detection and recognition
CN114782459B (en) Spliced image segmentation method, device and equipment based on semantic segmentation
Lafuente-Arroyo et al. Traffic sign classification invariant to rotations using support vector machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 266000 F3, Jingkong building, No. 57 Lushan Road, Huangdao District, Qingdao, Shandong

Patentee after: Shandong Jijian Technology Co.,Ltd.

Address before: 266000 F3, Jingkong building, No. 57 Lushan Road, Huangdao District, Qingdao, Shandong

Patentee before: Shandong jivisual angle Technology Co.,Ltd.