CN113780532B - Training method, device, equipment and storage medium of semantic segmentation network - Google Patents

Training method, apparatus, device and storage medium for a semantic segmentation network

Info

Publication number: CN113780532B
Authority: CN (China)
Prior art keywords: seed, sample image, network, pixel, training
Legal status: Active
Application number: CN202111064182.5A
Other languages: Chinese (zh)
Other versions: CN113780532A (en)
Inventors: 郑喜民, 陈振宏, 舒畅, 陈又新
Current Assignee: Ping An Technology (Shenzhen) Co., Ltd.
Original Assignee: Ping An Technology (Shenzhen) Co., Ltd.
Application events:
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Priority to CN202111064182.5A
Publication of CN113780532A
Priority to PCT/CN2022/072182 (published as WO2023035535A1)
Application granted
Publication of CN113780532B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The present application relates to the fields of artificial intelligence and smart cities, and discloses a training method, apparatus, device, and storage medium for a semantic segmentation network. The method comprises the following steps: acquiring a sample image and interest point data of the sample image; aligning the interest point data with the sample image to obtain an aligned sample image; determining seed points in the sample image; generating seed cues from the aligned sample image and the seed points; and inputting the seed cues and the sample image into a convolutional network, training the convolutional network with a seed loss function, and using the trained convolutional network as the semantic segmentation network. The method can improve the recognition accuracy of the semantic segmentation network.

Description

Training method, apparatus, device and storage medium for a semantic segmentation network
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a training method, apparatus, device, and storage medium for a semantic segmentation network.
Background
When constructing a smart city and planning and managing the city, the distribution and change of buildings in the city need to be monitored. Remote sensing images can therefore be acquired and analysed, so that building records in a database are updated automatically. However, because buildings vary greatly in shape and are irregularly distributed, a semantic segmentation network has difficulty recognising them, and its recognition accuracy is insufficient when segmenting images of buildings.
Disclosure of Invention
The present application provides a training method, apparatus, device, and storage medium for a semantic segmentation network, in order to improve the recognition accuracy of the semantic segmentation network.
In a first aspect, the present application provides a training method of a semantic segmentation network, the method comprising:
acquiring a sample image and interest point data of the sample image;
aligning the interest point data with the sample image to obtain an aligned sample image;
determining a seed point in the sample image, the seed point representing a building in the sample image;
generating a seed cue from the aligned sample image and the seed point;
and inputting the seed cue and the sample image into a convolutional network, training the convolutional network with a seed loss function, and using the trained convolutional network as the semantic segmentation network.
In a second aspect, the present application further provides a training apparatus for a semantic segmentation network, the apparatus comprising:
the sample acquisition module is used for acquiring a sample image and interest point data of the sample image;
the data alignment module is used for aligning the interest point data with the sample image to obtain an aligned sample image;
an image processing module for determining seed points in the sample image, the seed points representing buildings in the sample image;
the cue generation module is used for generating a seed cue from the aligned sample image and the seed point;
and the network training module is used for inputting the seed cue and the sample image into a convolutional network, training the convolutional network with a seed loss function, and using the trained convolutional network as the semantic segmentation network.
In a third aspect, the present application also provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the training method of the semantic segmentation network when executing the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a training method for a semantic segmentation network as described above.
The application discloses a training method, apparatus, device, and storage medium for a semantic segmentation network. A sample image and the interest point data of the sample image are acquired; the interest point data are aligned with the sample image to obtain an aligned sample image; seed points in the sample image are determined; seed cues are generated from the aligned sample image and the seed points; finally, the seed cues and the sample image are input into a convolutional network, the convolutional network is trained with a seed loss function, and the trained convolutional network is used as the semantic segmentation network. By combining the sample image with its corresponding interest point data, the interest point data are fused in as a supplement to the seed points to obtain the seed cues, and the convolutional network is then trained with the seed cues and the sample image, which improves the recognition accuracy of the resulting semantic segmentation network.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a training method of a semantic segmentation network according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of the steps for determining seed points in a sample image provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of the steps for generating a seed cue provided by an embodiment of the present application;
FIG. 4 is a schematic block diagram of a training apparatus of a semantic segmentation network according to an embodiment of the present application;
fig. 5 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative; they need not include all of the illustrated elements and operations/steps, nor be executed in the order described. For example, some operations/steps may be further divided, combined, or partially merged, so the actual execution order may change according to the actual situation.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiments of the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
The embodiment of the application provides a training method, a training device, computer equipment and a storage medium of a semantic segmentation network. The training method of the semantic segmentation network can be used for training a semantic segmentation network, so that the semantic segmentation network is utilized to segment images, the segmentation accuracy is improved, and the trained semantic segmentation network can be used for extracting buildings in the images.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flowchart of a training method of a semantic segmentation network according to an embodiment of the present application. In this training method, seed cues are extracted from sample images and used as image labels of the sample images to train a convolutional network, yielding a trained semantic segmentation network with improved segmentation accuracy.
The scheme is described in detail below, taking as an example a trained semantic segmentation network used for building identification. As shown in fig. 1, the training method of the semantic segmentation network specifically includes steps S101 to S105.
S101, acquiring a sample image and interest point data of the sample image.
A sample image for training the semantic segmentation network is acquired; the sample image may be a remote sensing image or another type of image. Interest point data of the sample image are also acquired. Interest point data, that is, POI (Point of Interest) data, represent geographic objects in the geographic area corresponding to the sample image that can be abstracted as points, for example schools, banks, restaurants, filling stations, hospitals, and supermarkets. POI data are easy to acquire, positionally accurate, and rich in information.
S102, aligning the interest point data with the sample image to obtain an aligned sample image.
After the interest point data corresponding to the sample image are acquired, the interest point data are aligned with the sample image. In a specific implementation, aligning the interest point data with the sample image means aligning the interest point data with the pixel points in the sample image: according to the interest point data, the interest points included in the sample image are mapped to pixel points in the sample image, giving a correspondence between pixel points and interest points.
Moreover, since not every pixel in the sample image represents a building, some pixel points in the sample image may have no corresponding interest point.
S103, determining seed points in the sample image.
The seed points in the sample image may be determined by performing image processing on the sample image, where the image processing includes feature extraction and image conversion. The seed points represent the buildings in the sample image; in other words, the pixel points that are most likely to belong to a building are taken as seed points, and multiple seed points may be obtained.
In a specific implementation, the sample image may be processed using an image classification network and class activation mapping (CAM) to obtain the seed points.
In an embodiment, referring to fig. 2, fig. 2 is a schematic flowchart illustrating a step of determining a seed point in a sample image according to an embodiment of the present application, and step S103 includes steps S1031 to S1033.
S1031, classifying the sample image by using a pre-trained classification network, and determining the saliency area of the sample image.
The sample image is classified with a pre-trained classification network to obtain a saliency region. The saliency region is the region the pre-trained classification network mainly attends to when classifying the sample image; that is, it contains more feature points and is the part most important for classification. The feature maps before the output layer of the pre-trained classification network may therefore be acquired to determine the saliency region of the sample image.
Wherein the pre-trained classification network is trained using training images and labels of the training images during training. The training image may be a remote sensing image or other images, and the tag of the training image represents the category of the building included in the training image. In the implementation process, the sample image of the training semantic segmentation network can also be used as a training image of the training classification network to train the classification network.
In one embodiment, the pre-trained classification network may further include a squeeze-and-excitation structure, i.e., an SE module. This structure lets the convolutional network mine feature-dimension information, namely the dependency relationships among channels, so that channels containing useful information are strengthened and channels containing useless information are compressed, improving the quality of the trained classification network. In a specific implementation, the pre-trained classification network may be, for example, an SE-ResNet50 network.
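The squeeze-and-excitation idea can be illustrated with a minimal numpy sketch: squeeze each channel to a scalar by global average pooling, pass the scalars through a two-layer bottleneck with sigmoid gating, then rescale the channels. The weights and shapes below are toy assumptions; in a real SE-ResNet50 the two bottleneck matrices are learned.

```python
import numpy as np

# Toy squeeze-and-excitation (SE) block: reweight channels by learned gates.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(features, w1, w2):
    """features: (C, H, W); w1: (C//r, C); w2: (C, C//r) with reduction ratio r."""
    squeeze = features.mean(axis=(1, 2))                   # per-channel descriptor
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # channel gates in (0, 1)
    return features * excite[:, None, None]                # rescale each channel

features = np.ones((2, 2, 2))
features[1] *= 3.0              # channel 1 carries a stronger response
w1 = np.eye(1, 2)               # bottleneck down-projection, r = 2
w2 = np.ones((2, 1))            # bottleneck up-projection
out = se_block(features, w1, w2)
print(out.shape)                # (2, 2, 2)
```

Channels with larger gate values are emphasised, which is the "strengthening of channels containing useful information" described above.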
S1032, average pooling is performed on the saliency region to obtain a class heat map corresponding to the saliency region.
The saliency region obtained by classifying the sample image only indicates that a certain type of building is included in the region; it does not contain the building's specific position in the image. Therefore, the saliency region may be average-pooled to obtain a class heat map corresponding to the saliency region; the heat values in the class heat map indicate the locations of the discriminative regions the pre-trained classification network attends to when classifying.
In a specific implementation, global average pooling is applied to the saliency region, the resulting tensor is input into the fully connected layer for classification, and the weights used by the fully connected layer are applied to the globally average-pooled tensor, so that the class heat map is obtained and the discriminative regions of different types of buildings are highlighted.
S1033, the pixel points in the class heat map whose heat values are greater than a preset heat-value threshold are taken as seed points of the sample image.
A seed point carries great significance for classifying buildings: the higher the heat value in the class heat map, the larger the region's contribution to classifying the building, and conversely the lower the heat value, the smaller the contribution. The seed points of the sample image can therefore be determined from the class heat map.
In a specific implementation, the heat value of each pixel point in the class heat map is first determined; in the class heat map, regions with different heat values may be displayed in different colors. After the heat value of each pixel point is determined, the seed points can be determined according to a preset heat threshold: the heat value of each pixel point is compared with the preset heat threshold, and the pixel points whose heat values are greater than the threshold are taken as seed points.
The seed points determined through the class heat map are seed points of the foreground region, that is, seed points that will contribute to segmenting the building. In an embodiment, salient object detection methods may also be used to locate seed points of the background region, i.e., areas with low saliency values are selected as background.
When seed points of the background region are acquired, the seed points of the foreground region and of the background region can be stacked together into a single-channel segmentation mask, with the pixel points having higher heat values in the heat map taken as the foreground region. The heat-value threshold for the foreground region may be an empirical value; for example, the pixel points whose heat values fall in the top 20% of the heat map may be taken as foreground pixels.
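The CAM pipeline of steps S1031 to S1033 can be sketched with toy arrays as follows. The shapes, weights, and the 0.8 threshold are illustrative assumptions, not values from the patent; a real pipeline would use the pre-output-layer feature maps and fully connected weights of a trained classifier such as SE-ResNet50.

```python
import numpy as np

# Toy class activation map (CAM): combine pre-GAP feature maps with the
# fully connected classification weights, then threshold to get seed points.

def class_activation_map(features, fc_weights, class_idx):
    """features: (C, H, W) pre-GAP feature maps; fc_weights: (num_classes, C)."""
    # Weighted sum of channels with the class's FC weights gives the heat map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()               # normalise heat values into [0, 1]
    return cam

def seed_points(cam, heat_threshold=0.8):
    """Pixels whose heat value exceeds the threshold become seed points."""
    rows, cols = np.where(cam > heat_threshold)
    return list(zip(rows.tolist(), cols.tolist()))

features = np.zeros((2, 3, 3))
features[0, 1, 1] = 5.0                # strong response at the centre pixel
features[1, 0, 0] = 1.0
fc_weights = np.array([[1.0, 0.2]])    # a single "building" class
cam = class_activation_map(features, fc_weights, class_idx=0)
print(seed_points(cam))                # [(1, 1)]
```

Only the most discriminative pixel survives the threshold, which is why the seed points are sparse and must later be supplemented by interest point data.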
In another embodiment, the step of determining the seed point in the sample image may comprise: inputting the sample image into a convolution network to obtain the pixel probability of each pixel point in the sample image output by the convolution network; and determining a seed point in the sample image according to the pixel probability.
The sample image may be input into the convolutional network to obtain the pixel probability of each pixel point output by the network, where the pixel probability of a pixel point represents the probability that it is determined to be a building. The pixel points whose pixel probability exceeds a preset probability threshold are then taken as seed points of the sample image and participate in the next round of training of the convolutional network.
This works because, as the segmentation accuracy of the convolutional network increases during training, the seed points determined from the output pixel probabilities become more and more accurate. During training, these seed points serve as soft labels of the sample images; that is, the labels of the sample images are dynamic, and as the segmentation accuracy improves, the labels become more and more accurate, so this mutually reinforcing scheme further improves the accuracy of the trained convolutional network.
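A minimal sketch of this alternative seed selection: pixels whose predicted building probability clears a preset threshold become seed points for the next training round. The probability map and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np

# Toy seed selection from a network's per-pixel building probabilities.

def seeds_from_probabilities(prob_map, threshold=0.9):
    """Return the set of pixel coordinates whose probability exceeds the threshold."""
    return set(zip(*np.where(prob_map > threshold)))

prob_map = np.array([[0.95, 0.10],
                     [0.30, 0.92]])
print(seeds_from_probabilities(prob_map))  # seeds at (0, 0) and (1, 1)
```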
S104, generating a seed cue according to the aligned sample image and the seed point.
The seed points obtained by image processing of the sample image are few in number and sparse: they cover only the most discriminative areas of the buildings in the sample image, namely the areas that contribute most to classification, and do not fully cover all buildings. Therefore, the sample image that includes the interest point data, namely the aligned sample image, can be combined with the seed points to generate seed cues: the seed points obtained by image processing are supplemented with the interest point data, increasing the number of resulting seed cues. The resulting seed cues may be used as pseudo-labels of the sample image to jointly participate in training the semantic segmentation network.
In an embodiment, referring to fig. 3, fig. 3 is a schematic flowchart of steps for generating a seed cue according to an embodiment of the present application, and step S104 includes step S1041 and step S1042.
S1041, calculating the aligned sample image and the seed points based on a preset cue score calculation formula to obtain a cue score of each pixel point in the sample image.
Because not every pixel point in the sample image has corresponding interest point data, and a pixel point may or may not be a seed point, the cue score of each pixel point can be calculated according to whether the pixel point is a seed point and whether it has corresponding interest point data.
In one embodiment, the predetermined cue score calculation formula may include:
M(x,y)=α*C(x,y)+β*P(x,y)
wherein (x, y) represents coordinates of a pixel point in the sample image, M (x, y) represents a cue score of the pixel point (x, y), C (x, y) represents whether the pixel point (x, y) is a seed point, P (x, y) represents whether the pixel point (x, y) has interest point data, and α and β are weight parameters.
If the pixel point (x, y) is a seed point, the value of C(x, y) is 1; if not, C(x, y) is 0. If the pixel point (x, y) has interest point data, the value of P(x, y) is 1; if not, P(x, y) is 0. The values of the weight parameters α and β can be adjusted as required.
And S1042, when the cue score of the pixel point is greater than a preset score threshold, taking the pixel point as a seed cue.
After the cue score of each pixel point in the sample image is calculated, whether the cue score is greater than a preset score threshold is determined, and the pixel points whose cue scores are greater than the preset score threshold are taken as seed cues.
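The cue score formula M(x, y) = α·C(x, y) + β·P(x, y) and the thresholding of step S1042 can be transcribed directly. The weight values α = 1.0, β = 0.5 and the 0.4 score threshold below are illustrative assumptions; the patent leaves them adjustable.

```python
# Cue score: C is 1 when the pixel is a seed point, P is 1 when the pixel has
# aligned interest point data. Pixels whose score exceeds the preset score
# threshold become seed cues.

def cue_score(is_seed, has_poi, alpha=1.0, beta=0.5):
    return alpha * (1 if is_seed else 0) + beta * (1 if has_poi else 0)

def seed_cues(seed_points, poi_points, all_points, score_threshold=0.4):
    """Keep every pixel whose cue score exceeds the preset score threshold."""
    return [p for p in all_points
            if cue_score(p in seed_points, p in poi_points) > score_threshold]

pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
cues = seed_cues(seed_points={(0, 0)}, poi_points={(1, 1)}, all_points=pixels)
print(cues)  # [(0, 0), (1, 1)]: the POI-only pixel supplements the seed point
```

With β below the threshold on its own, POI data would only reinforce existing seed points; with β above it, as here, POI pixels become cues in their own right, which is how the interest point data supplement the sparse seeds.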
S105, inputting the seed clue and the sample image into a convolution network, training the convolution network by using a seed loss function, and taking the trained convolution network as a semantic segmentation network.
The seed cues serve as soft labels of the sample image and are input, together with the sample image, into the convolutional network for iterative training; the trained convolutional network is then used as the semantic segmentation network. During training, the seed cues act as supervision in the form of a seed loss function, preventing the loss generated by building areas from being overwhelmed by the loss of the background.
In a specific implementation, the seed loss function includes:

L_seed = - (1 / (|S_C| + |S_C̄|)) * ( Σ_{u∈S_C} log H_{u,C} + Σ_{u∈S_C̄} log H_{u,C̄} )

where C is the building (foreground) area in the sample image, C̄ is the background area in the sample image, S_c is the set of seed pixels classified as category c, and H_{u,c} represents the probability that a pixel u on the segmented image output by the semantic segmentation network is classified into category c.
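A seeding loss of this form can be sketched numerically as follows. It averages the negative log-probability over foreground and background seed pixels only, in the spirit of the seeding loss of Kolesnikov and Lampert's SEC framework; the array shapes, class indices, and probabilities below are toy assumptions.

```python
import numpy as np

# Toy seed loss: penalise the network only at seed pixels, so the sparse
# building seeds are not drowned out by the abundant background pixels.

def seed_loss(H, fg_seeds, bg_seeds, fg_class=1, bg_class=0):
    """H: (rows, cols, num_classes) softmax output; seeds: lists of (r, c)."""
    terms = [np.log(H[r, c, fg_class]) for r, c in fg_seeds]
    terms += [np.log(H[r, c, bg_class]) for r, c in bg_seeds]
    return -np.mean(terms)

H = np.full((2, 2, 2), 0.5)        # uniform predictions everywhere ...
H[0, 0] = [0.1, 0.9]               # ... except a confident building pixel
H[1, 1] = [0.8, 0.2]               # ... and a confident background pixel
loss = seed_loss(H, fg_seeds=[(0, 0)], bg_seeds=[(1, 1)])
print(round(loss, 3))              # 0.164
```

Non-seed pixels contribute nothing, so the gradient at building seeds carries the same weight as at background seeds regardless of how small the building area is.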
In one embodiment, step S105 includes: performing seed growth on the seed clue to obtain a seed region; inputting the seed region and the sample image into a convolution network, training the convolution network by using a seed loss function, and taking the trained convolution network as a semantic segmentation network.
Seed growing is performed on the seed cues according to a similarity criterion, enlarging the seed region and improving the accuracy of the trained semantic segmentation network; the convolutional network is then trained with the seed region and the sample image. The similarity criterion refers to the similarity between other pixel points near a seed cue and the pixel points in the seed cue: if that similarity is high enough, such pixel points can be taken as newly added seed cues to grow the seed, and the newly added seed cues together with the area of the original seed cues form the seed region.
In an embodiment, the step of seed-growing the seed cues to obtain a seed region may include: acquiring the pixel probability of each pixel point in the sample image output by the convolutional network; taking the seed cues as growth starting points, determining whether the pixel probability of the pixel points connected to the seed cues is greater than a preset pixel probability threshold; and when the pixel probability of a connected pixel point is greater than the preset pixel probability threshold, taking the connected pixel point as a newly added seed cue, with the newly added seed cues and the area of the original seed cues forming the seed region.
In a specific implementation, a foreground threshold may first be set for the foreground regions in all types of images, and a background threshold for the background regions. The foreground threshold may be set to the same value across different types of images, and likewise the background threshold, but the foreground and background thresholds themselves need not be equal. The seed region may then be grown based on the probability of each pixel output by the semantic segmentation network.
The pixel probabilities of the segmentation that the semantic segmentation network generates for the input sample image are obtained; a pixel probability represents the probability that a pixel point in the sample image is segmented as a building. Then, the eight-connected pixels of each seed cue are visited, and whether each pixel should be included in the seed region is judged according to the foreground threshold and the background threshold; this step is repeated until no new pixel point is added to the seed region. The eight-connected pixels of a seed cue are the eight pixel points adjacent to a seed point, with that seed point in the seed cue as the center.
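The seed-growing loop can be sketched as a breadth-first traversal: starting from the seed cues, repeatedly absorb eight-connected neighbours whose predicted building probability clears the foreground threshold, stopping when no new pixel joins. The probability map and the 0.5 threshold are toy assumptions.

```python
from collections import deque

# Toy seed growing over 8-connected neighbours of the seed cues.

def grow_seed_region(prob_map, seeds, fg_threshold=0.5):
    """prob_map: 2D list of building probabilities; seeds: list of (r, c)."""
    h, w = len(prob_map), len(prob_map[0])
    region, queue = set(seeds), deque(seeds)
    while queue:                              # loop until no new pixel joins
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w \
                        and (nr, nc) not in region \
                        and prob_map[nr][nc] > fg_threshold:
                    region.add((nr, nc))      # absorb a similar neighbour
                    queue.append((nr, nc))
    return region

prob_map = [[0.9, 0.8, 0.1],
            [0.2, 0.7, 0.1],
            [0.1, 0.1, 0.9]]
print(sorted(grow_seed_region(prob_map, seeds=[(0, 0)])))
# [(0, 0), (0, 1), (1, 1), (2, 2)]
```

Note that (2, 2) joins the region only because it is eight-connected to the absorbed pixel (1, 1); with four-connectivity it would stay outside.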
The training method of the semantic segmentation network provided by this embodiment can be applied in the field of smart cities to support their construction. A sample image and its corresponding interest point data are obtained; the interest point data are aligned with the sample image to obtain an aligned sample image; seed points in the sample image are determined; seed cues are generated from the aligned sample image and the seed points; and the seed cues and the sample image are input into a convolutional network, which is trained with a seed loss function, the trained convolutional network being used as the semantic segmentation network. By combining the sample image with its corresponding interest point data, the interest point data are fused in as a supplement to the seed points to obtain seed cues, and the convolutional network is finally trained with the seed cues and the sample image, improving the recognition accuracy of the resulting semantic segmentation network.
Referring to fig. 4, fig. 4 is a schematic block diagram of a training apparatus of a semantic segmentation network according to an embodiment of the present application, where the training apparatus of the semantic segmentation network is configured to perform the foregoing training method of the semantic segmentation network. The training device of the semantic segmentation network can be configured in a server or a terminal.
The server may be a standalone server, a server cluster, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms. The terminal can be an electronic device such as a mobile phone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device.
As shown in fig. 4, the training apparatus 200 of the semantic segmentation network includes: a sample acquisition module 201, a data alignment module 202, an image processing module 203, a cue generation module 204, and a network training module 205.
A sample acquisition module 201, configured to acquire a sample image and interest point data of the sample image.
And the data alignment module 202 is configured to align the interest point data with the sample image, and obtain an aligned sample image.
An image processing module 203 for determining seed points in the sample image, the seed points representing buildings in the sample image.
In an embodiment, the image processing module 203 includes an image classification submodule 2031, a region pooling submodule 2032, and a seed determination submodule 2033.
The image classification sub-module 2031 is configured to classify the sample image with a pre-trained classification network and determine the salient region of the sample image. The region pooling sub-module 2032 is configured to average-pool the salient region to obtain the class heat map corresponding to the salient region. The seed determining sub-module 2033 is configured to take the pixels in the class heat map whose heat value is greater than a preset heat-value threshold as the seed points of the sample image.
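The classify-then-pool step above can be sketched as follows, assuming a CAM-style classification head: the patent only states that the salient region is average-pooled into a class heat map, so the channel weighting, the normalisation, and all names here are illustrative assumptions.

```python
import numpy as np

def seed_points_from_heatmap(feature_maps, class_weights, heat_threshold=0.6):
    """Derive seed points from a classification network's activations.

    feature_maps: (C, H, W) activations of the classifier's last
                  convolutional layer over the salient region (assumed shape).
    class_weights: (C,) weights of the building class in the final
                   fully-connected layer (assumption; any CAM-style
                   weighting would do).
    heat_threshold: illustrative preset heat-value threshold.
    Returns (row, col) coordinates whose heat exceeds the threshold.
    """
    # Weight each channel by the class weight and average over channels,
    # which realises the average pooling into a class heat map.
    heat = np.tensordot(class_weights, feature_maps, axes=1) / len(class_weights)
    # Normalise to [0, 1] so the threshold is scale-free.
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    return [tuple(p) for p in np.argwhere(heat > heat_threshold)]
```

Thresholding the normalised heat map directly yields the seed-point coordinates consumed by the cue generation module.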
And the cue generation module 204 is configured to generate a seed cue according to the aligned sample image and the seed point.
In one embodiment, the thread generation module 204 includes a score computation submodule 2041 and a thread determination submodule 2042.
And a score calculating submodule 2041, configured to calculate the aligned sample image and the seed point based on a preset cue score calculating formula, so as to obtain a cue score of each pixel point in the sample image.
And a clue determining submodule 2042, configured to take the pixel point as a seed clue when the clue score of the pixel point is greater than a preset score threshold.
The network training module 205 is configured to input the seed cue and the sample image into a convolutional network, train the convolutional network by using a seed loss function, and use the trained convolutional network as a semantic segmentation network.
It should be noted that, for convenience and brevity of description, the specific working process of the training device and each module of the semantic segmentation network described above may refer to the corresponding process in the foregoing training method embodiment of the semantic segmentation network, which is not described herein again.
The training means of the semantic segmentation network described above may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 5.
Referring to fig. 5, fig. 5 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal.
With reference to FIG. 5, the computer device includes a processor, memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any of the semantic segmentation network training methods described herein.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides a runtime environment for the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the semantic segmentation network training methods described herein.
The network interface is used for network communication, such as transmitting assigned tasks. It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of the parts relevant to the present solution and does not limit the computer device to which the present solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or any conventional processor.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring a sample image and interest point data of the sample image;
aligning the interest point data with the sample image to obtain an aligned sample image;
determining a seed point in the sample image, the seed point representing a building in the sample image;
generating a seed cue according to the aligned sample image and the seed point;
and inputting the seed clue and the sample image into a convolution network, training the convolution network by utilizing a seed loss function, and taking the trained convolution network as a semantic segmentation network.
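The five steps above can be sketched as a single training pass. Each step is injected as a callable because the patent does not fix the signatures of the individual operations; every helper name below is a hypothetical stand-in.

```python
def train_semantic_segmentation(samples, align, find_seeds, make_cues, train):
    """Run the five training steps over a batch of samples.

    samples: iterable of (image, poi_data) pairs                (step 1)
    align(image, poi): aligns interest point data with image    (step 2)
    find_seeds(image): seed points representing buildings       (step 3)
    make_cues(aligned, seeds): seed cues                        (step 4)
    train(cues, images): trains the convolutional network with
                         the seed loss function                 (step 5)
    """
    cues, images = [], []
    for image, poi in samples:
        aligned = align(image, poi)
        seeds = find_seeds(image)
        cues.append(make_cues(aligned, seeds))
        images.append(image)
    # The trained convolutional network serves as the segmentation network.
    return train(cues, images)
```

Keeping the steps as injected callables mirrors how the processor embodiment enumerates them as independent operations.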
In one embodiment, the processor, when implementing the image processing on the sample image, determines a seed point in the sample image, is configured to implement:
classifying the sample image by using a pre-trained classification network, and determining a salient region of the sample image;
performing average pooling on the salient region to obtain a class heat map corresponding to the salient region;
and taking the pixel points in the class heat map whose heat value is greater than a preset heat-value threshold as seed points of the sample image.
In one embodiment, the processor, when implementing the determining the seed point in the sample image, is configured to implement:
inputting the sample image into a convolution network to obtain the pixel probability of each pixel point in the sample image output by the convolution network;
and determining a seed point in the sample image according to the pixel probability.
In one embodiment, the processor, when implementing the generation of the seed cue from the aligned sample image and the seed point, is configured to implement:
calculating the aligned sample image and the seed points based on a preset cue score calculation formula to obtain a cue score of each pixel point in the sample image;
and when the cue score of the pixel point is larger than a preset score threshold value, taking the pixel point as a seed cue.
In one embodiment, the preset cue score calculation formula is:
M(x,y)=α*C(x,y)+β*P(x,y)
wherein (x, y) represents coordinates of a pixel point in the sample image, M (x, y) represents a cue score of the pixel point (x, y), C (x, y) represents whether the pixel point (x, y) is a seed point, P (x, y) represents whether the pixel point (x, y) has interest point data, and α and β are weight parameters.
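The cue score formula and its thresholding can be sketched directly. The weight values α = 0.6 and β = 0.4 and the score threshold are illustrative; the patent leaves them as free parameters.

```python
def cue_score(x, y, is_seed, has_poi, alpha=0.6, beta=0.4):
    """Cue score M(x, y) = alpha * C(x, y) + beta * P(x, y).

    is_seed(x, y): 1 if pixel (x, y) is a seed point, else 0  (C)
    has_poi(x, y): 1 if pixel (x, y) has interest point data, else 0  (P)
    alpha, beta: weight parameters (illustrative values).
    """
    return alpha * is_seed(x, y) + beta * has_poi(x, y)

def seed_cues(width, height, is_seed, has_poi, score_threshold=0.5):
    """Pixels whose cue score exceeds the preset score threshold."""
    return [(x, y) for x in range(width) for y in range(height)
            if cue_score(x, y, is_seed, has_poi) > score_threshold]
```

With α + β = 1, a pixel that is both a seed point and carries interest point data scores 1.0, so the threshold controls whether one source of evidence alone is sufficient to make the pixel a seed cue.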
In one embodiment, the processor, when implementing the inputting the seed cue and the sample image into a convolutional network, and training the convolutional network with a seed loss function, is configured to implement:
performing seed growth on the seed clue to obtain a seed region;
inputting the seed region and the sample image into a convolution network, training the convolution network by using a seed loss function, and taking the trained convolution network as a semantic segmentation network.
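The patent does not spell out the seed loss function. One plausible form, sketched below under that assumption, is the standard seeding loss from weakly supervised segmentation: the average negative log-probability the network assigns to the foreground class at the seed-region pixels.

```python
import math

def seed_loss(prob, seed_region):
    """Seeding loss over a grown seed region (assumed form).

    prob: 2D per-pixel foreground probabilities output by the network.
    seed_region: iterable of (row, col) pixels in the seed region.
    Returns the mean negative log-likelihood at the seed pixels, which
    pushes the network toward high foreground confidence on the seeds.
    """
    seeds = list(seed_region)
    return -sum(math.log(prob[r][c]) for r, c in seeds) / len(seeds)
```

The loss is zero when the network is fully confident on every seed pixel and grows without bound as confidence on any seed pixel vanishes, so minimising it anchors the segmentation to the seed region.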
In one embodiment, when implementing the seed growing on the seed cue to obtain a seed region, the processor is configured to implement:
acquiring pixel probability of each pixel point in the sample image output by the convolution network;
taking the seed cues as growth starting points, and determining whether the pixel probability of the connected pixel points adjacent to the seed cues is greater than a preset pixel probability threshold;
when the pixel probability of the connected pixel points is larger than a preset pixel probability threshold, the connected pixel points are used as newly added seed clues, and the newly added seed clues and the area where the seed clues are located are used as seed areas.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, the computer program comprises program instructions, and the processor executes the program instructions to realize the training method of any semantic segmentation network provided by the embodiment of the application.
The computer readable storage medium may be an internal storage unit of the computer device of the foregoing embodiment, such as a hard disk or memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital card (Secure Digital, SD), or a flash card (Flash Card) equipped on the computer device.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (8)

1. A method of training a semantic segmentation network, the method comprising:
acquiring a sample image and interest point data of the sample image;
aligning the interest point data with the sample image to obtain an aligned sample image;
determining a seed point in the sample image, the seed point representing a building in the sample image;
generating a seed cue according to the aligned sample image and the seed point;
inputting the seed clue and the sample image into a convolution network, training the convolution network by utilizing a seed loss function, and taking the trained convolution network as a semantic segmentation network;
wherein the generating a seed cue from the aligned sample image and the seed point comprises:
calculating the aligned sample image and the seed points based on a preset cue score calculation formula to obtain a cue score of each pixel point in the sample image;
when the cue score of the pixel point is larger than a preset score threshold, the pixel point is used as a seed cue;
the preset cue score calculation formula is as follows:
M(x,y)=α*C(x,y)+β*P(x,y)
wherein (x, y) represents coordinates of a pixel point in the sample image, M (x, y) represents the cue score of the pixel point (x, y), C (x, y) represents whether the pixel point (x, y) is a seed point, P (x, y) represents whether the pixel point (x, y) has interest point data, and α and β are weight parameters.
2. The method of claim 1, wherein determining seed points in the sample image comprises:
classifying the sample image by using a pre-trained classification network, and determining a salient region of the sample image;
performing average pooling on the salient region to obtain a class heat map corresponding to the salient region;
and taking the pixel points in the class heat map whose heat value is greater than a preset heat-value threshold as seed points of the sample image.
3. The method of claim 1, wherein determining seed points in the sample image comprises:
inputting the sample image into a convolution network to obtain the pixel probability of each pixel point in the sample image output by the convolution network;
and determining a seed point in the sample image according to the pixel probability.
4. The method of claim 1, wherein the inputting the seed cues and the sample images into a convolutional network and training the convolutional network with a seed loss function comprises:
performing seed growth on the seed clue to obtain a seed region;
inputting the seed region and the sample image into a convolution network, training the convolution network by using a seed loss function, and taking the trained convolution network as a semantic segmentation network.
5. The method for training a semantic segmentation network according to claim 4, wherein the seed growing the seed cue to obtain a seed region comprises:
acquiring pixel probability of each pixel point in the sample image output by the convolution network;
taking the seed cues as growth starting points, and determining whether the pixel probability of the connected pixel points adjacent to the seed cues is greater than a preset pixel probability threshold;
when the pixel probability of the connected pixel points is larger than a preset pixel probability threshold, the connected pixel points are used as newly added seed clues, and the newly added seed clues and the area where the seed clues are located are used as seed areas.
6. Training device of a semantic segmentation network, characterized in that it is adapted to implement a training method of a semantic segmentation network according to any one of claims 1 to 5, comprising:
the sample acquisition module is used for acquiring a sample image and interest point data of the sample image;
the data alignment module is used for aligning the interest point data with the sample image to obtain an aligned sample image;
an image processing module for determining seed points in the sample image, the seed points representing buildings in the sample image;
the clue generating module is used for generating a seed clue according to the aligned sample image and the seed point;
and the network training module is used for inputting the seed clue and the sample image into a convolution network, training the convolution network by utilizing a seed loss function, and taking the trained convolution network as a semantic segmentation network.
7. A computer device, the computer device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and for implementing a training method of a semantic segmentation network according to any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the training method of the semantic segmentation network according to any one of claims 1 to 5.
CN202111064182.5A 2021-09-10 2021-09-10 Training method, device, equipment and storage medium of semantic segmentation network Active CN113780532B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111064182.5A CN113780532B (en) 2021-09-10 2021-09-10 Training method, device, equipment and storage medium of semantic segmentation network
PCT/CN2022/072182 WO2023035535A1 (en) 2021-09-10 2022-01-14 Training method and apparatus for semantic segmentation network, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111064182.5A CN113780532B (en) 2021-09-10 2021-09-10 Training method, device, equipment and storage medium of semantic segmentation network

Publications (2)

Publication Number Publication Date
CN113780532A CN113780532A (en) 2021-12-10
CN113780532B true CN113780532B (en) 2023-10-27

Family

ID=78842816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111064182.5A Active CN113780532B (en) 2021-09-10 2021-09-10 Training method, device, equipment and storage medium of semantic segmentation network

Country Status (2)

Country Link
CN (1) CN113780532B (en)
WO (1) WO2023035535A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780532B (en) * 2021-09-10 2023-10-27 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of semantic segmentation network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063723A (en) * 2018-06-11 2018-12-21 清华大学 The Weakly supervised image, semantic dividing method of object common trait is excavated based on iteration
CN109993173A (en) * 2019-03-28 2019-07-09 华南理工大学 A kind of Weakly supervised image, semantic dividing method based on seed growth and boundary constraint
CN111369498A (en) * 2020-02-19 2020-07-03 浙江大学城市学院 Data enhancement method for evaluating seedling growth potential based on improved generation of confrontation network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916522B2 (en) * 2016-03-11 2018-03-13 Kabushiki Kaisha Toshiba Training constrained deconvolutional networks for road scene semantic segmentation
EP3839804A1 (en) * 2019-12-20 2021-06-23 KWS SAAT SE & Co. KGaA Method and system for automated plant image labeling
CN111259936B (en) * 2020-01-09 2021-06-01 北京科技大学 Image semantic segmentation method and system based on single pixel annotation
CN113780532B (en) * 2021-09-10 2023-10-27 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of semantic segmentation network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063723A (en) * 2018-06-11 2018-12-21 清华大学 The Weakly supervised image, semantic dividing method of object common trait is excavated based on iteration
CN109993173A (en) * 2019-03-28 2019-07-09 华南理工大学 A kind of Weakly supervised image, semantic dividing method based on seed growth and boundary constraint
CN111369498A (en) * 2020-02-19 2020-07-03 浙江大学城市学院 Data enhancement method for evaluating seedling growth potential based on improved generation of confrontation network

Also Published As

Publication number Publication date
CN113780532A (en) 2021-12-10
WO2023035535A1 (en) 2023-03-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant