CN114445651A - Training set construction method and device of semantic segmentation model and electronic equipment

Training set construction method and device of semantic segmentation model and electronic equipment

Info

Publication number
CN114445651A
CN114445651A
Authority
CN
China
Prior art keywords
image
training
region
window
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111581998.5A
Other languages
Chinese (zh)
Inventor
朱锦程 (Zhu Jincheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202111581998.5A
Publication of CN114445651A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training set construction method for a semantic segmentation model, comprising the following steps: acquiring a plurality of training images of a target scene to build an unlabeled image set; extracting a training image from the unlabeled image set and processing it to obtain the region window of the training image that needs to be labeled; labeling the region window and generating a label; and adding the labeled region window to the training set, then returning to the extraction step until the training set meets a preset requirement. By automatically screening regions of high uncertainty and generating region windows for labeling, the method replaces the traditional labeling of whole images, greatly reducing the labor cost and time of manual labeling, speeding up training set construction for scene-specific segmentation tasks, and improving construction efficiency.

Description

Training set construction method and device of semantic segmentation model and electronic equipment
Technical Field
The invention relates to the technical field of computer vision, and in particular to a training set construction method and device for a semantic segmentation model, and to an electronic device.
Background
Semantic segmentation is a popular technology in the field of artificial intelligence: through a suitable neural network, the scene categories present in an image or video can be segmented in a single pass. The segmentation result contains not only the category information present in the image but also the position and shape of each category. Such information is needed in many application scenarios, such as autonomous driving, medical imaging, and robotic scene understanding, so the application prospects of semantic segmentation are very broad.
High-quality semantic segmentation datasets are crucial to the development of this field. Existing semantic segmentation data are labeled manually: each pixel of an image is assigned a category, finally yielding the labels required for segmentation, so production consumes a great deal of labor and time. Under the demand for huge datasets in particular, the cost of manual production is very high. Existing active learning methods use an active strategy that screens whole images, ignoring the fact that category information is unevenly distributed in most data; category regions that the network learns easily are over-labeled, wasting labor.
Disclosure of Invention
In view of this, an embodiment of the invention provides a training set construction method for a semantic segmentation model, to solve the problems of high cost and low efficiency caused by the whole-image screening used in existing active learning methods.
To this end, the invention provides the following technical solution:
An embodiment of the invention provides a training set construction method for a semantic segmentation model, comprising the following steps:
acquiring a plurality of training images of a target scene to build an unlabeled image set;
extracting a training image from the unlabeled image set, and processing the training image to obtain the region window of the training image that needs to be labeled;
labeling the region window and generating a label;
and adding the labeled region window to a training set, and returning to the step of extracting a training image from the unlabeled image set until the training set meets a preset training set requirement.
Optionally, processing the training image to obtain the region window of each training image that needs to be labeled includes:
performing feature processing on the training image to obtain a feature region image;
and determining the region window that needs to be labeled based on the feature region image.
Optionally, performing feature processing on the training image to obtain a feature region image includes:
extracting a plurality of feature point images from the training image through a preset algorithm;
substituting the feature point images into a plurality of preset models respectively to obtain pixel classification results of the feature point images;
judging the consistency of the pixel classification results;
and selecting, as the feature region image, the feature point images whose consistency judgment result does not reach a preset target.
Optionally, determining the region window that needs to be labeled based on the feature region image includes:
screening the coordinates of the pixel with the maximum information entropy in the feature region image;
and dividing the training image with the pixel coordinates as the center point to obtain the region window that needs to be labeled.
Optionally, dividing the training image with the pixel coordinates as the center point to obtain the region window that needs to be labeled includes:
acquiring boundary information of the training image;
determining a preselected region according to the pixel coordinates and a preset window size;
and adjusting the preselected region according to the boundary information to obtain the region window.
Optionally, labeling the region window and generating a label includes:
adding the region window to be labeled to a set to be processed;
and manually labeling the region windows in the set to be processed and generating labels.
An embodiment of the invention also provides a training set construction device for a semantic segmentation model, comprising:
an acquisition module, configured to acquire a plurality of training images of a target scene to construct an unlabeled image set;
a processing module, configured to extract a training image from the unlabeled image set and process the training image to obtain the region window of the training image that needs to be labeled;
a labeling module, configured to label the region window and generate a label;
and a set module, configured to add the labeled region window to a training set and return to the step of extracting a training image from the unlabeled image set until the training set reaches a preset precision.
Optionally, the processing module includes:
a feature module, configured to perform feature processing on the training image to obtain a feature region image;
and a determining module, configured to determine the region window that needs to be labeled based on the feature region image.
An embodiment of the invention also provides an electronic device, comprising:
a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the training set construction method of the semantic segmentation model provided by the embodiments of the invention.
An embodiment of the invention also provides a computer-readable storage medium storing computer instructions for causing a computer to execute the training set construction method of the semantic segmentation model provided by the embodiments of the invention.
The technical solution of the invention has the following advantages:
The invention provides a training set construction method for a semantic segmentation model: a plurality of training images of a target scene are acquired to build an unlabeled image set; a training image is extracted from the unlabeled image set and processed to obtain the region window of the training image that needs to be labeled; the region window is labeled and a label is generated; the labeled region window is added to the training set, and the method returns to the extraction step until the training set meets a preset requirement. By automatically screening regions of high uncertainty and generating region windows for labeling, the method greatly reduces the labor cost and time of manual labeling, speeds up training set construction for scene-specific segmentation tasks, and improves construction efficiency.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a training set construction method of a semantic segmentation model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a training set constructing apparatus of a semantic segmentation model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
According to an embodiment of the present invention, an embodiment of a training set construction method for a semantic segmentation model is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from the one here.
This embodiment provides a training set construction method for a semantic segmentation model, which can be used to train the semantic segmentation model. As shown in fig. 1, the method includes the following steps:
step S1: and acquiring a plurality of training images of the target scene to establish an unmarked image set. Specifically, the target scene can be selected according to different customer requirements, the RGB images of the specific target scene are directly collected without sequentially marking the given labels thereof through manual work, the segmentation task processing of different scenes becomes more flexible, and meanwhile, a large amount of time and labor are saved.
Step S2: and extracting a training image from the unlabeled image set, and processing the training image to obtain a region window of the training image to be labeled. Specifically, the processing of feature screening is carried out on the training images, so that the screened areas needing to be labeled are more decisive, and the number of the labeled data required by the segmentation network to achieve the same performance is indirectly reduced again.
Step S3: and labeling the area window and generating a label. Specifically, since the region windows are the image data with poor consistency after being screened, the regions with good consistency are screened out, and the screened region windows with poor consistency are manually marked through the marking interfaces to generate labels of the region windows, so that the workload of manual semantic segmentation is greatly reduced, and the time cost and the labor cost are saved.
Step S4: and adding the area window with the label into the training set, and returning to the step of extracting the training image from the unlabeled image set until the training set meets the requirement of a preset training set. Specifically, the images and labels of the screened region windows are added into a training set, and the processed training images are deleted from the unlabeled image set.
Through steps S1 to S4, the training set construction method provided by this embodiment automatically screens regions of high uncertainty and generates region windows for labeling, greatly reducing the labor cost and time of manual labeling, speeding up training set construction for a scene-specific segmentation task, and improving construction efficiency.
Specifically, in one embodiment, the method further includes training the semantic segmentation model: while the training set is being constructed through the above steps, the model can be trained at the same time, and training stops once the performance of the semantic segmentation model meets the requirements of the specific task. When the model's performance meets the requirements, the training set construction also meets the requirements and can be stopped. This process effectively improves the efficiency of model construction; compared with manually labeling whole images, generating labels by screening and labeling only the regions of high uncertainty greatly reduces the labor cost and time of manual labeling.
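Put together, the iteration reads as a short loop. The following is a minimal sketch, assuming hypothetical callables screen_region_windows, annotate, and retrain standing in for step S2, step S3, and the concurrent model training (the patent does not prescribe these interfaces):

```python
# Minimal sketch of the active-learning loop in steps S1-S4.
# screen_region_windows, annotate, and retrain are hypothetical helpers.

def build_training_set(unlabeled_images, screen_region_windows,
                       annotate, retrain, target_size):
    """unlabeled_images: list of images (step S1's unlabeled image set)."""
    training_set = []
    while unlabeled_images and len(training_set) < target_size:
        image = unlabeled_images.pop()                  # extract one training image (step S2)
        for window in screen_region_windows(image):     # high-uncertainty region windows (step S2)
            label = annotate(window)                    # manual labeling via an interface (step S3)
            training_set.append((window, label))        # add the labeled window (step S4)
        retrain(training_set)                           # optional concurrent model training
    return training_set
```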
Specifically, in an embodiment, the step S2 includes the following steps:
step S21: and carrying out feature processing on the training image to obtain a feature region image. Specifically, a mobilenetV2 lightweight feature extraction network is preferably adopted, the number of input RGB images is 3, the number of channels is converted into 32 channels through initial convolution, and feature values of a single RGB image are extracted through 17 reverse residual blocks, so that feature extraction accuracy is effectively improved, and a feature region image with high accuracy is obtained.
Step S22: determine the region window that needs to be labeled based on the feature region image. Specifically, compared with screening whole images, screening region-level data greatly reduces the time cost and labeling workload of manually producing the dataset in the subsequent labeling process.
Specifically, in an embodiment, the step S21 includes the following steps:
step S211: and extracting a plurality of characteristic point images on the training image through a preset algorithm. Specifically, the feature point images are extracted, so that the feature recognition precision can be effectively improved, and the efficiency of constructing the training set is improved.
Step S212: and respectively substituting the characteristic point images into a plurality of preset models to obtain pixel classification results of the characteristic point images.
Step S213: and judging the consistency of the pixel classification result.
Step S214: and selecting the characteristic point image of which the consistency judgment result does not reach the preset target as the characteristic area image.
Specifically, the pixels are classified through techniques such as Gaussian attention convolution, a region proposal algorithm, and non-maximum suppression, and the data with poor consistency are selected accordingly. The consistency judgment result effectively reflects the uncertainty distribution of the data. In the region selection process, the pixel with the highest information entropy is later used as the region center for region division. For example, if the same pixel is substituted into 10 preset models and 4 of the classification results are category A, 3 are category B, and 3 are category C, the pixel's consistency is poor, the regional characteristics are not distinct, and its information entropy is high; if 8 of the 10 results are A and 2 are B, the consistency is good, the regional characteristics are clear, and the information entropy is low. Compared with ordinary convolution, the Gaussian attention convolution in this embodiment better highlights the weight and information of the region's central pixel, reduces the influence of neighborhood pixels on the central pixel, and prevents the introduction of irrelevant smoothing information.
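To make the consistency judgment concrete, here is a sketch under the assumption that the K preset models' per-pixel class predictions are stacked into a (K, H, W) integer array; it reproduces the 10-model example above:

```python
import numpy as np

def vote_entropy(predictions: np.ndarray, num_classes: int) -> np.ndarray:
    """(K, H, W) class indices from K models -> (H, W) information-entropy map."""
    k = predictions.shape[0]
    counts = np.stack([(predictions == c).sum(axis=0)   # votes per class at each pixel
                       for c in range(num_classes)])
    p = counts / k                                      # vote frequencies
    safe = np.where(p > 0, p, 1.0)                      # avoid log(0)
    return -(p * np.log(safe)).sum(axis=0)

# 10 models voting on one pixel: 4xA/3xB/3xC (poor consistency, entropy ~1.09)
# versus 8xA/2xB (good consistency, entropy ~0.50).
split = np.array([0]*4 + [1]*3 + [2]*3).reshape(10, 1, 1)
clear = np.array([0]*8 + [1]*2).reshape(10, 1, 1)
print(vote_entropy(split, 3)[0, 0], vote_entropy(clear, 3)[0, 0])
```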
The two-dimensional Gaussian distribution is expressed as follows:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
The function above is essentially the normal distribution density function, so a group of arrays conforming to the normal distribution is generated with Python, and the outer product of the arrays is then computed with numpy's outer function, finally producing a Gaussian kernel attention module conforming to the Gaussian distribution. The Gaussian kernel attention module in this embodiment has size 64 × 64 and standard deviation 2. Using the Gaussian attention module in place of an ordinary convolution kernel biases the screened regions toward categories that occupy little area or are hard for the network to learn, balancing the class imbalance in the dataset.
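A sketch of that construction, using the embodiment's settings (64 × 64 kernel, standard deviation 2); the normalization constant is taken from the formula above, since the patent does not state one explicitly:

```python
import numpy as np

size, sigma = 64, 2.0
ax = np.arange(size) - (size - 1) / 2.0       # coordinates centred on the kernel middle
g1d = np.exp(-(ax ** 2) / (2 * sigma ** 2))   # 1-D normal-distribution profile
kernel = np.outer(g1d, g1d)                   # numpy outer product -> 2-D Gaussian G(x, y)
kernel /= 2 * np.pi * sigma ** 2              # constant factor 1 / (2*pi*sigma^2)
```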
Specifically, in an embodiment, the step S22 includes the following steps:
step S221: and screening the coordinates of the pixel points with the maximum information entropy in the characteristic region image. Specifically, the input feature region image is subjected to gaussian attention convolution kernel to obtain a new feature map, and the dimension size depends on the size of the module and the size of the screening window. The feature map is converted into a vector representation and the pixel center screening of the first region window is performed using a function (argmax) that searches for the maximum number index. The specific formula is as follows:
P_index = argmax(Vectorization(I_M))
where P_index is the index of the pixel with the maximum information entropy value: the entire feature region image I_M is converted into vector form by Vectorization, and the argmax function then finds the pixel with the maximum information entropy in that vector. After the index of the maximum-entropy pixel is obtained, its coordinates are computed by the following formulas:
P_r = floor(P_index / C_M)
P_c = P_index mod C_M
In the formulas above, P_r is the row coordinate of the maximum-entropy pixel, obtained by dividing the pixel's flat index by the number of columns C_M of the feature map and rounding down; P_c is the column coordinate, obtained as the remainder of the same division.
Step S222: divide the training image with the pixel coordinates as the center point to obtain the region window that needs to be labeled.
Specifically, in an embodiment, step S222 includes the following steps:
Step S2221: acquire the boundary information of the training image.
Step S2222: determine a preselected region according to the pixel coordinates and a preset window size.
Step S2223: adjust the preselected region according to the boundary information to obtain the region window.
Specifically, the screened pixel is used as the center point of the region window, and a preselected region is divided out. The window division formulas are:
[W_r1 : W_r2] = I_M[max(0, P_r − W_L) : min(R_M, P_r + W_L)]
[W_c1 : W_c2] = I_M[max(0, P_c − W_L) : min(C_M, P_c + W_L)]
where [W_r1 : W_r2] is the row extent of the divided region and [W_c1 : W_c2] is its column extent; W_L is half the size of the division window; and R_M and C_M are the maximum row and column boundaries of the information feature map. The max and min functions ensure that the screened window does not exceed the boundary of the feature map. The entropy values within a selected region are then set to 0 to prevent reselection in the next round, ensuring the effectiveness and independence of region screening.
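A sketch of the division and the reselection guard, assuming the same entropy map as above and taking W_L as half the window side:

```python
import numpy as np

def take_window(entropy_map: np.ndarray, p_r: int, p_c: int, w_l: int):
    r_m, c_m = entropy_map.shape                      # R_M, C_M: feature-map boundaries
    r1, r2 = max(0, p_r - w_l), min(r_m, p_r + w_l)   # [W_r1 : W_r2]
    c1, c2 = max(0, p_c - w_l), min(c_m, p_c + w_l)   # [W_c1 : W_c2]
    entropy_map[r1:r2, c1:c2] = 0.0                   # zero the region to prevent reselection
    return (r1, r2), (c1, c2)
```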
Specifically, in an embodiment, step S3 further includes the following steps:
Step S31: add the region window to be labeled to a set to be processed;
Step S32: manually label the region windows in the set to be processed and generate labels.
Specifically, by screening the region windows with poor consistency and labeling them manually, the imbalance of category information can be corrected, and a small number of labeled samples can achieve high segmentation precision, reducing to a certain extent the cost of manually producing the dataset.
This embodiment further provides a training set construction device for a semantic segmentation model. The device is used to implement the above embodiments and preferred implementations; descriptions already given are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
This embodiment provides a training set construction device for a semantic segmentation model, as shown in fig. 2, comprising:
an acquisition module 101, configured to acquire a plurality of training images of a target scene to construct an unlabeled image set; for details, refer to the description of step S1 in the method embodiment above.
a processing module 102, configured to extract a training image from the unlabeled image set and process it to obtain the region window of the training image that needs to be labeled; for details, refer to the description of step S2 in the method embodiment above.
a labeling module 103, configured to label the region window and generate a label; for details, refer to the description of step S3 in the method embodiment above.
a set module 104, configured to add the labeled region window to the training set and return to the step of extracting a training image from the unlabeled image set until the training set reaches a preset precision; for details, refer to the description of step S4 in the method embodiment above.
Specifically, in an embodiment, the processing module 102 includes:
a feature module 1021, configured to perform feature processing on the training image to obtain a feature region image; for details, refer to the description of step S21 in the method embodiment above.
a determining module 1022, configured to determine the region window that needs to be labeled based on the feature region image; for details, refer to the description of step S22 in the method embodiment above.
The training set construction device of the semantic segmentation model in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more pieces of software or firmware, and/or other devices capable of providing the above functions.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention also provides an electronic device. As shown in fig. 3, the electronic device may include a processor 901 and a memory 902, which may be connected by a bus or in another manner; fig. 3 takes a bus connection as an example.
Processor 901 may be a Central Processing Unit (CPU). The Processor 901 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902, which when executed by the processor 901 perform the methods in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A training set construction method for a semantic segmentation model, characterized by comprising the following steps:
acquiring a plurality of training images of a target scene to build an unlabeled image set;
extracting a training image from the unlabeled image set, and processing the training image to obtain the region window of the training image that needs to be labeled;
labeling the region window and generating a label;
and adding the labeled region window to a training set, and returning to the step of extracting a training image from the unlabeled image set until the training set meets a preset training set requirement.
2. The training set construction method for a semantic segmentation model according to claim 1, wherein processing the training image to obtain the region window of each training image that needs to be labeled comprises:
performing feature processing on the training image to obtain a feature region image;
and determining the region window that needs to be labeled based on the feature region image.
3. The training set construction method for a semantic segmentation model according to claim 2, wherein performing feature processing on the training image to obtain a feature region image comprises:
extracting a plurality of feature point images from the training image through a preset algorithm;
substituting the feature point images into a plurality of preset models respectively to obtain pixel classification results of the feature point images;
judging the consistency of the pixel classification results;
and selecting, as the feature region image, the feature point images whose consistency judgment result does not reach a preset target.
4. The training set construction method for a semantic segmentation model according to claim 2, wherein determining the region window that needs to be labeled based on the feature region image comprises:
screening the coordinates of the pixel with the maximum information entropy in the feature region image;
and dividing the training image with the pixel coordinates as the center point to obtain the region window that needs to be labeled.
5. The training set construction method for a semantic segmentation model according to claim 4, wherein dividing the training image with the pixel coordinates as the center point to obtain the region window that needs to be labeled comprises:
acquiring boundary information of the training image;
determining a preselected region according to the pixel coordinates and a preset window size;
and adjusting the preselected region according to the boundary information to obtain the region window.
6. The training set construction method for a semantic segmentation model according to claim 1, wherein labeling the region window and generating a label comprises:
adding the region window to be labeled to a set to be processed;
and manually labeling the region windows in the set to be processed and generating labels.
7. A training set construction device for a semantic segmentation model, characterized by comprising:
an acquisition module, configured to acquire a plurality of training images of a target scene to construct an unlabeled image set;
a processing module, configured to extract a training image from the unlabeled image set and process the training image to obtain the region window of the training image that needs to be labeled;
a labeling module, configured to label the region window and generate a label;
and a set module, configured to add the labeled region window to a training set and return to the step of extracting a training image from the unlabeled image set until the training set reaches a preset precision.
8. The training set construction device for a semantic segmentation model according to claim 7, wherein the processing module comprises:
a feature module, configured to perform feature processing on the training image to obtain a feature region image;
and a determining module, configured to determine the region window that needs to be labeled based on the feature region image.
9. An electronic device, comprising:
a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the training set construction method of the semantic segmentation model according to any one of claims 1 to 6.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the training set construction method of the semantic segmentation model according to any one of claims 1 to 6.
CN202111581998.5A 2021-12-22 2021-12-22 Training set construction method and device of semantic segmentation model and electronic equipment Pending CN114445651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111581998.5A CN114445651A (en) 2021-12-22 2021-12-22 Training set construction method and device of semantic segmentation model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111581998.5A CN114445651A (en) 2021-12-22 2021-12-22 Training set construction method and device of semantic segmentation model and electronic equipment

Publications (1)

Publication Number Publication Date
CN114445651A (en) 2022-05-06

Family

ID=81363188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111581998.5A Pending CN114445651A (en) 2021-12-22 2021-12-22 Training set construction method and device of semantic segmentation model and electronic equipment

Country Status (1)

Country Link
CN (1) CN114445651A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882283A (en) * 2022-05-17 2022-08-09 阿波罗智联(北京)科技有限公司 Sample image generation method, deep learning model training method and device
CN114972761A (en) * 2022-06-20 2022-08-30 平安科技(深圳)有限公司 Artificial intelligence-based vehicle part segmentation method and related equipment
CN114972761B (en) * 2022-06-20 2024-05-07 平安科技(深圳)有限公司 Vehicle part segmentation method based on artificial intelligence and related equipment
CN116311209A (en) * 2023-03-28 2023-06-23 北京匠数科技有限公司 Window detection system method and system and electronic equipment
CN116311209B (en) * 2023-03-28 2024-01-19 北京匠数科技有限公司 Window detection method, system and electronic equipment
CN116129432A (en) * 2023-04-12 2023-05-16 成都睿瞳科技有限责任公司 Multi-target tracking labeling method, system and storage medium based on image recognition
CN116129432B (en) * 2023-04-12 2023-06-16 成都睿瞳科技有限责任公司 Multi-target tracking labeling method, system and storage medium based on image recognition

Similar Documents

Publication Publication Date Title
CN111192292B (en) Target tracking method and related equipment based on attention mechanism and twin network
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
US11151723B2 (en) Image segmentation method, apparatus, and fully convolutional network system
CN114445651A (en) Training set construction method and device of semantic segmentation model and electronic equipment
CN109086777B (en) Saliency map refining method based on global pixel characteristics
CN107464217B (en) Image processing method and device
CN110222705B (en) Training method of network model and related device
CN110232418B (en) Semantic recognition method, terminal and computer readable storage medium
CN111402170A (en) Image enhancement method, device, terminal and computer readable storage medium
CN109359527B (en) Hair region extraction method and system based on neural network
CN107506792B (en) Semi-supervised salient object detection method
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN111383232A (en) Matting method, matting device, terminal equipment and computer-readable storage medium
CN113469092B (en) Character recognition model generation method, device, computer equipment and storage medium
CN107563290A (en) A kind of pedestrian detection method and device based on image
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113177566A (en) Feature extraction model training method and device and computer equipment
CN111435445A (en) Training method and device of character recognition model and character recognition method and device
CN114444565A (en) Image tampering detection method, terminal device and storage medium
CN105023264A (en) Infrared image remarkable characteristic detection method combining objectivity and background property
CN116612385B (en) Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution
CN111079624B (en) Sample information acquisition method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination