CN113077469B - Sketch image semantic segmentation method and device, terminal device and storage medium

Info

Publication number: CN113077469B
Application number: CN202110279063.5A
Authority: CN
Inventors: 高成英; 凌鹏; 莫浩然
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2023-01-24
Anticipated expiration: 2041-03-16
Also published as: CN113077469A

Abstract

The invention discloses a sketch image semantic segmentation method and device, a terminal device and a storage medium. The method comprises the following steps: acquiring a sketch image; inputting the sketch image into a pre-established junction (intersection point) identification model to obtain a first image containing only the junctions in the sketch image; taking the difference between the sketch image and the first image to obtain a second image in which the junctions of the sketch image are removed; organizing the sketch image and the second image into a training sample, and training a preset neural network model on the training sample to obtain a semantic segmentation model; and inputting a sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result. The method addresses the mis-segmentation that arises when an existing model divides the pixels on one continuous line into several classes, and thereby greatly improves the accuracy of image semantic segmentation.

Description

Sketch image semantic segmentation method and device, terminal device and storage medium

Technical Field

The invention relates to the technical field of image semantic segmentation, in particular to a sketch image semantic segmentation method, a sketch image semantic segmentation device, terminal equipment and a storage medium.

Background

Semantic segmentation is similar to classification, but it operates at the pixel level: every pixel in the image is assigned a class. Current semantic segmentation work mainly targets natural images, and sketch-oriented semantic segmentation is less studied. Because only a small number of pixels in a sketch (the black pixels) carry information, sketch segmentation requires some processing that differs from natural image segmentation (such as preprocessing that ignores the background), but the main operations are the same. Current sketch-oriented semantic segmentation methods fall roughly into two categories: traditional methods and deep learning-based methods. Existing deep learning-based methods basically use a neural network model composed of convolution and deconvolution (transposed convolution) layers, such as the DeepLab series of models, to automatically extract image features and predict the class of each pixel in the image. Before formal use, the model must be trained on a corresponding sketch image data set (such as the SketchyScene data set) so that it acquires the ability to semantically segment sketch images. Existing deep learning-based sketch segmentation techniques do extract features automatically through the neural network, but they have an obvious problem: pixels on one continuous line are divided into several classes, whereas all pixels on a continuous line should belong to the same class. With a model trained as in the prior art, a continuous sketch line is in effect split into two or more segments after passing through the model, which greatly reduces the accuracy of sketch image semantic segmentation.

Disclosure of Invention

The embodiment of the invention provides a sketch image semantic segmentation method, a sketch image semantic segmentation device, terminal equipment and a storage medium, which can effectively solve the problem of error segmentation caused by dividing pixels on a continuous line into a plurality of classes by using the conventional model and greatly improve the accuracy of image semantic segmentation.

An embodiment of the present invention provides a method for semantic segmentation of a sketch image, including:

acquiring a sketch image;

inputting the sketch image into a pre-established intersection point identification model to obtain a first image only containing the intersection points in the sketch image;

making a difference between the sketch image and the first image to obtain a second image with the intersection point in the sketch image removed;

organizing the sketch image and the second image into a training sample, and training a preset neural network model based on the training sample to obtain a semantic segmentation model;

and inputting the sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result.

In some embodiments, the intersection identification model is constructed by the following steps, specifically including:

acquiring a plurality of to-be-processed sketch images, each filled with a plurality of Bezier curves of random position and shape;

processing the to-be-processed sketch images synchronously to obtain junction images that contain the sketch junctions;

and inputting the junction images into a preset convolutional neural network for training to obtain the junction identification model.

In some embodiments, the subtracting the sketch image and the first image to obtain a second image without an intersection in the sketch image specifically includes:

acquiring coordinates of the intersection point in the first image;

and setting the pixel value of the position corresponding to the coordinate in the sketch image as a preset pixel value so as to remove the intersection point from the sketch image and obtain a second image.

In some embodiments, the organizing the sketch image and the second image into a training sample specifically includes:

and splicing the sketch image and the second image in a channel dimension to obtain a segmented image which reaches a preset channel dimension threshold value and has continuous sketch lines, and taking the segmented image as a training sample.

In some embodiments, the convolutional neural network is composed of 16 residual blocks, and each residual block internally uses an identity mapping formed by a residual connection.

In some embodiments, the inputting the junction image into a preset convolutional neural network for training to obtain the junction recognition model includes:

training the convolutional neural network using a first loss function, equation (1) of which is as follows:

$$\mathrm{loss}_1 = \lVert I_J - \hat{I}_J \rVert_2^2 \qquad (1)$$

where $\mathrm{loss}_1$ is the first loss function, $I_J$ is the junction image, and $\hat{I}_J$ is the junction image output by the junction identification model.

In some embodiments, the training a preset neural network model based on the training samples to obtain a semantic segmentation model includes:

training the neural network model using a second loss function, equation (2) of which is as follows:

$$\mathrm{loss}_2 = -\sum_{i} y_i \log \hat{y}_i \qquad (2)$$

where $\mathrm{loss}_2$ is the second loss function, $y$ is the segmented image in the training sample, and $\hat{y}$ is the segmented image output by the semantic segmentation model.

Another embodiment of the present invention correspondingly provides a sketch image semantic segmentation apparatus, which includes:

the image acquisition module is used for acquiring a sketch image;

the junction point identification module is used for inputting the sketch image into a pre-established junction point identification model to obtain a first image only containing junction points in the sketch image;

the junction point removing module is used for subtracting the sketch image from the first image to obtain a second image with junction points in the sketch image removed;

the semantic segmentation model training module is used for organizing the sketch image and the second image into a training sample, and training a preset neural network model based on the training sample to obtain a semantic segmentation model;

and the semantic segmentation module is used for inputting the sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result.

Another embodiment of the present invention provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements the sketch image semantic segmentation method described in the above embodiment of the present invention.

Another embodiment of the present invention provides a computer-readable storage medium that includes a stored computer program, where, when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the sketch image semantic segmentation method described in the above embodiment of the present invention.

Compared with the prior art, the sketch image semantic segmentation method disclosed by the embodiment of the invention acquires a sketch image; inputs the sketch image into a pre-established junction identification model to obtain a first image containing only the junctions in the sketch image; takes the difference between the sketch image and the first image to obtain a second image with the junctions of the sketch image removed; organizes the sketch image and the second image into a training sample and trains a preset neural network model on the training sample to obtain a semantic segmentation model; and inputs the sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result. The line continuity of the sketch is thus taken into account and used as prior knowledge for training the semantic segmentation model, so that the model does not divide the pixels on a continuous line of the sketch image into multiple classes. This effectively solves the mis-segmentation caused by existing models dividing the pixels on one continuous line into multiple classes, improves the efficiency and accuracy of model segmentation, and greatly improves the accuracy of sketch image semantic segmentation.

Drawings

FIG. 1 is a flow chart of a method for semantic segmentation of a sketch image according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first stage of a sketch image semantic segmentation method according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating a second stage of the sketch image semantic segmentation method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a sketch image semantic segmentation apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a schematic flow chart of a sketch image semantic segmentation method according to an embodiment of the present invention is shown, where the method includes steps S101 to S105.

S101, acquiring a sketch image.

The sketch image in step S101 may be taken from the sketch data set SketchyScene, a sketch image data set containing a variety of objects combined with one another.

S102, inputting the sketch image into a pre-established junction identification model, and obtaining a first image only containing junctions in the sketch image.

In the invention, the construction of the junction identification model can be regarded as the first stage of the method, and the construction of the semantic segmentation model as the second stage. Specifically, referring to FIG. 2, which is a first-stage flow diagram of the sketch image semantic segmentation method provided in an embodiment of the present invention, a data set with sketch junctions is generated, a first-stage neural network model is constructed, and the first-stage network model is trained, yielding a first-stage model (i.e., the junction identification model) capable of identifying the junctions in a sketch. Further, the first-stage network model is trained iteratively until a preset number of iterations is reached, which effectively improves the model's ability to identify junctions.

Illustratively, obtaining the first-stage model requires training data with sketch junctions. To this end, 10000 sketch images of size 512x512, each filled with a plurality of Bezier curves of random position and shape, are synthesized, and the corresponding junction images are obtained at the same time. In this embodiment, the to-be-processed sketch images filled with Bezier curves of random position and shape are thus processed synchronously to obtain junction images with sketch junctions, which increases the randomness and diversity of the model training samples and improves the accuracy of model identification.
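The text does not say how the curves are rasterized or how the junction image is derived from them. As a minimal sketch of one way to synthesize such a pair, the following assumes cubic Bezier curves, dense sampling of the curve parameter, and a junction criterion of "pixel covered by two or more distinct curves"; the function names and default sizes are hypothetical.

```python
import random

def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y

def synthesize_pair(size=64, n_curves=4, samples=400, seed=0):
    """Return (sketch, junction_image) as size x size lists; 0 = ink, 255 = paper."""
    rng = random.Random(seed)
    cover = {}  # (row, col) -> set of indices of the curves touching that pixel
    for k in range(n_curves):
        # Four random control points; the curve stays inside their convex hull.
        ctrl = [(rng.uniform(0, size - 1), rng.uniform(0, size - 1)) for _ in range(4)]
        for i in range(samples + 1):
            x, y = bezier_point(*ctrl, i / samples)
            cover.setdefault((int(round(y)), int(round(x))), set()).add(k)
    sketch = [[255] * size for _ in range(size)]
    junctions = [[255] * size for _ in range(size)]
    for (r, c), curves in cover.items():
        sketch[r][c] = 0
        if len(curves) >= 2:  # assumed junction criterion, not stated in the text
            junctions[r][c] = 0
    return sketch, junctions
```

Because both images come from the same rasterization pass, every junction pixel is by construction an ink pixel of the sketch, which is the property the later subtraction step relies on.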

In a preferred embodiment, the convolutional neural network is composed of 16 residual blocks, and each residual block uses an identity mapping formed by a residual connection. This alleviates the vanishing gradient problem, so that the network can still be trained well as it becomes deeper, and it improves both the efficiency of model training and the accuracy of the model.
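The layers inside each residual block are not specified in the text. The following toy sketch only illustrates the identity (skip) path: each block returns its input plus a learned transform of that input, so a block whose transform contributes nothing passes the signal through unchanged. `transform` is a hypothetical placeholder for the convolutional layers of a block.

```python
def residual_block(x, transform):
    """Return x + transform(x); the skip connection carries x through unchanged."""
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]

def residual_stack(x, transforms):
    """Apply a sequence of residual blocks, e.g. 16 of them."""
    for t in transforms:
        x = residual_block(x, t)
    return x

# Sixteen blocks whose transforms output zeros behave as the identity mapping.
zero = lambda v: [0.0] * len(v)
print(residual_stack([1.0, 2.0], [zero] * 16))
```

The same additive path lets gradients flow directly from the output to early blocks, which is why deep stacks of such blocks remain trainable.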

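In some embodiments, the inputting the junction image into a preset convolutional neural network for training to obtain the junction identification model includes:

training the convolutional neural network using a first loss function, equation (1) of which is as follows:

$$\mathrm{loss}_1 = \lVert I_J - \hat{I}_J \rVert_2^2 \qquad (1)$$

where $\mathrm{loss}_1$ is the first loss function, $I_J$ is the junction image, and $\hat{I}_J$ is the junction image output by the junction identification model. The first loss function is specifically an L2 loss function.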
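As a minimal illustration of an L2 loss between a target junction image and a predicted one, the sketch below sums the squared per-pixel differences. Whether the patent's equation (1) sums or averages, and whether pixel values are normalized first, cannot be recovered from the text, so those choices are assumptions; `l2_loss` is a hypothetical name.

```python
def l2_loss(target, predicted):
    """Sum of squared per-pixel differences between two equal-sized images."""
    return sum(
        (t - p) ** 2
        for row_t, row_p in zip(target, predicted)
        for t, p in zip(row_t, row_p)
    )

# A perfect prediction gives zero loss; each wrong pixel adds its squared error.
target = [[0.0, 1.0], [1.0, 0.0]]
print(l2_loss(target, target))
print(l2_loss(target, [[0.0, 1.0], [1.0, 1.0]]))
```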

Further, referring to FIG. 3, which is a second-stage flowchart of the sketch image semantic segmentation method provided in an embodiment of the present invention, the input sketch obtained from SketchyScene in step S101 is passed to the first-stage model, which detects the junctions present in the input sketch and produces an image containing only those junctions. That is, the sketch image is input to the junction identification model trained in the first stage to obtain a first image containing only the junctions in the sketch image.

S103, making a difference between the sketch image and the first image to obtain a second image with the junction points in the sketch image removed.

In some embodiments, step S103 specifically includes:

acquiring coordinates of the junctions in the first image;

and setting the pixel value at the position corresponding to the coordinates in the sketch image to a preset pixel value, so as to remove the junctions from the sketch image and obtain the second image.

Referring to FIG. 3, the junctions in the input sketch are removed according to the junction image (i.e., the first image), giving a sketch from which the junctions of the input sketch are removed. More specifically, the coordinates of the junctions (black pixels) in the first image are obtained, and the pixel value at the corresponding coordinates in the sketch image is set to 255, turning those pixels white. This yields the second image, in which the junctions of the current sketch image are removed; the image may contain a large number of white pixels (including the originally black junctions). The second image now shows continuous, complete sketch lines, and these lines serve as prior knowledge for training the second-stage model, guiding the model during training: all pixels contained in one such continuous line should belong to the same class. The line continuity of the sketch is thereby taken into account, which effectively improves the accuracy of the model.
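The junction removal described above can be sketched as follows, assuming grayscale images stored as nested lists in which 0 is a black (ink) pixel and 255 is white; the function name and the nested-list representation are illustrative choices, not taken from the patent.

```python
def remove_junctions(sketch, first_image, ink=0, paper=255):
    """Return a copy of sketch with every junction pixel of first_image set to paper."""
    second = [row[:] for row in sketch]  # leave the input sketch untouched
    for r, row in enumerate(first_image):
        for c, value in enumerate(row):
            if value == ink:          # junction coordinate found in the first image
                second[r][c] = paper  # overwrite with the preset pixel value (255)
    return second

# A 3x3 cross: the two strokes meet at the centre pixel.
sketch = [[255, 0, 255],
          [0,   0,   0],
          [255, 0, 255]]
first = [[255, 255, 255],
         [255,   0, 255],
         [255, 255, 255]]
print(remove_junctions(sketch, first))
```

After removal, the four arms of the cross are no longer connected through the centre, so each remaining connected run of ink corresponds to a single line segment.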

S104, organizing the sketch image and the second image into a training sample, and training a preset neural network model based on the training sample to obtain a semantic segmentation model.

In some embodiments, said organizing said sketch image and said second image into a training sample comprises:

Illustratively, the sketch image and the second image are concatenated along the channel dimension, yielding a segmented image whose channel dimension is doubled and which carries the prior knowledge; this image is used to train the semantic segmentation model.

Referring to FIG. 3, the input sketch and the sketch with the junctions removed (i.e., the second image) are concatenated along the channel dimension to obtain supervised data whose channel dimension is doubled and to which the continuity prior knowledge has been added. This data is used to train the second-stage model, giving a semantic segmentation model that incorporates sketch continuity. In addition, to further improve the accuracy of the semantic segmentation model, the second-stage steps are repeated until a preset number of repetitions is reached. The invention thus combines the sketch-oriented semantic segmentation model with the continuity inherent in sketches: the model's prediction does not divide the pixels on a continuous line of the sketch into several classes, but assigns the pixels on a continuous line to the same class.
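A minimal sketch of the channel-wise concatenation, assuming a channel-first layout in which each image is a list of H x W planes; the layout and the name `stack_channels` are assumptions. With an array library the same step would be a single concatenation along the channel axis, for example `numpy.concatenate`.

```python
def stack_channels(sketch_channels, second_channels):
    """Concatenate two channel-first images (lists of H x W planes) along the channel axis."""
    h, w = len(sketch_channels[0]), len(sketch_channels[0][0])
    for plane in sketch_channels + second_channels:
        if len(plane) != h or any(len(row) != w for row in plane):
            raise ValueError("all channels must share the same height and width")
    return sketch_channels + second_channels

# One-channel sketch plus one-channel junction-free sketch gives a two-channel sample.
sketch = [[[255, 0], [0, 0]]]
second = [[[255, 0], [0, 255]]]
sample = stack_channels(sketch, second)
print(len(sample))
```

The channel count of the result is the sum of the two inputs, which matches the doubling described above when both images have the same number of channels.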

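In some embodiments, the training a preset neural network model based on the training sample to obtain a semantic segmentation model includes:

training the neural network model using a second loss function, equation (2) of which is as follows:

$$\mathrm{loss}_2 = -\sum_{i} y_i \log \hat{y}_i \qquad (2)$$

where $\mathrm{loss}_2$ is the second loss function, $y$ is the segmented image in the training sample, and $\hat{y}$ is the segmented image output by the semantic segmentation model. The second loss function is specifically a cross-entropy loss function.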
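As an illustration of a per-pixel cross-entropy loss for segmentation, the sketch below averages the negative log-probability that the model assigns to each pixel's true class. Averaging over all pixels (rather than summing, or masking out background pixels) is an assumption, as are the function name and the small `eps` clamp that avoids taking the logarithm of zero.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean per-pixel cross entropy.

    labels: H x W class indices; probs: H x W lists of per-class probabilities.
    """
    total, count = 0.0, 0
    for row_y, row_p in zip(labels, probs):
        for y, p in zip(row_y, row_p):
            total -= math.log(max(p[y], eps))  # penalize low probability on the true class
            count += 1
    return total / count

# Two pixels, two classes: an undecided model pays log(2) per pixel.
labels = [[0, 1]]
uniform = [[[0.5, 0.5], [0.5, 0.5]]]
print(cross_entropy(labels, uniform))
```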

And S105, inputting the sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result.

According to the sketch image semantic segmentation method provided by the embodiment of the invention, a sketch image is acquired; the sketch image is input into a pre-established junction identification model to obtain a first image containing only the junctions in the sketch image; the difference between the sketch image and the first image is taken to obtain a second image with the junctions of the sketch image removed; the sketch image and the second image are organized into a training sample, and a preset neural network model is trained on the training sample to obtain a semantic segmentation model; and the sketch image to be segmented is input into the semantic segmentation model to obtain a sketch semantic segmentation result. The line continuity of the sketch is thus taken into account and used as prior knowledge for training the semantic segmentation model, so that the model does not divide the pixels on a continuous line of the sketch image into multiple classes. This effectively solves the mis-segmentation caused by existing models dividing the pixels on one continuous line into multiple classes, and greatly improves the efficiency and accuracy of the model.

Referring to fig. 4, which is a schematic structural diagram of a sketch image semantic segmentation apparatus according to an embodiment of the present invention, including:

an image obtaining module 201, configured to obtain a sketch image;

an intersection identification module 202, configured to input the sketch image into a pre-established intersection identification model, and obtain a first image that only includes intersections in the sketch image;

the junction removing module 203 is configured to perform a difference between the sketch image and the first image to obtain a second image from which a junction in the sketch image is removed;

the semantic segmentation model training module 204 is configured to organize the sketch image and the second image into a training sample, and train a preset neural network model based on the training sample to obtain a semantic segmentation model;

and the semantic segmentation module 205 is used for inputting the sketch image to be segmented into the semantic segmentation model to obtain a sketch semantic segmentation result.

Preferably, the apparatus further comprises:

a to-be-processed sketch image acquisition module, configured to acquire a plurality of to-be-processed sketch images, each filled with a plurality of Bezier curves of random position and shape;

a synchronization module, configured to process the to-be-processed sketch images synchronously to obtain junction images that contain the sketch junctions;

and the junction point identification model training module is used for inputting the junction point image into a preset convolutional neural network for training to obtain the junction point identification model.

Preferably, the junction removal module 203 includes:

an intersection coordinate acquiring unit, configured to acquire coordinates of the intersection in the first image;

and the junction point removing unit is used for setting the pixel value of the position corresponding to the coordinate in the sketch image as a preset pixel value so as to remove the junction point from the sketch image and obtain a second image.

Preferably, the semantic segmentation model training module 204 includes:

and the image splicing unit is used for splicing the sketch image and the second image in a channel dimension to obtain a segmented image which reaches a preset channel dimension threshold value and has continuous sketch lines, and the segmented image is used as a training sample.

Preferably, the junction recognition model training module includes:

a first training unit, configured to train the convolutional neural network using a first loss function, where equation (1) of the first loss function is as follows:

$$\mathrm{loss}_1 = \lVert I_J - \hat{I}_J \rVert_2^2 \qquad (1)$$

where $\mathrm{loss}_1$ is the first loss function, $I_J$ is the junction image, and $\hat{I}_J$ is the junction image output by the junction identification model.

Preferably, the semantic segmentation model training module 204 includes:

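a second training unit, configured to train the neural network model using a second loss function, where equation (2) of the second loss function is as follows:

$$\mathrm{loss}_2 = -\sum_{i} y_i \log \hat{y}_i \qquad (2)$$

where $\mathrm{loss}_2$ is the second loss function, $y$ is the segmented image in the training sample, and $\hat{y}$ is the segmented image output by the semantic segmentation model.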

According to the sketch image semantic segmentation device provided by the embodiment of the invention, a sketch image is acquired; the sketch image is input into a pre-established junction identification model to obtain a first image containing only the junctions in the sketch image; the difference between the sketch image and the first image is taken to obtain a second image with the junctions of the sketch image removed; the sketch image and the second image are organized into a training sample, and a preset neural network model is trained on the training sample to obtain a semantic segmentation model; and the sketch image to be segmented is input into the semantic segmentation model to obtain a sketch semantic segmentation result. The line continuity of the sketch is thus taken into account and used as prior knowledge for training the semantic segmentation model, so that the model does not divide the pixels on a continuous line of the sketch image into multiple classes. This effectively solves the mis-segmentation caused by existing models dividing the pixels on one continuous line into multiple classes, and greatly improves the efficiency and accuracy of the model.

The terminal device of this embodiment includes: a processor, a memory, and a computer program, such as a sketch image semantic segmentation program, stored in the memory and executable on the processor. And when the processor executes the computer program, the steps in the various sketch image semantic segmentation method embodiments are realized. Alternatively, the processor implements the functions of the modules/units in the above device embodiments when executing the computer program.

Illustratively, the computer program may be partitioned into one or more modules/units, stored in the memory and executed by the processor, to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.

The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the terminal device and connects the various parts of the whole terminal device using various interfaces and lines.

The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the terminal device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

The modules/units integrated in the terminal device, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A sketch image semantic segmentation method is characterized by comprising the following steps:

acquiring a sketch image;

inputting the sketch image into a pre-established intersection point identification model to obtain a first image only containing an intersection point in the sketch image; the construction of the junction identification model comprises the following steps: acquiring a plurality of sketch images to be processed filled by Bezier curves with a plurality of random position shapes; synchronizing the sketch images to be processed to obtain intersection point images with sketch intersection points; inputting the junction point image into a preset convolutional neural network for training to obtain the junction point identification model;

making a difference between the sketch image and the first image to obtain a second image with the junction in the sketch image removed;

2. The sketch image semantic segmentation method according to claim 1, wherein the obtaining a second image from which an intersection point in the sketch image is removed by performing a difference between the sketch image and the first image specifically comprises:

acquiring coordinates of the junction point in the first image;

3. The sketch image semantic segmentation method as claimed in claim 1, wherein the organizing the sketch image and the second image into a training sample specifically comprises:

4. The sketch image semantic segmentation method of claim 1, wherein the convolutional neural network is composed of 16 residual blocks, and an identity mapping formed by a residual connection is used in each residual block.

5. The sketch image semantic segmentation method of claim 1, wherein the inputting the junction image into a preset convolutional neural network for training to obtain the junction recognition model comprises:

6. The sketch image semantic segmentation method as claimed in claim 3, wherein the training of a preset neural network model based on the training samples to obtain a semantic segmentation model comprises:

and outputting the segmentation image for the semantic segmentation model.

7. A sketch image semantic segmentation device is characterized by comprising:

the image acquisition module is used for acquiring a sketch image;

the junction point identification module is used for inputting the sketch image into a pre-established junction point identification model to obtain a first image only containing junction points in the sketch image; the construction of the junction identification model comprises the following steps: acquiring a plurality of sketch images to be processed filled by Bezier curves with a plurality of random position shapes; synchronizing the sketch images to be processed to obtain intersection point images with sketch intersection points; inputting the junction point image into a preset convolutional neural network for training to obtain the junction point identification model;

8. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the sketch image semantic segmentation method as claimed in any one of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium, comprising a stored computer program, wherein when the computer program runs, a device on which the computer-readable storage medium is located is controlled to execute the sketch image semantic segmentation method according to any one of claims 1 to 6.