CN113470057A - Semantic segmentation method and device, electronic equipment and computer-readable storage medium - Google Patents

Semantic segmentation method and device, electronic equipment and computer-readable storage medium

Info

Publication number
CN113470057A
Authority
CN
China
Prior art keywords
feature
semantic
semantic segmentation
loss
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110725811.8A
Other languages
Chinese (zh)
Other versions
CN113470057B (en)
Inventor
纪德益
王浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110725811.8A (patent CN113470057B)
Publication of CN113470057A
Priority to PCT/CN2021/125073 (WO2023273026A1)
Application granted
Publication of CN113470057B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure provides a semantic segmentation method and device, an electronic device, and a computer-readable storage medium, wherein the method comprises: acquiring an image to be processed; and performing semantic segmentation processing on the image to be processed by adopting a semantic segmentation model to obtain a semantic segmentation result of the image to be processed. The semantic segmentation model is trained by using, as a reference, a first transformation feature obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, in combination with a second transformation feature obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained. The first intermediate feature and the second intermediate feature comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature. The first transformation feature and the second transformation feature comprise at least one of the following groups: a first contour feature and a second contour feature; a first enhancement feature and a second enhancement feature.

Description

Semantic segmentation method and device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to image processing technologies, and in particular, to a semantic segmentation method, apparatus, electronic device, and computer-readable storage medium.
Background
With the development of semantic segmentation technology, knowledge distillation has been introduced into semantic segmentation. Knowledge distillation transfers knowledge learned by a complex model to a simple model, so that the simple model can conveniently be adopted for semantic segmentation in practical applications. However, during knowledge transfer, only the semantic segmentation result of the complex model is typically used, as response-based knowledge, to guide the learning of the simple model. The knowledge transferred to the simple model is therefore not rich enough, resulting in low semantic segmentation precision of the learned simple model.
Disclosure of Invention
The embodiment of the disclosure provides a semantic segmentation method, a semantic segmentation device, an electronic device and a computer-readable storage medium, which improve the precision of semantic segmentation.
The technical scheme of the disclosure is realized as follows:
the embodiment of the disclosure provides a semantic segmentation method, which includes:
acquiring an image to be processed;
performing semantic segmentation processing on the image to be processed by adopting a semantic segmentation model to obtain a semantic segmentation result of the image to be processed; wherein,
the semantic segmentation model is trained by using, as a reference, a first transformation feature obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, in combination with a second transformation feature obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained;
the first intermediate feature and the second intermediate feature comprise at least one of the following:
a first texture feature and a second texture feature;
a first semantic feature and a second semantic feature;
the first transformation feature and the second transformation feature comprise at least one of the following:
a first contour feature and a second contour feature;
a first enhancement feature and a second enhancement feature.
In the method, the reference semantic model is a pre-trained semantic segmentation network; the semantic segmentation model to be trained is a network with the same function as the reference semantic model; the method further comprises the following steps:
respectively extracting the features of the image sample by adopting the reference semantic model and the semantic segmentation model to be trained to obtain the first intermediate feature and the second intermediate feature;
respectively carrying out contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature to obtain a first transformation feature and a second transformation feature;
and training the semantic segmentation model to be trained at least based on the first transformation characteristic and the second transformation characteristic, and determining the semantic segmentation model.
In the above method, the performing feature extraction on the image sample by using the reference semantic model and the to-be-trained semantic segmentation model respectively to obtain the first intermediate feature and the second intermediate feature includes: respectively extracting the features of the image sample by adopting the reference semantic model and the semantic segmentation model to be trained to obtain a first texture feature and a second texture feature; and performing feature extraction on the first texture feature and the second texture feature to obtain a first semantic feature and a second semantic feature.
In the above method, the performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature to obtain the first transformed feature and the second transformed feature respectively includes at least one of:
respectively carrying out contour decomposition processing on the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature;
and performing enhancement processing on the first semantic features and the second semantic features to obtain first enhancement features and second enhancement features.
In the above method, the training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature to determine the semantic segmentation model includes: performing loss calculation based on a preset first loss function, the first contour feature and the second contour feature to determine a first loss; determining a second loss based on a preset second loss function, the first enhancement feature and the second enhancement feature; training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and determining the semantic segmentation model.
In the above method, the training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature to determine the semantic segmentation model includes: respectively performing semantic segmentation prediction on the first enhancement features and the second enhancement features to obtain first semantic segmentation features and second semantic segmentation features; performing loss calculation based on a preset third loss function, the first semantic segmentation feature and the second semantic segmentation feature to determine a third loss; training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss and the third loss, and determining the semantic segmentation model.
In the above method, the first texture feature includes: at least one first sub-textural feature; the second texture feature comprises: at least one second sub-texture feature; after the semantic segmentation prediction is performed on the first enhancement feature and the second enhancement feature respectively to obtain a first semantic segmentation feature and a second semantic segmentation feature, the method further includes: determining a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; determining a second graph inference relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; performing loss calculation based on a preset fourth loss function, the first graph reasoning relation and the second graph reasoning relation to determine a fourth loss; training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss and the third loss and the fourth loss, and determining the semantic segmentation model.
In the above method, the determining a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature includes:
determining at least two difference features between the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on an output order;
performing correlation processing on the at least two difference features to obtain correlations between the difference features;
and constructing the first graph inference relationship based on the at least two difference features and the correlation degree between the difference features.
In the above method, the forming the first graph inference relationship based on the at least two difference features and the correlation between the difference features includes:
and in a case where a target inter-difference correlation that is less than or equal to a preset correlation threshold exists among the correlations between the difference features, constructing the first graph inference relationship based on the target inter-difference correlation and the at least two difference features.
In the above method, the performing contour decomposition processing on the first texture feature and the second texture feature respectively to obtain a first contour feature and a second contour feature includes:
filtering each of the first texture feature and the second texture feature based on an interlaced scanning factor to obtain a respective high-pass sub-band and low-pass sub-band;
carrying out directional filtering on the high-pass sub-band to obtain a directional sub-band;
and respectively fusing the low-pass sub-band and the direction sub-band corresponding to the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature, thereby completing contour decomposition processing.
In the above method, the enhancing the first semantic feature and the second semantic feature to obtain a first enhanced feature and a second enhanced feature includes:
performing at least two kinds of conversion on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic conversion features;
performing self-enhancement processing between different ones of the at least two semantic conversion features to obtain a correlation matrix;
performing enhancement processing on the correlation matrix and one of the at least two semantic conversion features to obtain a self-enhanced feature;
determining, based on the respective self-enhancement matrices of the first semantic feature and the second semantic feature, the respective self-enhanced features as the first enhanced feature and the second enhanced feature; or fusing the first semantic feature and the second semantic feature with their respective self-enhanced features to obtain the first enhanced feature and the second enhanced feature.
The embodiment of the present disclosure provides a semantic segmentation apparatus, including:
the characteristic acquisition module is used for acquiring an image to be processed;
the semantic segmentation module is used for performing semantic segmentation processing on the image to be processed by adopting the semantic segmentation model to obtain a semantic segmentation result of the image to be processed; the semantic segmentation model is trained by using, as a reference, a first transformation feature obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, in combination with a second transformation feature obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained;
the first intermediate feature and the second intermediate feature comprise at least one of the following:
a first texture feature and a second texture feature;
a first semantic feature and a second semantic feature;
the first transformation feature and the second transformation feature comprise at least one of the following:
a first contour feature and a second contour feature;
a first enhancement feature and a second enhancement feature.
In some embodiments, the feature extraction module is configured to perform feature extraction on the image sample by using the reference semantic model and the to-be-trained semantic segmentation model respectively to obtain the first intermediate feature and the second intermediate feature; the reference semantic model is a pre-trained semantic segmentation network; the semantic segmentation model to be trained is a network with the same function as the reference semantic model;
the feature processing module is used for respectively carrying out contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature to obtain a first transformation feature and a second transformation feature;
and the training module is used for training the semantic segmentation model to be trained at least based on the first transformation characteristic and the second transformation characteristic to determine the semantic segmentation model.
In some embodiments, the feature extraction module is further configured to perform feature extraction on the image sample by using the reference semantic model and the to-be-trained semantic segmentation model respectively to obtain a first texture feature and a second texture feature; and performing feature extraction on the first texture feature and the second texture feature to obtain a first semantic feature and a second semantic feature.
In some embodiments, the feature processing module is further configured to at least one of:
respectively carrying out contour decomposition processing on the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature; and performing enhancement processing on the first semantic features and the second semantic features to obtain first enhancement features and second enhancement features.
In some embodiments, the training module is further configured to perform a loss calculation based on a preset first loss function, the first contour feature and the second contour feature, and determine a first loss; determining a second loss based on a preset second loss function, the first enhancement feature and the second enhancement feature; training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and determining the semantic segmentation model.
In some embodiments, the training module is further configured to perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively to obtain a first semantic segmentation feature and a second semantic segmentation feature; performing loss calculation based on a preset third loss function, the first semantic segmentation feature and the second semantic segmentation feature to determine a third loss; training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss and the third loss, and determining the semantic segmentation model.
In some embodiments, the first texture feature comprises: at least one first sub-textural feature; the second texture feature comprises: at least one second sub-texture feature; the training module is further configured to perform semantic segmentation prediction on the first enhancement feature and the second enhancement feature respectively to obtain a first semantic segmentation feature and a second semantic segmentation feature, and determine a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature and the at least one first sub-texture feature; determining a second graph inference relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; performing loss calculation based on a preset fourth loss function, the first graph reasoning relation and the second graph reasoning relation to determine a fourth loss; training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss and the third loss and the fourth loss, and determining the semantic segmentation model.
In some embodiments, the training module is further configured to determine at least two difference features between the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on an output order; perform correlation processing on the at least two difference features to obtain correlations between the difference features; and construct the first graph inference relationship based on the at least two difference features and the correlations between the difference features.
In some embodiments, the training module is further configured to, in a case where a target inter-difference correlation that is less than or equal to a preset correlation threshold exists among the correlations between the difference features, construct the first graph inference relationship based on the target inter-difference correlation and the at least two difference features.
In some embodiments, the feature processing module is further configured to filter each of the first texture feature and the second texture feature based on an interlaced scanning factor to obtain a respective high-pass sub-band and low-pass sub-band; perform directional filtering on the high-pass sub-band to obtain a directional sub-band; and fuse the low-pass sub-band and the directional sub-band corresponding to the first texture feature and to the second texture feature respectively to obtain a first contour feature and a second contour feature, thereby completing the contour decomposition processing.
In some embodiments, the feature processing module is further configured to perform at least two kinds of conversion on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic conversion features; perform self-enhancement processing between different ones of the at least two semantic conversion features to obtain a correlation matrix; perform enhancement processing on the correlation matrix and one of the at least two semantic conversion features to obtain a self-enhanced feature; and determine, based on the respective self-enhancement matrices of the first semantic feature and the second semantic feature, the respective self-enhanced features as the first enhanced feature and the second enhanced feature; or fuse the first semantic feature and the second semantic feature with their respective self-enhanced features to obtain the first enhanced feature and the second enhanced feature.
An embodiment of the present disclosure provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the semantic segmentation method when executing the computer program stored in the memory.
The embodiment of the disclosure provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the semantic segmentation method is implemented.
The embodiment of the disclosure has the following beneficial effects:
the disclosed embodiment provides a semantic segmentation method, a semantic segmentation device, an electronic device and a computer-readable storage medium; the semantic segmentation device can transfer the knowledge based on a plurality of features obtained in the semantic segmentation process of the reference semantic model into the semantic segmentation model; the semantic segmentation model learns richer knowledge, so that the semantic segmentation precision when the semantic segmentation model is used for performing semantic segmentation on the image to be processed is improved.
Drawings
FIG. 1a is a schematic flow chart of an alternative semantic segmentation method provided by the embodiment of the present disclosure;
FIG. 1b is a schematic diagram illustrating a training process of an alternative semantic segmentation model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an alternative semantic segmentation process provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an alternative contour decomposition method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an alternative low-pass filtering method provided by the embodiment of the present disclosure;
FIG. 5a is a schematic diagram of an alternative enhancement process provided by an embodiment of the present disclosure;
FIG. 5b is a schematic diagram of an alternative enhancement process provided by embodiments of the present disclosure;
FIG. 6 is a diagram illustrating the effect of an alternative texture knowledge learning provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a semantic segmentation method according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a semantic segmentation result of an alternative student network according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a semantic segmentation apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clearly understood, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not intended to limit the disclosure.
The present disclosure will be described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the examples provided herein are merely illustrative of the present disclosure and are not intended to limit the present disclosure. In addition, the embodiments provided below are some embodiments for implementing the disclosure, not all embodiments for implementing the disclosure, and the technical solutions described in the embodiments of the disclosure may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present disclosure, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, the phrase "including a/an..." does not exclude the presence of other related elements (e.g., steps in a method or elements in a device, such as portions of circuitry, processors, programs, software, etc.) in the method or device that includes the element.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, e.g., U and/or W, which may mean: u exists alone, U and W exist simultaneously, and W exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of U, W, V, and may mean including any one or more elements selected from the group consisting of U, W and V.
For example, the semantic segmentation method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the described steps; similarly, the semantic segmentation device provided by the embodiments of the present disclosure includes a series of modules, but is not limited to the explicitly described modules, and may also include modules required to obtain relevant information or to perform processing based on the information.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the disclosure only and is not intended to be limiting of the disclosure.
Before further detailed description of the embodiments of the present disclosure, terms and expressions referred to in the embodiments of the present disclosure are explained, and the terms and expressions referred to in the embodiments of the present disclosure are applied to the following explanations.
Knowledge distillation: transferring the knowledge learned by a complex model to a simple model, so that the semantic segmentation precision of the simple model approaches that of the complex model; that is, a trained complex model (for example, the reference semantic model) serves as a teacher network, a simple model (for example, the semantic segmentation model to be trained) serves as a student network, and the teacher network guides the student network in learning knowledge, thereby obtaining a trained simple model. The complex model has a large architecture and high precision, while the simple model has few parameters and a precision gap relative to the complex model.
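To make the contrast with the feature-based transfer of this disclosure concrete, the following is a minimal sketch of the response-based distillation loss of the related art; the function name and the temperature value are illustrative assumptions, not details from the patent.

```python
import torch.nn.functional as F

def response_distillation_loss(teacher_logits, student_logits, T=4.0):
    """KL divergence between softened teacher and student predictions
    (response-based knowledge in the related art)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)         # soften teacher output
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # T*T keeps gradient magnitudes comparable to the hard-label loss
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```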
It should be noted that knowledge in the related art is generally regarded as response-based knowledge and is widely applied to computer vision tasks such as target detection and human pose estimation. For semantic segmentation, however, response-based knowledge alone leaves the transferred knowledge insufficiently rich.
the embodiments of the present disclosure provide a semantic segmentation method, a semantic segmentation apparatus, an electronic device, and a computer-readable storage medium, which can improve the precision of semantic segmentation, where the semantic segmentation method provided by the embodiments of the present disclosure is applied to an electronic device, and an exemplary application of the electronic device provided by the embodiments of the present disclosure is described below.
Referring to FIG. 1a, FIG. 1a is a schematic flowchart of an alternative semantic segmentation method provided by an embodiment of the present disclosure, which will be described with reference to the steps shown in FIG. 1a.
S101, acquiring an image to be processed;
S102, performing semantic segmentation processing on the image to be processed by adopting a semantic segmentation model to obtain a semantic segmentation result of the image to be processed; the semantic segmentation model is trained by using, as a reference, a first transformation feature obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, in combination with a second transformation feature obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained;
the first intermediate feature and the second intermediate feature comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature; the first transformation feature and the second transformation feature comprise at least one of the following groups: a first contour feature and a second contour feature; a first enhancement feature and a second enhancement feature.
In the embodiment of the disclosure, after the semantic segmentation device finishes training the semantic segmentation model, the semantic segmentation device may perform semantic segmentation on the acquired image to be processed by using the semantic segmentation model to obtain a semantic segmentation result. The reference semantic model is a pre-trained semantic segmentation network; the semantic segmentation model to be trained is a network with the same function as the reference semantic model.
In some embodiments of the present disclosure, as shown in FIG. 1b, the semantic segmentation model needs to be determined before semantic segmentation is performed. The semantic segmentation apparatus implements the training process through S01 to S03 as follows:
s01, respectively extracting the features of the image sample by adopting a reference semantic model and a semantic segmentation model to be trained to obtain a first intermediate feature and a second intermediate feature;
s02, performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature respectively to obtain a first transformation feature and a second transformation feature;
and S03, training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determining the semantic segmentation model.
In the embodiment of the disclosure, the reference semantic model and the semantic segmentation model to be trained have the same function and are both used for semantic segmentation; the reference semantic model is a complex model which is successfully trained, the semantic segmentation model is a simple model, the reference semantic model is adopted to guide the semantic segmentation model to train, and the knowledge learned by the reference semantic model is transferred to the semantic segmentation model.
In the embodiment of the disclosure, in the process of training the semantic segmentation model to be trained, the semantic segmentation device performs feature extraction on an image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, so as to obtain the first intermediate feature and the second intermediate feature.
It should be noted that the features involved in the embodiments of the present disclosure may be embodied as feature maps, where a feature map may be represented by a C × H × M matrix; H × M represents the pixels (spatial resolution) of the feature map, and C represents the number of channels. That is, the feature map can be regarded as an H × M grid of C-dimensional deep descriptors.
In the embodiment of the disclosure, the reference semantic model and the semantic segmentation model to be trained both include a plurality of convolutional layers, wherein a plurality of corresponding intermediate features can be sequentially obtained through the plurality of convolutional layers; the plurality of intermediate features includes a lower level feature and an upper level feature; wherein, the low-level features contain texture information which can be used as texture features; the high-level features contain semantic information, which can be used as semantic features.
In an embodiment of the present disclosure, the first intermediate feature may include: a first texture feature and/or a first semantic feature; the first texture feature is a low-level feature extracted by the reference semantic model, and the first semantic feature is a high-level feature extracted by the reference semantic model.
In an embodiment of the present disclosure, the second intermediate feature may include: a second texture feature and/or a second semantic feature; the second texture feature is a low-level feature extracted by the semantic segmentation model to be trained, and the second semantic feature is a high-level feature extracted by the semantic segmentation model to be trained.
In an embodiment of the present disclosure, in the case where the first intermediate feature includes the first texture feature, the second intermediate feature includes the second texture feature; in the case where the first intermediate feature comprises a first semantic feature, the second intermediate feature comprises a second semantic feature.
In the embodiment of the present disclosure, the reference semantic model and the semantic segmentation model to be trained may be: ResNet, ENet, ESPNet, BiSeNet, SegNet, RefineNet, etc.; the reference semantic model and the semantic segmentation model may be the same model or different models; the disclosed embodiments are not limited in this respect.
Illustratively, referring to FIG. 2, the reference semantic model includes 4 convolutional layers and a decoder, and the semantic segmentation result 23 is obtained by performing semantic segmentation on the original image 20 through the reference semantic model. The features extracted by the first three convolutional layers are low-level features, shown at 21, which contain a large amount of texture information; the feature extracted by the 4th convolutional layer is a high-level feature, shown at 22, which contains semantic information.
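As a hedged illustration of harvesting such intermediate features, the sketch below registers forward hooks on an assumed torchvision ResNet-18 backbone; the layer names and input size are assumptions for illustration, not details disclosed in the patent.

```python
import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # assumed stand-in backbone
features = {}

def save_to(name):
    def hook(module, inputs, output):
        features[name] = output                       # cache the intermediate feature
    return hook

# layers 1-3 play the role of low-level (texture) features,
# layer 4 the role of the high-level (semantic) feature
for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(backbone, name).register_forward_hook(save_to(name))

_ = backbone(torch.randn(1, 3, 224, 224))             # one forward pass on a sample
texture_features = [features[n] for n in ["layer1", "layer2", "layer3"]]
semantic_feature = features["layer4"]
```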
In this embodiment of the present disclosure, if the first intermediate feature includes a first texture feature and the second intermediate feature includes a second texture feature, the semantic segmentation apparatus may perform contour decomposition on the first intermediate feature to obtain a first contour feature; carrying out contour decomposition on the second intermediate features to obtain second contour features; the first transformation feature comprises a first contour feature and the second transformation feature comprises a second contour feature.
In this embodiment of the present disclosure, the semantic segmentation apparatus may perform contour decomposition processing on the first texture feature and the second texture feature, decompose the first texture feature into at least one first band-pass sub-band and a first low-pass sub-band, and obtain a first contour feature based on the at least one first band-pass sub-band and the first low-pass sub-band; and decomposing the second texture feature into at least one second band-pass sub-band and a second low-pass sub-band, and obtaining a second contour feature based on the at least one second band-pass sub-band and the second low-pass sub-band.
In the embodiment of the present disclosure, the semantic segmentation apparatus may perform fusion processing on at least one first band-pass sub-band and the first low-pass sub-band to obtain a first contour feature; the semantic segmentation device may perform fusion processing on the at least one second band-pass sub-band and the second low-pass sub-band to obtain a second contour feature.
In an embodiment of the present disclosure, the semantic segmentation apparatus may include a low-pass filter and a direction filter, and the contour decomposition may be performed on the first texture feature and the second texture feature by the low-pass filter and the direction filter.
In the embodiment of the present disclosure, the semantic segmentation apparatus may perform at least one level of decomposition on the first texture feature and the second texture feature by using a laplacian pyramid decomposition.
In this embodiment of the present disclosure, if the first intermediate feature includes a first semantic feature and the second intermediate feature includes a second semantic feature, the semantic segmentation apparatus may perform enhancement processing on the first intermediate feature to obtain a first enhanced feature; and performing enhancement processing on the second intermediate features to obtain second enhanced features; the first transformed feature comprises a first enhanced feature and the second transformed feature comprises a second enhanced feature.
In this embodiment of the present disclosure, the semantic segmentation apparatus may perform enhancement processing on the first semantic feature and the second semantic feature to obtain a first enhancement feature that can represent the correlation of pixels in the first semantic feature and a second enhancement feature that can represent the correlation of pixels in the second semantic feature.
In some embodiments of the present disclosure, the semantic segmentation apparatus may train an attention model in advance, and implement enhancement processing through the attention model; here, the attention model may be a common attention model, a multi-level attention model, an intrinsic attention model, or the like, and this may be set as needed, and the embodiment of the present disclosure is not limited.
In some embodiments of the present disclosure, the semantic segmentation apparatus may also determine a feature matrix of the first enhanced feature based on the feature matrix of the first semantic feature; and determining a feature matrix of the second enhanced feature based on the feature matrix of the second semantic feature.
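The following is a minimal, non-authoritative sketch of such an enhancement step as a self-attention block, assuming 1×1 convolutions for the conversions and a residual fusion; it is one plausible instantiation, not the patent's exact module.

```python
import torch
import torch.nn as nn

class SelfEnhancement(nn.Module):
    """Assumed enhancement block: pixel-wise correlation re-weights the feature."""
    def __init__(self, channels):
        super().__init__()
        reduced = max(1, channels // 8)
        self.query = nn.Conv2d(channels, reduced, 1)   # conversion 1
        self.key = nn.Conv2d(channels, reduced, 1)     # conversion 2
        self.value = nn.Conv2d(channels, channels, 1)  # conversion 3

    def forward(self, x):                              # x: (N, C, H, W)
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (N, HW, C')
        k = self.key(x).flatten(2)                     # (N, C', HW)
        attn = torch.softmax(q @ k, dim=-1)            # correlation matrix (N, HW, HW)
        v = self.value(x).flatten(2).transpose(1, 2)   # (N, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)  # self-enhanced feature
        return x + out                                 # fuse with the input feature
```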
In the embodiment of the present disclosure, after obtaining the first transformation feature and the second transformation feature, the semantic segmentation device may train the semantic segmentation model to be trained based on the first transformation feature and the second transformation feature, and obtain the semantic segmentation model after the training succeeds.
In some embodiments of the present disclosure, the semantic segmentation means may determine a feature loss between the first transformation feature and the second transformation feature; the feature loss is used to characterize a difference between the first transformation feature and the second transformation feature. The semantic segmentation device can train the semantic segmentation model to be trained according to the feature loss, and stop the training to obtain the semantic segmentation model when the feature loss is smaller than a feature loss threshold.
In the disclosed embodiment, where the first transformation feature and the second transformation feature comprise the first contour feature and the second contour feature, the feature loss comprises a first loss; the first loss is used to characterize a difference between the first contour feature and the second contour feature. Where the first transformation feature and the second transformation feature comprise the first enhancement feature and the second enhancement feature, the feature loss comprises a second loss; the second loss is used to characterize a difference between the first enhancement feature and the second enhancement feature.
In a disclosed embodiment, the differences between features may be characterized by a vector distance; here, the vector distance may be a cosine distance or a Euclidean distance, which is not limited in the embodiments of the present disclosure.
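For illustration only, the difference between two feature maps could be computed as follows; this is a toy sketch, and the patent does not prescribe this exact form.

```python
import torch
import torch.nn.functional as F

def feature_distance(a: torch.Tensor, b: torch.Tensor, metric: str = "cosine"):
    """Difference between two feature maps of equal shape, as a scalar."""
    a, b = a.flatten(1), b.flatten(1)                  # (N, C*H*W)
    if metric == "cosine":
        return 1.0 - F.cosine_similarity(a, b, dim=1).mean()
    return (a - b).pow(2).sum(dim=1).sqrt().mean()     # Euclidean distance
```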
In some embodiments of the present disclosure, the semantic segmentation apparatus may train the semantic segmentation model to be trained based on the feature loss together with at least one of a third loss, a fourth loss, a response loss, and a training loss. The response loss characterizes a difference between a first semantic segmentation result and a second semantic segmentation result, which are obtained by performing semantic segmentation on the image sample with the reference semantic model and the semantic segmentation model to be trained, respectively. The third loss characterizes a difference between the first semantic segmentation feature and the second semantic segmentation feature, which are extracted by the pooling layers of the reference semantic model and the semantic segmentation model to be trained, respectively; the pooling layer is a feature extraction layer after the high-level convolutional layer. The fourth loss characterizes a loss between a first relation feature and a second relation feature, where the first relation feature represents the relationship among a plurality of features in the first intermediate feature and the first semantic segmentation feature, and the second relation feature represents the relationship among a plurality of features in the second intermediate feature and the second semantic segmentation feature; the relationship among the plurality of features can be characterized by vector similarity. The training loss characterizes a difference between the second semantic segmentation result and the label of the image sample.
In some embodiments of the present disclosure, the semantic segmentation apparatus may set a response loss threshold, a third loss threshold, a fourth loss threshold, and a training loss threshold, respectively; in this way, the semantic segmentation apparatus may stop the training to obtain the semantic segmentation model when the feature loss and at least one of the third loss, the fourth loss, the response loss, and the training loss are less than the corresponding loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may perform weighted summation on at least one of the third loss, the fourth loss, the response loss, and the training loss, and the feature loss to obtain the semantic loss; and under the condition that the semantic loss is smaller than the semantic loss threshold value, stopping training to obtain a semantic segmentation model.
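A minimal sketch of this weighted summation is given below; the loss names mirror the text, while the weights and the semantic-loss threshold are illustrative hyperparameters assumed for the example, not values disclosed in the patent.

```python
def semantic_loss(feature_loss, other_losses, weights):
    """Weighted sum of the feature loss and any subset of the third, fourth,
    response, and training losses."""
    total = feature_loss
    for w, loss in zip(weights, other_losses):
        total = total + w * loss
    return total

# training stops once semantic_loss(...) < semantic_loss_threshold
```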
It is understood that the semantic segmentation means may extract a first intermediate feature of the reference semantic model, and a second intermediate feature of the semantic segmentation model; the first intermediate feature and the second intermediate feature are subjected to contour decomposition or enhancement processing to obtain a first transformation feature and a second transformation feature, wherein the first transformation feature and the second transformation feature comprise texture knowledge and/or semantic knowledge, so that the semantic segmentation device trains the semantic segmentation device to be trained on the basis of the first transformation feature and the second transformation feature to obtain a semantic segmentation model, the texture knowledge and the semantic knowledge in the reference semantic model can be learned, and the precision of the semantic segmentation of the image to be processed by adopting the semantic segmentation model is improved.
In some embodiments of the present disclosure, the performing, in S01, feature extraction on the image sample by using the reference semantic model and the semantic segmentation model to be trained to obtain the first intermediate feature and the second intermediate feature may include:
s201, feature extraction is carried out on the image sample by adopting a reference semantic model and a semantic segmentation model to be trained respectively to obtain a first texture feature and a second texture feature.
S202, extracting the first texture feature and the second texture feature to obtain a first semantic feature and a second semantic feature.
In the embodiment of the disclosure, after the semantic segmentation device extracts the first texture feature by referring to the semantic model, feature extraction can be continuously performed on the first texture feature to obtain the first semantic feature; after the second texture feature is extracted through the semantic segmentation model to be trained, feature extraction can be continuously carried out on the second texture feature to obtain a second semantic feature.
In the embodiment of the disclosure, the semantic segmentation model to be trained and the reference semantic model include multiple convolutional layers, and the multiple convolutional layers can obtain multiple intermediate features, wherein the first convolutional layer performs feature extraction on an image sample to obtain a first layer of intermediate features, the second convolutional layer performs feature extraction on the first layer of intermediate features to obtain a second layer of intermediate features, and so on, the multiple intermediate features are obtained.
In embodiments of the present disclosure, the multiple convolutional layers may include at least one low-level convolutional layer and one high-level convolutional layer; the at least one intermediate feature obtained by the at least one low-level convolutional layer is a low-level feature, i.e., the first texture feature, and the intermediate feature obtained by the high-level convolutional layer is a high-level feature, i.e., the first semantic feature. That is, the model needs to obtain the first texture feature before obtaining the first semantic feature.
It can be understood that the semantic segmentation device performs feature extraction on the image sample by referring to the semantic model and the multilayer convolution layer in the semantic segmentation model to be trained, and can sequentially obtain the first texture feature and the first semantic feature.
In some embodiments of the present disclosure, the performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature in S02 to obtain the implementation of the first transformed feature and the second transformed feature may include at least one of: respectively carrying out contour decomposition processing on the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature; and performing enhancement processing on the first semantic features and the second semantic features to obtain first enhancement features and second enhancement features.
In an embodiment of the present disclosure, in the case where the first intermediate feature includes the first texture feature, the second intermediate feature includes the second texture feature; the semantic segmentation device performs contour decomposition processing on the first texture feature and the second texture feature respectively to obtain a first contour feature and a second contour feature, where the first contour feature serves as the first transformation feature and the second contour feature serves as the second transformation feature. In the case where the first intermediate feature includes the first semantic feature, the second intermediate feature includes the second semantic feature; the semantic segmentation device performs enhancement processing on the first semantic feature and the second semantic feature respectively to obtain a first enhancement feature and a second enhancement feature. In the case where the first intermediate feature includes both the first texture feature and the first semantic feature, the second intermediate feature includes both the second texture feature and the second semantic feature; the semantic segmentation device performs contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature, and performs enhancement processing on the first semantic feature and the second semantic feature respectively to obtain the first enhancement feature and the second enhancement feature.
In this embodiment of the present disclosure, the performing, by the semantic segmentation apparatus, contour decomposition processing on the first texture feature and the second texture feature, respectively, to obtain the first contour feature and the second contour feature may include: S301-S302.
S301, carrying out contour decomposition processing on the first texture feature to obtain a first contour feature.
In this embodiment of the disclosure, the semantic segmentation apparatus may perform contourlet decomposition on the first texture feature through a Contourlet Decomposition Module (CDM) to obtain the first contour feature.
In an embodiment of the present disclosure, the contourlet decomposition module includes at least one combination of a Low-pass Filter (LP) and a Directional Filter Bank (DFB); the low-pass filter is used for filtering the input feature and decomposing it into a high-pass sub-band and a low-pass sub-band; the directional filter bank is used for performing directional filtering on the high-pass sub-band to obtain a directional sub-band. Thus, each combination of a low-pass filter and a directional filter bank can implement one level of Laplacian pyramid decomposition.
In some embodiments of the present disclosure, the semantic segmentation apparatus filters each of the first texture feature and the second texture feature based on an interlaced scanning factor to obtain a respective high-pass sub-band and low-pass sub-band; performs directional filtering on the high-pass sub-band to obtain a directional sub-band; and fuses the low-pass sub-band and the directional sub-band corresponding to the first texture feature and to the second texture feature respectively to obtain the first contour feature and the second contour feature, thereby completing the contour decomposition processing.
It should be noted that, in the embodiment of the present disclosure, each group of LP and DFB can perform one level of contour decomposition, and the embodiment of the present disclosure does not limit the number of levels in the contour decomposition.
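To summarize the flow, the following non-authoritative sketch strings m levels of LP and DFB together and fuses the results; lp_filter, dfb_filter, and fuse are assumed helper functions standing in for the concrete filters, which the text describes only at the block level.

```python
def contour_decompose(texture_feature, m, lp_filter, dfb_filter, fuse):
    """m levels of LP + DFB, then fusion into a contour feature."""
    low = texture_feature
    directional_subbands = []
    for _ in range(m):
        low, high = lp_filter(low)                     # low-/high-pass sub-bands
        directional_subbands.append(dfb_filter(high))  # directional sub-band(s)
    return fuse(low, directional_subbands)             # contour feature
```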
For example, in this embodiment of the present disclosure, performing contour decomposition processing on the first texture feature in S301 to obtain implementation of the first contour feature may include: S3011-S3012.
S3011, performing at least one level of contour decomposition on the first texture feature through at least one level of LP and DFB combination to obtain at least one directional sub-band and one low-pass sub-band.
In the embodiment of the present disclosure, after the semantic segmentation apparatus performs at least one level of decomposition on the first texture feature by using the CDM, at least one directional sub-band and one low-pass sub-band may be obtained; the at least one directional sub-band is the directional sub-band obtained by the corresponding at least one level of decomposition, and the one low-pass sub-band is the low-pass sub-band obtained by the last level of decomposition. The CDM decomposes the first texture feature as in Equation (1):

$$F_{l,n+1},\; F_{h,n+1} = \mathrm{LP}\big((F_{l,n})\downarrow_p\big), \qquad F_{bds,n+1} = \mathrm{DFB}(F_{h,n+1}) \tag{1}$$

where $\downarrow$ is the sampling operator and $p$ is the interlaced scanning factor; $F_{l,n}$ represents the nth-level low-pass sub-band (feature), i.e., the low-pass sub-band obtained by the nth-level decomposition; $n \in [1, m]$, and $m$ is the number of LP-DFB combinations in the contourlet decomposition module. As can be seen from Equation (1), the down-sampled $F_{l,n}$ is decomposed by the LP to obtain the (n+1)th-level low-pass sub-band feature $F_{l,n+1}$ and the (n+1)th-level high-pass sub-band feature $F_{h,n+1}$; the DFB then performs directional filtering on the (n+1)th-level high-pass sub-band feature $F_{h,n+1}$ to obtain the (n+1)th-level directional sub-band feature $F_{bds,n+1}$.
In the 1st decomposition, the first texture feature is decomposed through the LP to obtain the 1st-level low-pass sub-band feature $F_{l,1}$ and the 1st-level high-pass sub-band feature $F_{h,1}$.
In the embodiment of the present disclosure, the DFB includes a $k$-level binary tree; the input feature is decomposed by the $k$-level binary tree, and the obtained directional sub-band features include $2^k$ directional sub-bands. For example, if $k = 3$, there are 8 directional sub-band features, numbered 0, 1, ..., 7, where 0-3 are vertical features and 4-7 are horizontal features.
In the disclosed embodiment, the number of intermediate features in the first texture feature is the same as the number of CDMs used for contourlet decomposition; that is, each intermediate feature in the first texture feature uses one CDM.
Illustratively, referring to FIG. 3, the CDM includes 2 groups of combined low-pass filters LP and directional filter banks DFB: group 1 includes $\mathrm{LP}_1$ and $\mathrm{DFB}_1$, and group 2 includes $\mathrm{LP}_2$ and $\mathrm{DFB}_2$. After the texture feature $F$ is input into the CDM, one low-pass sub-band $F_{l,2}$ and two directional sub-bands $F_{bds,1}$ and $F_{bds,2}$ can be obtained. Specifically, after $\mathrm{LP}_1$ low-pass filters the texture feature $F$, the 1st-level high-pass sub-band $F_{h,1}$ and the 1st-level low-pass sub-band $F_{l,1}$ can be obtained; $\mathrm{DFB}_1$ performs directional filtering on the 1st-level high-pass sub-band $F_{h,1}$ to obtain the 1st-level directional sub-band $F_{bds,1}$; the 1st-level low-pass sub-band $F_{l,1}$ is down-sampled according to the factor $(2, 2)$ to obtain the down-sampled 1st-level low-pass sub-band $F_{l,1}{\downarrow}$, whose length and width are both 1/2 of those of $F_{l,1}$. $\mathrm{LP}_2$ low-pass filters the down-sampled 1st-level low-pass sub-band $F_{l,1}{\downarrow}$ to obtain the 2nd-level high-pass sub-band $F_{h,2}$ and the 2nd-level low-pass sub-band $F_{l,2}$; $\mathrm{DFB}_2$ performs directional filtering on the 2nd-level high-pass sub-band $F_{h,2}$ to obtain the 2nd-level directional sub-band $F_{bds,2}$. $\mathrm{DFB}_1$ comprises a 4-level binary tree, so $F_{bds,1}$ comprises 16 directional sub-bands; $\mathrm{DFB}_2$ comprises a 3-level binary tree, so $F_{bds,2}$ comprises 8 directional sub-bands.
It should be noted that the decomposition of the $n$-th level low-pass sub-band by the LP in the semantic segmentation apparatus may be implemented as follows: performing low-pass analysis filtering on the $n$-th level low-pass sub-band through a low-pass analysis filter to obtain an $(n+1)$-th level low-pass result; down-sampling the $(n+1)$-th level low-pass result to obtain the $(n+1)$-th level low-pass sub-band; then up-sampling the $(n+1)$-th level low-pass sub-band and passing the up-sampled result through a synthesis filter to obtain a synthesized $(n+1)$-th level low-pass result; and obtaining the $(n+1)$-th level high-pass sub-band based on the $n$-th level low-pass sub-band and the synthesized $(n+1)$-th level low-pass result.

In some embodiments of the disclosure, the semantic segmentation apparatus may subtract the synthesized $(n+1)$-th level low-pass result from the $n$-th level low-pass sub-band to obtain the $(n+1)$-th level high-pass sub-band.

In this embodiment of the present disclosure, the semantic segmentation apparatus may calculate the element-wise difference between the $n$-th level low-pass sub-band and the synthesized $(n+1)$-th level low-pass result, so as to obtain the $(n+1)$-th level high-pass sub-band.
Illustratively, referring to fig. 4, the low-pass filter includes a low-pass analysis filter 41, a down-sampling module 42, an up-sampling module 43, a synthesis filter 44, and a subtraction module 45. The $n$-th level low-pass sub-band $F_{l,n}$ is input into the low-pass analysis filter 41 to obtain the $(n+1)$-th level low-pass result; the down-sampling module 42 down-samples the $(n+1)$-th level low-pass result, and the down-sampled result serves as the $(n+1)$-th level low-pass sub-band $F_{l,n+1}$; the up-sampling module 43 up-samples the $(n+1)$-th level low-pass sub-band, and the up-sampled result is input into the synthesis filter 44 to obtain the synthesized $(n+1)$-th level low-pass result; finally, the subtraction module 45 subtracts the synthesized $(n+1)$-th level low-pass result from the $n$-th level low-pass sub-band $F_{l,n}$ to obtain the $(n+1)$-th level high-pass sub-band $F_{h,n+1}$.
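To make the data flow of FIG. 3 and FIG. 4 concrete, the following is a minimal PyTorch-style sketch of the two-group CDM pipeline described above. It is illustrative only: the fixed binomial blur kernel, the `blur` and `lp_step` helper names, and the toy `toy_dfb` (which simply splits channels into $2^k$ groups as a placeholder for the binary-tree directional filter bank) are assumptions of this sketch, not the patent's concrete filters.

```python
import torch
import torch.nn.functional as F

def blur(x: torch.Tensor) -> torch.Tensor:
    # Depthwise 3x3 binomial blur standing in for the analysis/synthesis
    # filters of FIG. 4 (the concrete filters are not fixed by the text).
    k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    k = k.view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

def lp_step(x: torch.Tensor):
    """One LP level: analysis filter -> down-sample -> up-sample ->
    synthesis filter -> subtract, yielding (low-pass, high-pass) sub-bands."""
    low = F.avg_pool2d(blur(x), kernel_size=2)                   # modules 41-42
    up = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                       align_corners=False)                       # module 43
    high = x - blur(up)                                           # modules 44-45
    return low, high

def toy_dfb(high: torch.Tensor, k: int):
    """Toy DFB: split channels into 2**k 'directional' sub-bands."""
    return list(high.chunk(2 ** k, dim=1))

# Two-group CDM as in FIG. 3: k = 4 gives 16 sub-bands, k = 3 gives 8.
feat = torch.randn(1, 64, 128, 128)
f_l1, f_h1 = lp_step(feat)          # 1st-level low-/high-pass sub-bands
f_bds1 = toy_dfb(f_h1, k=4)
f_l2, f_h2 = lp_step(f_l1)          # down-sampling happens inside lp_step
f_bds2 = toy_dfb(f_h2, k=3)
```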
S3012, performing feature fusion on at least one direction sub-band and one low-pass sub-band to obtain a first contour feature.
In the embodiment of the disclosure, after the semantic segmentation apparatus decomposes the first texture feature at least once, at least one level of directional sub-band and a last level of low-pass sub-band can be obtained. The feature dimensions of the at least one level of directional sub-band and of the last level of low-pass sub-band are different, so the semantic segmentation apparatus needs to transform, through a pooling layer, the dimensions of the at least one level of directional sub-band and of the last level of low-pass sub-band to be consistent, obtaining at least one transformed directional sub-band and a last level of transformed low-pass sub-band; then the at least one transformed directional sub-band and the last level of transformed low-pass sub-band are subjected to first fusion processing to obtain the first contour feature $F^{te}$.
Here, the first fusion processing may include: adding the at least one transformed directional sub-band and the last transformed low-pass sub-band, or splicing the at least one transformed directional sub-band and the last transformed low-pass sub-band; the embodiment of the present disclosure is not limited in this respect.
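Continuing the sketch above, the fusion step might look as follows; `fuse_contour_feature` and the pooled output size `out_hw` are assumed names, and splicing (channel concatenation) is chosen here because it works for sub-bands with differing channel counts, while element-wise addition would require matching shapes:

```python
def fuse_contour_feature(dir_subbands, low_subband, out_hw=(32, 32)):
    # Pool every sub-band to a common spatial size (the text uses a pooling
    # layer for this), then splice along channels into the contour feature.
    pooled = [F.adaptive_avg_pool2d(s, out_hw)
              for s in dir_subbands + [low_subband]]
    return torch.cat(pooled, dim=1)

first_contour_feature = fuse_contour_feature(f_bds1 + f_bds2, f_l2)
```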
It should be noted that the greater the number of CDM decomposition levels, the richer the extracted first contour features, and the higher the accuracy achieved by the trained student network, but also the higher the computation cost. The number of CDM decomposition levels may therefore be set as needed.
S302, carrying out contour decomposition processing on the second texture feature to obtain a second contour feature.
In the embodiment of the present disclosure, the manner in which the semantic segmentation apparatus performs the contour decomposition processing on the second texture feature is the same as the manner of performing the contourlet decomposition processing on the first texture feature in S301; details are described in S301 and are not repeated here.
In this embodiment of the present disclosure, the manner in which the semantic segmentation apparatus performs enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhancement feature and the second enhancement feature may include: S401-S402.
S401, enhancing the first semantic features to obtain first enhanced features.
In this embodiment of the disclosure, after obtaining the first Semantic feature, the Semantic segmentation apparatus may perform enhancement processing on the first Semantic feature through a Semantic Attention Module (SAM) to obtain a first enhanced feature.
In this embodiment of the present disclosure, the implementation of performing enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhancement feature and the second enhancement feature may include: performing at least two transformations on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic transformation features; performing correlation processing on different ones of the at least two semantic transformation features to obtain a correlation matrix; performing enhancement processing on the correlation matrix and one of the at least two semantic transformation features to obtain a self-enhancement feature; and determining, based on the self-enhancement matrices respectively corresponding to the first semantic feature and the second semantic feature, the first enhancement feature and the second enhancement feature; or fusing the first semantic feature and the second semantic feature with their respective self-enhancement features to obtain the first enhancement feature and the second enhancement feature.
Illustratively, the case where there are three semantic transformation features is taken as an example for explanation. In this embodiment of the present disclosure, the implementation of enhancing the first semantic feature in S401 to obtain the first enhancement feature may include: S4011-S4014.
S4011, performing three kinds of conversion on the first semantic features to obtain a first semantic conversion feature, a second semantic conversion feature and a third semantic conversion feature.
In the embodiment of the present disclosure, the semantic segmentation apparatus may perform a first transformation, a second transformation, and a third transformation on the first semantic feature, respectively, to obtain the first semantic transformation feature, the second semantic transformation feature, and the third semantic transformation feature; the number of vectors included in the first semantic transformation feature is equal to the number of channels C, and the number of vectors included in the second semantic transformation feature is equal to the number of pixels (H × W).
In some embodiments of the present disclosure, the first semantic transformation feature and the second semantic transformation feature are transpose matrices of each other.
In some embodiments of the disclosure, the first semantic transformation feature and the third semantic transformation feature are the same matrix feature.
S4012, performing matrix multiplication on the first semantic transformation feature and the second semantic transformation feature to obtain a correlation feature; the elements of the correlation feature matrix are used to characterize the correlation coefficients of the pixels.

In the embodiment of the disclosure, the first semantic transformation feature and the second semantic transformation feature are multiplied by matrix multiplication, and the obtained matrix is the correlation feature; the elements in the correlation feature matrix characterize the correlation between pixels: the greater the correlation, the greater the element value; the smaller the correlation, the smaller the element value.
S4013, multiplying the correlation feature and the third semantic transformation feature to obtain a self-enhancement feature.

In the embodiment of the disclosure, the matrix obtained after multiplying the correlation feature and the third semantic transformation feature is the self-enhancement feature; that is, the third semantic transformation feature is enhanced through the correlation feature, so that the self-enhancement matrix incorporates the correlations between pixels.
S4014, based on the self-enhancement features, determining first enhancement features.
In the embodiment of the disclosure, after obtaining the self-enhanced feature, the semantic segmentation device may determine the first enhanced feature according to the self-enhanced feature.
In some embodiments of the present disclosure, the semantic segmentation apparatus may directly use the self-enhancement feature as the first enhancement feature.
Exemplarily, based on fig. 5a, the first semantic feature matrix is an H × W × C matrix MF; after performing the three transformations on the first semantic feature matrix, a first semantic transformation matrix MF1 of size C × (H × W), and a second semantic transformation matrix MF2 and a third semantic transformation matrix MF3 of size (H × W) × C, can be obtained. Thus, the first semantic transformation matrix MF1 and the second semantic transformation matrix MF2 are multiplied to obtain a C × C correlation matrix MFC, and after the third semantic transformation matrix MF3 is multiplied by the correlation matrix MFC, an H × W × C self-enhancement matrix MFp1 is obtained. The self-enhancement matrix MFp1 thus contains the correlations between elements, and the semantic segmentation apparatus can use MFp1 as the first enhancement feature matrix.
In some embodiments of the present disclosure, the semantic segmentation apparatus may perform a second fusion process on the self-enhanced feature and the first semantic feature to obtain the first enhanced feature.
In an embodiment of the present disclosure, the second fusion processing may include: performing matrix addition on the self-enhancement feature and the first semantic feature, or performing weighted addition on the self-enhancement feature and the first semantic feature; the weighting value can be set as needed, and the embodiment of the disclosure is not limited thereto.
For example, based on fig. 5a and with reference to fig. 5b, after obtaining the H × W × C self-enhancement matrix MFp1, the semantic segmentation apparatus weights the self-enhancement matrix by the weight γ, and adds the weighted self-enhancement matrix and the first semantic feature matrix MF element-wise to obtain the first enhancement feature matrix MFp2.
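As an illustration of S4011-S4014, the following is a minimal PyTorch sketch of this attention-style self-enhancement, using the shapes of fig. 5a and the weighted residual fusion of fig. 5b. The function name `semantic_attention`, the use of plain reshapes/transposes as the three transformations, and taking the third transformation equal to the second (to match the (H × W) × C shape of MF3) are assumptions of the sketch; the patent leaves the exact transformations open.

```python
import torch

def semantic_attention(mf: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Self-enhancement of a semantic feature map mf of shape (C, H, W)."""
    c, h, w = mf.shape
    mf1 = mf.reshape(c, h * w)           # first transformation:  C x (H*W)
    mf2 = mf1.transpose(0, 1)            # second transformation: (H*W) x C
    mf3 = mf2                            # third transformation, per fig. 5a
    mfc = mf1 @ mf2                      # correlation matrix MFC: C x C
    mfp1 = (mf3 @ mfc).transpose(0, 1).reshape(c, h, w)  # self-enhancement
    return gamma * mfp1 + mf             # weighted residual fusion (fig. 5b)
```

With γ = 0 the input feature passes through unchanged, so γ controls how strongly the pixel correlations are injected into the enhancement feature.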
S402, enhancing the second semantic features to obtain second enhanced features.
In the embodiment of the present disclosure, a manner and a principle of enhancement processing performed on the second semantic feature by the semantic segmentation apparatus are the same as those of enhancement processing performed on the first semantic feature in S401, and details are described in S401 and are not described herein again.
In some embodiments of the present disclosure, the implementation of training the semantic segmentation model to be trained based on at least the first intermediate feature and the second intermediate feature in S03 and determining the semantic segmentation model may include: S501-S503.
S501, performing loss calculation based on a preset first loss function, the first contour feature and the second contour feature to determine a first loss;
in the embodiment of the present disclosure, the preset first loss function may be a mean squared error function; the semantic segmentation apparatus may calculate a first mean squared error between the first contour feature and the second contour feature, and take the first mean squared error as the first loss, so that the difference between the first contour feature and the second contour feature is characterized by the first loss. See formula (2):

$$L_{te}(S) = \frac{1}{R} \sum_{i \in R} \left( F_i^{te,T} - F_i^{te,S} \right)^2 \tag{2}$$

where $L_{te}(S)$ represents the first loss; $F_i^{te,T}$ represents the contour feature corresponding to the $i$-th pixel in the first contour feature, and $F_i^{te,S}$ represents the contour feature corresponding to the $i$-th pixel in the second contour feature, with $i \in R$ and $R = H \times W$.
In the embodiment of the present disclosure, the semantic segmentation apparatus may calculate the squared difference between the contour feature corresponding to the $i$-th pixel in the first contour feature and the contour feature corresponding to the $i$-th pixel in the second contour feature, obtaining R first squared differences; after summing the R first squared differences to obtain a first sum, the first sum is divided by the total number of pixels to obtain the first mean squared error.
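A one-line sketch of formula (2), assuming the teacher and student contour features are tensors of identical shape (the built-in mean reduction over all elements matches the per-pixel mean described above); `first_loss` is an assumed name:

```python
import torch.nn.functional as F

def first_loss(f_te_teacher, f_te_student):
    # Formula (2): mean of per-pixel squared differences between the
    # first and second contour features.
    return F.mse_loss(f_te_student, f_te_teacher)
```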
S502, determining a second loss based on a preset second loss function, the first enhancement feature and the second enhancement feature;
in the embodiment of the present disclosure, the preset second loss function may be a mean squared error function; the semantic segmentation apparatus may calculate a second mean squared error between the first enhancement feature and the second enhancement feature, and take the second mean squared error as the second loss, so that the difference between the first enhancement feature and the second enhancement feature is characterized by the second loss. See formula (3):

$$L_{se}(S) = \frac{1}{R} \sum_{i \in R} \left( F_i^{se,T} - F_i^{se,S} \right)^2 \tag{3}$$

where $L_{se}(S)$ represents the second loss; $F_i^{se,T}$ represents the feature corresponding to the $i$-th pixel in the first enhancement feature, and $F_i^{se,S}$ represents the feature corresponding to the $i$-th pixel in the second enhancement feature, with $i \in R$ and $R = H \times W$.
In the embodiment of the present disclosure, the semantic segmentation apparatus may calculate the squared difference between the feature corresponding to the $i$-th pixel in the first enhancement feature and the feature corresponding to the $i$-th pixel in the second enhancement feature, obtaining R second squared differences; after summing the R second squared differences to obtain a second sum, the second sum is divided by the total number of pixels to obtain the second mean squared error.
S503, training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and determining the semantic segmentation model.
In this disclosure, after determining the first loss and the second loss, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to at least one of the first loss and the second loss, and determine the semantic segmentation model.
In some embodiments of the disclosure, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the first loss, and determine the semantic segmentation model.
In this embodiment of the disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the first loss is smaller than the first loss threshold.
In some embodiments of the disclosure, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the second loss, and determine the semantic segmentation model.
In this embodiment of the disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the second loss is smaller than the second loss threshold.
In some embodiments of the disclosure, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the first loss and the second loss to determine the semantic segmentation model.
In the embodiment of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the first loss is less than the first loss threshold and the second loss is less than the second loss threshold; alternatively, the first loss and the second loss may be weighted and summed to obtain a first semantic loss, and the semantic segmentation model is determined when the first semantic loss is smaller than a first semantic loss threshold.
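As a sketch of S503 under the weighted-sum variant, the loop below stops once the combined loss falls below a threshold. The weights `w1` and `w2`, the threshold value, the optimizer choice, and the assumption that the teacher and student return (contour feature, enhancement feature) pairs are all illustrative, not values fixed by the disclosure:

```python
import torch
import torch.nn.functional as F

def train_student(student, teacher, loader, w1=1.0, w2=1.0, threshold=1e-3):
    opt = torch.optim.SGD(student.parameters(), lr=0.01)
    for images in loader:
        with torch.no_grad():
            f_te_t, f_se_t = teacher(images)   # first contour / enhancement
        f_te_s, f_se_s = student(images)       # second contour / enhancement
        l1 = F.mse_loss(f_te_s, f_te_t)        # first loss, formula (2)
        l2 = F.mse_loss(f_se_s, f_se_t)        # second loss, formula (3)
        loss = w1 * l1 + w2 * l2               # first semantic loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < threshold:            # stop below the threshold
            break
    return student
```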
Referring to fig. 6, a shows two images to be processed; b shows the feature maps obtained after feature extraction is performed on the two images to be processed by a semantic segmentation model that has not learned texture knowledge; and c shows the feature maps obtained after feature extraction is performed on the two images to be processed by a semantic segmentation model that has learned texture knowledge. As can be seen from fig. 6, after the semantic segmentation model learns the texture knowledge, the feature maps contain rich texture knowledge and the contours are clearer.
In some embodiments of the present disclosure, the implementation of training the semantic segmentation model to be trained based on at least the first intermediate feature and the second intermediate feature in S03 and determining the semantic segmentation model may further include: S601-S603.
S601, respectively performing semantic segmentation prediction on the first enhancement features and the second enhancement features to obtain first semantic segmentation features and second semantic segmentation features;
in the embodiment of the disclosure, the reference semantic model and the semantic segmentation model to be trained comprise a pooling layer, and the pooling layer is arranged behind the last convolution layer; after the first enhancement feature is obtained by referring to the semantic model, semantic segmentation prediction can be carried out on the first enhancement feature through the pooling layer to obtain a first semantic segmentation feature; after the semantic segmentation model to be trained obtains the second enhancement feature, semantic segmentation prediction can be performed on the second enhancement feature to obtain a second semantic segmentation feature.
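As an illustration of such a prediction head, the following is a minimal sketch; `SegHead`, the fixed pooled size, and the 1×1 classifier are assumptions of the sketch (the actual pooling layer of the disclosure is a pyramid pooling module, described with fig. 7 below):

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Sketch: pooling layer after the last convolution, per S601."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(output_size=(32, 32))  # stand-in pooling
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, enhancement_feat: torch.Tensor) -> torch.Tensor:
        # Semantic segmentation feature predicted from the enhancement feature.
        return self.classifier(self.pool(enhancement_feat))
```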
S602, performing loss calculation based on a preset third loss function, the first semantic segmentation feature and the second semantic segmentation feature to determine a third loss;
in the embodiment of the present disclosure, the preset third loss function may be a mean squared error function; the semantic segmentation apparatus may calculate a third mean squared error between the first semantic segmentation feature and the second semantic segmentation feature, and take the third mean squared error as the third loss, so that the difference between the first semantic segmentation feature and the second semantic segmentation feature is characterized by the third loss.
For example, see formula (4):

$$L_{see}(S) = \frac{1}{R} \sum_{i \in R} \left( F_i^{see,T} - F_i^{see,S} \right)^2 \tag{4}$$

where $L_{see}(S)$ represents the third loss; $F_i^{see,T}$ represents the semantic segmentation feature corresponding to the $i$-th pixel in the first semantic segmentation feature, and $F_i^{see,S}$ represents the semantic segmentation feature corresponding to the $i$-th pixel in the second semantic segmentation feature, with $i \in R$ and $R = H \times W$.
In the embodiment of the present disclosure, the semantic segmentation apparatus may calculate the squared difference between the semantic segmentation feature corresponding to the $i$-th pixel in the first semantic segmentation feature and the semantic segmentation feature corresponding to the $i$-th pixel in the second semantic segmentation feature, obtaining R third squared differences; after summing the R third squared differences to obtain a third sum, the third sum is divided by the total number of pixels to obtain the third mean squared error.
S603, training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss and the third loss to determine the semantic segmentation model.
In some embodiments of the disclosure, after determining the third loss, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the third loss to determine the semantic segmentation model.
In this embodiment of the disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the third loss is smaller than the third loss threshold.
In this disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the first loss is less than the first loss threshold and the third loss is less than the third loss threshold; and weighting and summing the first loss and the third loss to obtain a second semantic loss, and determining the semantic segmentation model under the condition that the second semantic loss is smaller than a second semantic loss threshold value.
In this disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the second loss is less than the second loss threshold and the third loss is less than the third loss threshold; alternatively, the second loss and the third loss may be weighted and summed to obtain a third semantic loss, and the semantic segmentation model is determined when the third semantic loss is smaller than a third semantic loss threshold.
In some embodiments of the present disclosure, the first texture feature comprises: at least one first sub-texture feature; the second texture feature comprises: at least one second sub-texture feature; in S601, after performing semantic segmentation prediction on the first enhancement feature and the second enhancement feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature, the implementation may further include: S701-S704.
S701, determining a first graph inference relation based on the first semantic segmentation feature, the first semantic feature and the at least one first sub-texture feature.
In the embodiment of the present disclosure, after the semantic segmentation device obtains the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature, the first graph inference may be performed based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature, so as to obtain the first graph inference relationship.
In some embodiments of the present disclosure, determining the first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature comprises: determining at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on the output order; performing correlation processing on at least two difference characteristics to obtain the correlation degree between the difference characteristics; and forming a first graph inference relationship based on the at least two difference features and the correlation degree between the difference features.
In the embodiment of the present disclosure, the at least one first sub-texture feature corresponds to at least one intermediate feature obtained by at least one lower convolutional layer in the reference semantic model; the semantic segmentation apparatus may determine the feature change between every two adjacent layers according to the back-to-front order of the convolutional layers and the pooling layer (i.e., the output order of the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature, from back to front), so as to obtain a plurality of first relational features (i.e., the at least two difference features).
In the embodiment of the present disclosure, after determining the plurality of first relational features, the semantic segmentation apparatus may take the plurality of first relational features as a plurality of first nodes, and connect the plurality of first nodes according to the correlation (i.e., the degree of correlation) between the plurality of first relational features to construct a first relational graph $G^T$, see formula (5); the first graph inference relationship is characterized by the first relational graph.

$$G^T = (\nu^T, \varepsilon^T) = \left( F_i^{va,T},\ A_{ij}^T \right) \tag{5}$$

where $G^T$ represents the first relational graph; $\nu^T$ represents the nodes in the first relational graph, and $\varepsilon^T$ represents the connecting edges in the first relational graph; $F_i^{va,T}$ represents the $i$-th of the N first relational features, and $A_{ij}^T$ represents the connecting edge between $F_i^{va,T}$ and $F_j^{va,T}$; N represents the number of first relational features; $i, j \in [1, N-1]$ and $i \neq j$.
In the disclosed embodiments, $F_i^{va,T}$ characterizes the feature variation between two adjacent layers and can be obtained from the $(i+1)$-th layer feature $F_{i+1}^T$ and the $i$-th layer feature $F_i^T$ in the reference semantic model; refer to formula (6).
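A minimal sketch of one plausible form of this inter-layer variation feature follows. Taking the variation as a difference between spatially and channel-aligned adjacent-layer features is an assumption of this sketch, not a form fixed by formula (6); `variation_feature` and the alignment choices are likewise assumed:

```python
import torch
import torch.nn.functional as F

def variation_feature(f_i: torch.Tensor, f_next: torch.Tensor) -> torch.Tensor:
    """Assumed reading of formula (6): variation between adjacent layers.

    f_i:    feature of layer i,   shape (1, C1, H1, W1)
    f_next: feature of layer i+1, shape (1, C2, H2, W2)
    """
    # Align spatial sizes (assumption: bilinear resize to the deeper layer).
    f_i = F.interpolate(f_i, size=f_next.shape[-2:], mode="bilinear",
                        align_corners=False)
    # Align channels (assumption: average over channels to one map each).
    return f_next.mean(dim=1) - f_i.mean(dim=1)   # per-pixel variation map
```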
In some embodiments of the disclosure, the first graph inference relationship is constructed based on the at least two difference features and the correlation degrees between the difference features, in which case every pair of feature nodes can be connected by an edge. Exemplarily, $A_{ij}$ can be obtained by formula (7-1):

$$A_{ij} = f_{si}\big(F_i^{va}, F_j^{va}\big) \tag{7-1}$$

where $f_{si}$ represents the similarity between the vectors.
In some embodiments of the present disclosure, when some of the correlation degrees between the difference features are smaller than or equal to a preset correlation threshold, the first graph inference relationship is constructed based on only the correlations that exceed the threshold and the at least two difference features. That is, in this case only part of the feature edges are connected. For example, $A_{ij}$ can be obtained by formula (7-2):

$$A_{ij} = \mathbb{1}\big[f_{si}(F_i^{va}, F_j^{va}) > \mu\big] \cdot f_{si}\big(F_i^{va}, F_j^{va}\big) \tag{7-2}$$

where $f_{si}$ represents the degree of similarity between the vectors, $\mathbb{1}[\cdot]$ is the indicator function, and $\mu$ is the similarity threshold.
As can be seen from formulas (7-1) and (7-2), the semantic segmentation apparatus performs edge connection between nodes with high similarity; when $\mu = 0$, any two nodes may be connected to each other.
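The adjacency construction of formulas (7-1)/(7-2) might be sketched as follows; cosine similarity as $f_{si}$ is an assumption (the patent only requires a vector similarity), as is the function name `adjacency`:

```python
import torch
import torch.nn.functional as F

def adjacency(nodes: torch.Tensor, mu: float = 0.5) -> torch.Tensor:
    """Edges per formulas (7-1)/(7-2); nodes: (N, D) relational features.

    With mu = 0 every pair of distinct nodes is connected.
    """
    sim = F.cosine_similarity(nodes.unsqueeze(1), nodes.unsqueeze(0), dim=-1)
    a = sim * (sim > mu)        # indicator-masked similarity, formula (7-2)
    a.fill_diagonal_(0.0)       # no self-edges (i != j)
    return a
```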
S702, determining a second graph inference relation based on the second semantic segmentation feature, the second semantic feature and the at least one second sub-texture feature.
In the embodiment of the present disclosure, after the semantic segmentation device obtains the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, the second graph inference may be performed based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, so as to obtain the second graph inference relationship.
In the embodiment of the present disclosure, the at least one second sub-texture feature corresponds to at least one intermediate feature obtained by at least one lower convolutional layer in the semantic segmentation model to be trained; the semantic segmentation apparatus can determine the feature change between every two adjacent layers according to the back-to-front order of the convolutional layers and the pooling layer for the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, to obtain a plurality of second relational features.
In the embodiment of the present disclosure, after determining the plurality of second relational features, the semantic segmentation apparatus may take the plurality of second relational features as a plurality of second nodes and perform edge connection in the same manner as in the first relational graph to construct a second relational graph $G^S$, see formula (5); the second graph inference relationship is characterized by the second relational graph.
S703, performing loss calculation based on a preset fourth loss function, the first graph reasoning relation and the second graph reasoning relation, and determining a fourth loss.
In the embodiment of the disclosure, the first graph inference relationship includes nodes and connecting edges, and the second graph inference relationship also includes nodes and connecting edges; the preset fourth loss function characterizes the vector distance between the first relational graph and the second relational graph as the fourth loss, so that the difference between the first relational graph and the second relational graph is characterized by the fourth loss. See formula (8):

$$L_{va}(S) = \mathrm{Dist}\big(G^T, G^S\big) \tag{8}$$

where $L_{va}(S)$ represents the fourth loss, $G^T$ represents the first relational graph, $G^S$ represents the second relational graph, and $\mathrm{Dist}$ denotes the vector distance.
In the disclosed embodiment, the first relational graph $G^T$ includes first nodes $\nu^T$ and first connecting edges $\varepsilon^T$, and the second relational graph $G^S$ includes second nodes $\nu^S$ and second connecting edges $\varepsilon^S$. Thus, the semantic segmentation apparatus can first determine the node vector distance between the first nodes $\nu^T$ and the second nodes $\nu^S$, and the edge vector distance between the first connecting edges $\varepsilon^T$ and the second connecting edges $\varepsilon^S$, and then perform a weighted summation of the node vector distance and the edge vector distance to obtain the fourth loss, see formula (9):

$$L_{va}(S) = \mathrm{Dist}\big(\nu^T, \nu^S\big) + \lambda\, \mathrm{Dist}\big(\varepsilon^T, \varepsilon^S\big) \tag{9}$$
where $\lambda$ is the weighting factor, which can be set as needed; the embodiment of the present disclosure is not limited in this respect.
In the disclosed embodiment, formula (9) may also be expressed as formula (10):

$$L_{va}(S) = \frac{1}{N} \sum_{i} \mathrm{Dist}\big(F_i^{va,T}, F_i^{va,S}\big) + \lambda\, \frac{1}{N^2} \sum_{i,j} \mathrm{Dist}\big(A_{ij}^T, A_{ij}^S\big) \tag{10}$$
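A minimal sketch of formulas (9)/(10), assuming $\mathrm{Dist}$ is the mean squared distance (the patent only specifies a vector distance) and reusing the `adjacency` helper sketched above; `graph_distillation_loss` is an assumed name:

```python
import torch.nn.functional as F

def graph_distillation_loss(nodes_t, nodes_s, lam=1.0, mu=0.5):
    # Node distance plus lambda-weighted edge distance, formula (9).
    a_t = adjacency(nodes_t, mu)         # teacher edges, formula (7-2)
    a_s = adjacency(nodes_s, mu)         # student edges
    node_dist = F.mse_loss(nodes_s, nodes_t)
    edge_dist = F.mse_loss(a_s, a_t)
    return node_dist + lam * edge_dist
```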
s704, training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss and the third loss and the fourth loss, and determining the semantic segmentation model.
In the embodiment of the present disclosure, after determining the fourth loss, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the fourth loss to determine the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than the fourth loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than the fourth loss threshold and the first loss is less than the first loss threshold; or, performing weighted summation on the fourth loss and the first loss to obtain a fourth semantic loss, and stopping training of the semantic segmentation model to be trained to obtain the semantic segmentation model under the condition that the fourth semantic loss is smaller than a fourth semantic loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than the fourth loss threshold and the second loss is less than the second loss threshold; or, performing weighted summation on the fourth loss and the second loss to obtain a fifth semantic loss, and stopping training of the semantic segmentation model to be trained to obtain the semantic segmentation model under the condition that the fifth semantic loss is smaller than a fifth semantic loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than the fourth loss threshold and the third loss is less than the third loss threshold; or, performing weighted summation on the fourth loss and the third loss to obtain a sixth semantic loss, and stopping training of the semantic segmentation model to be trained to obtain the semantic segmentation model under the condition that the sixth semantic loss is smaller than a sixth semantic loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than a fourth loss threshold, the third loss is less than a third loss threshold, and the second loss is less than a second loss threshold; or, performing weighted summation on the fourth loss, the third loss, the second loss and the first loss to obtain a seventh semantic loss, and stopping training of the semantic segmentation model to be trained to obtain the semantic segmentation model under the condition that the seventh semantic loss is smaller than a seventh semantic loss threshold.
In some embodiments of the present disclosure, the semantic segmentation apparatus may determine a response loss based on the first semantic segmentation result and the second semantic segmentation result; and training the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss and the response loss to obtain the semantic segmentation model.
In the disclosed embodiment, the response loss $L_r(S)$ can be obtained according to formula (11):

$$L_r(S) = \frac{1}{R} \sum_{i \in R} \left( F_i^{r,T} - F_i^{r,S} \right)^2 \tag{11}$$

where $F_i^{r,T}$ is the feature corresponding to the $i$-th pixel in the first semantic segmentation result, and $F_i^{r,S}$ is the feature corresponding to the $i$-th pixel in the second semantic segmentation result.
In some embodiments of the present disclosure, the semantic segmentation apparatus may determine a training loss based on the second semantic segmentation result and the image sample; and training the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss, the response loss and the training loss to obtain the semantic segmentation model.
In the disclosed embodiment, the training loss $L_{sa}(S)$ can be obtained according to formula (12):

$$L_{sa}(S) = \frac{1}{R} \sum_{i \in R} \left( F_i^{r,S} - F_i^{sa} \right)^2 \tag{12}$$

where $F_i^{sa}$ is the feature corresponding to the $i$-th pixel in the image sample.
Exemplarily, referring to fig. 7, an embodiment of the present disclosure provides a schematic diagram of knowledge distillation. As shown in fig. 7, the teacher network and the student network each include 4 convolutional layers and one pooling layer, the pooling layer being implemented by a Pyramid Pooling Module (PPM). The first 3 convolutional layers are lower convolutional layers, the 4th convolutional layer is a higher convolutional layer, and the higher convolutional layer is connected with one SAM. The teacher network sequentially extracts 3 first sub-texture features, 1 first semantic feature, and 1 first semantic segmentation feature through the 4 convolutional layers and the 1 pooling layer; 3 CDMs perform contourlet decomposition on the 3 first sub-texture features to obtain 3 first sub-contour features; the SAM performs enhancement processing on the first semantic feature based on an attention mechanism to obtain 1 first enhancement feature; and the first semantic segmentation result is obtained based on the first semantic segmentation feature. Similarly, the student network can obtain 3 second sub-texture features, 3 second sub-contour features, 1 second semantic feature, 1 second enhancement feature, 1 second semantic segmentation feature, and the second semantic segmentation result through its 4 convolutional layers, 3 CDMs, and 1 SAM. Thus, the student network can learn the texture knowledge of the teacher network based on the 3 first sub-contour features and the 3 second sub-contour features, and learn the semantic knowledge of the teacher network based on the 1 first enhancement feature, the 1 second enhancement feature, the 1 first semantic segmentation feature, and the 1 second semantic segmentation feature, texture knowledge and semantic knowledge serving as feature knowledge; it learns the relational knowledge of the teacher network based on the first relational features among the 3 first sub-texture features, the 1 first semantic feature, and the 1 first semantic segmentation feature, and the second relational features among the 3 second sub-texture features, the 1 second semantic feature, and the 1 second semantic segmentation feature; and it learns the response knowledge of the teacher network based on the first semantic segmentation result and the second semantic segmentation result. Therefore, the student network can learn rich knowledge from the teacher network, which improves the semantic segmentation precision of the student network.
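Putting the pieces of fig. 7 together, the sketch below combines the distillation losses into one training objective; the equal weighting of the terms is an assumption (the disclosure allows arbitrary weighted summation), as are the dictionary keys, the helper name, and the reuse of `graph_distillation_loss` from the sketch above:

```python
import torch.nn.functional as F

def total_distillation_loss(outs_t, outs_s, label_feat, nodes_t, nodes_s):
    """outs_*: dicts of 'contour', 'enhanced', 'seg_feat', 'seg_result'."""
    l_te = F.mse_loss(outs_s["contour"], outs_t["contour"])       # formula (2)
    l_se = F.mse_loss(outs_s["enhanced"], outs_t["enhanced"])     # formula (3)
    l_see = F.mse_loss(outs_s["seg_feat"], outs_t["seg_feat"])    # formula (4)
    l_va = graph_distillation_loss(nodes_t, nodes_s)              # (8)-(10)
    l_r = F.mse_loss(outs_s["seg_result"], outs_t["seg_result"])  # formula (11)
    l_sa = F.mse_loss(outs_s["seg_result"], label_feat)           # formula (12)
    return l_te + l_se + l_see + l_va + l_r + l_sa
```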
Referring to fig. 8, fig. 8 is a schematic diagram illustrating semantic segmentation results of the student network. As shown in fig. 8, a is an original image in an urban scene, b is the semantic segmentation result of a student network in the related art, c is the semantic segmentation result of the student network of the present scheme, and d is the image sample corresponding to the semantic segmentation of the original image in a. It can be seen that the semantic segmentation result of the student network of the present scheme contains richer information and is closer to the image sample.
For example, the knowledge distillation method of fig. 7 is applied to an urban scene, and Table 1 shows the comparison of the mean intersection-over-union of the student network and the teacher network in the urban scene. As shown in Table 1, the mean intersection-over-union of the student network by itself is the lowest; it improves after Structured Knowledge Distillation (SKD) is adopted, improves further after Intra-class Feature Variation Distillation (IFVD) is adopted, and is highest with the method of the present disclosure.
Table 1 Mean intersection-over-union (%) in the urban scene

Network                            val      test
Teacher network                    78.56    76.78
ResNet18 (student, no distill)     69.10    67.60
ResNet18 + present method          75.82    73.78
Taking ResNet18 as the student network as an example: ResNet18 achieves a mean intersection-over-union of 69.1 on the val set, 9.46 points lower than the teacher network, and 67.6 on the test set, 9.18 points lower than the teacher network; the present method achieves a mean intersection-over-union of 75.82 on the val set, 6.72 points higher than ResNet18, and 73.78 on the test set, 6.18 points higher than ResNet18, which is closest to the mean intersection-over-union of the teacher network. Here, val is the validation set used during training to judge the learning state in time according to the training results, and test is the test set used to evaluate the model after training is completed. As can be seen from Table 1, the accuracy of the student network trained by the present method is remarkably improved and is closest to that of the teacher network.
Fig. 9 is a schematic diagram of an optional constituent structure of the semantic segmentation apparatus provided in the embodiment of the present disclosure, and as shown in fig. 9, the semantic segmentation apparatus 20 includes:
a feature obtaining module 2000, configured to obtain an image to be processed;
the semantic segmentation module 2004 is configured to perform semantic segmentation processing on the image to be processed by using a semantic segmentation model to obtain a semantic segmentation result of the image to be processed; the semantic segmentation model is trained by taking, as a reference, first transformation features obtained by performing contour decomposition or enhancement processing on first intermediate features output by a reference semantic model, in combination with second transformation features obtained by performing contour decomposition or enhancement processing on second intermediate features output by the semantic segmentation model to be trained;
the first intermediate feature and the second intermediate feature comprise at least one of the following:
a first texture feature and a second texture feature;
a first semantic feature and a second semantic feature;
the first transformation feature and the second transformation feature comprise at least one of the following:
a first profile feature and a second profile feature;
a first enhancement feature and a second enhancement feature.
In some embodiments, the feature extraction module 2001 is configured to perform feature extraction on the image sample by using the reference semantic model and the to-be-trained semantic segmentation model respectively to obtain the first intermediate feature and the second intermediate feature; the reference semantic model is a pre-trained semantic segmentation network; the semantic segmentation model to be trained is a network with the same function as the reference semantic model;
a feature processing module 2002, configured to perform contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature respectively to obtain a first transformation feature and a second transformation feature;
a training module 2003, configured to train the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determine the semantic segmentation model.
In some embodiments, the feature extraction module 2001 is further configured to perform feature extraction on the image sample by using the reference semantic model and the to-be-trained semantic segmentation model respectively to obtain a first texture feature and a second texture feature; and performing feature extraction on the first texture feature and the second texture feature to obtain a first semantic feature and a second semantic feature.
In some embodiments, the feature processing module 2002 is further configured to at least one of:
respectively carrying out contour decomposition processing on the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature; and performing enhancement processing on the first semantic features and the second semantic features to obtain first enhancement features and second enhancement features.
In some embodiments, the training module 2003 is further configured to perform a loss calculation based on a preset first loss function, the first contour feature and the second contour feature, and determine a first loss; determining a second loss based on a preset second loss function, the first enhancement feature and the second enhancement feature; training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and determining the semantic segmentation model.
In some embodiments, the training module 2003 is further configured to perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively to obtain a first semantic segmentation feature and a second semantic segmentation feature; performing loss calculation based on a preset third loss function, the first semantic segmentation feature and the second semantic segmentation feature to determine a third loss; training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss and the third loss, and determining the semantic segmentation model.
In some embodiments, the first texture feature comprises: at least one first sub-textural feature; the second texture feature comprises: at least one second sub-texture feature; the training module 2003 is further configured to perform semantic segmentation prediction on the first enhancement feature and the second enhancement feature respectively to obtain a first semantic segmentation feature and a second semantic segmentation feature, and then determine a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; determining a second graph inference relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; performing loss calculation based on a preset fourth loss function, the first graph reasoning relation and the second graph reasoning relation to determine a fourth loss; training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss and the third loss and the fourth loss, and determining the semantic segmentation model.
In some embodiments, the training module 2003 is further configured to determine at least two difference features between the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on an output order; performing correlation processing on the at least two difference characteristics to obtain the correlation degree between the difference characteristics; and constructing the first graph inference relationship based on the at least two difference features and the correlation degree between the difference features.
In some embodiments, the training module 2003 is further configured to, in the case that there is a correlation feature between target differences smaller than or equal to a preset correlation threshold in the correlation degree between the difference features, construct the first graph inference relationship based on the correlation feature between the target differences and the at least two difference features.
In some embodiments, the feature processing module 2002 is further configured to filter the first texture feature and the second texture feature based on an interlace factor to obtain a high-pass sub-band and a low-pass sub-band, respectively; carrying out directional filtering on the high-pass sub-band to obtain a directional sub-band; and respectively fusing the low-pass sub-band and the direction sub-band corresponding to the first texture feature and the second texture feature to obtain a first contour feature and a second contour feature, thereby completing contour decomposition processing.
In some embodiments, the feature processing module 2002 is further configured to perform at least two transformations on the first semantic feature and the second semantic feature to obtain at least two semantic transformation features corresponding to each other; performing self-enhancement processing on different semantic transformation characteristics in the at least two semantic transformation characteristics to obtain a correlation matrix; performing enhancement processing on the correlation matrix and one of the at least two semantic conversion characteristics to obtain a self-enhancement characteristic; determining respective self-enhanced features as the first enhanced features and the second enhanced features based on respective self-enhanced matrices of the first semantic features and the second semantic features; or fusing the first semantic feature and the second semantic feature with respective self-enhancement features respectively to obtain the first enhancement feature and the second enhancement feature.
Fig. 10 is a schematic diagram of an optional constituent structure of the electronic device provided in the embodiment of the present disclosure, and as shown in fig. 10, the electronic device 21 includes: a processor 2101 and a memory 2102, wherein the memory 2102 stores a computer program operable on the processor 2101, and the processor 2101 executes the computer program to perform the steps of any one of the semantic segmentation methods according to the embodiments of the present disclosure; the processor 2101 and memory 2102 are connected by a communication bus 2103.
The Memory 2102 is configured to store computer programs and applications to be processed by the processor 2101, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 2101 and various modules in the electronic device, which may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
The processor 2101, when executing a program, performs the steps of any of the semantic segmentation methods described above. The processor 2101 generally controls the overall operation of the electronic device 21.
The Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the above processor function may be other, and the embodiments of the present disclosure are not limited.
The computer-readable storage medium/Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic Random Access Memory (FRAM), a Flash Memory (Flash Memory), a magnetic surface Memory, an optical Disc, or a Compact Disc Read-Only Memory (CD-ROM), and the like; but may also be various terminals such as mobile phones, computers, tablet devices, personal digital assistants, etc., that include one or any combination of the above-mentioned memories.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present disclosure, reference is made to the description of the embodiments of the method of the present disclosure.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure. The above-mentioned serial numbers of the embodiments of the present disclosure are merely for description and do not represent the merits of the embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present disclosure.
In addition, all the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Alternatively, the integrated unit of the present disclosure may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an automatic test line of a device to perform all or part of the methods according to the embodiments of the present disclosure. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present disclosure may be combined with one another, where no conflict arises, to obtain new method embodiments.
The features disclosed in the several method or device embodiments provided in the present disclosure may be combined with one another, where no conflict arises, to obtain new method or device embodiments.
The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that can be readily conceived by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method of semantic segmentation, comprising:
acquiring an image to be processed;
performing semantic segmentation processing on the image to be processed by using a semantic segmentation model to obtain a semantic segmentation result of the image to be processed;
wherein the semantic segmentation model is obtained by training with a first transformation feature as a reference in combination with a second transformation feature, the first transformation feature being obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, and the second transformation feature being obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained;
the first intermediate feature and the second intermediate feature comprise at least one of the following: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature;
the first transformation feature and the second transformation feature comprise at least one of the following: a first contour feature and a second contour feature; a first enhancement feature and a second enhancement feature.
2. The method of claim 1, wherein the reference semantic model is a pre-trained semantic segmentation network, and the semantic segmentation model to be trained is a network having the same function as the reference semantic model; the method further comprises:
performing feature extraction on an image sample by using the reference semantic model and the semantic segmentation model to be trained, respectively, to obtain the first intermediate feature and the second intermediate feature;
performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature, respectively, to obtain the first transformation feature and the second transformation feature; and
training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, to determine the semantic segmentation model.
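For orientation, the training flow of claims 1 and 2 can be read as a standard teacher-student (knowledge distillation) step: a frozen, pre-trained reference model and the model being trained each emit an intermediate feature for the same image sample, the same contour-decomposition or enhancement transform is applied to both, and the student is optimized to match the teacher's transformed feature. Below is a minimal PyTorch-style sketch under assumptions; the `extract_features` interface, the `transform` callable, and the L2 matching loss are illustrative choices, not the claimed implementation.

```python
import torch
import torch.nn.functional as F

def distillation_step(reference_model, student_model, transform, optimizer, image_sample):
    """One training step in the style of claim 2 (sketch, not the patented method).

    reference_model: pre-trained semantic segmentation network (teacher), kept frozen.
    student_model:   semantic segmentation model to be trained (same task).
    transform:       contour decomposition or enhancement applied to intermediate features.
    """
    with torch.no_grad():
        first_intermediate = reference_model.extract_features(image_sample)   # hypothetical API
    second_intermediate = student_model.extract_features(image_sample)        # hypothetical API

    # Apply the same contour decomposition / enhancement to both intermediate features.
    first_transformation = transform(first_intermediate)    # reference (target)
    second_transformation = transform(second_intermediate)  # to be trained

    # Train the student so its transformed feature approaches the teacher's.
    loss = F.mse_loss(second_transformation, first_transformation)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```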
3. The method according to claim 2, wherein the performing feature extraction on the image sample by using the reference semantic model and the semantic segmentation model to be trained, respectively, to obtain the first intermediate feature and the second intermediate feature comprises:
performing feature extraction on the image sample by using the reference semantic model and the semantic segmentation model to be trained, respectively, to obtain a first texture feature and a second texture feature; and
performing further feature extraction on the first texture feature and the second texture feature to obtain a first semantic feature and a second semantic feature.
4. The method according to claim 2 or 3, wherein the performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature, respectively, to obtain the first transformation feature and the second transformation feature comprises at least one of:
performing contour decomposition processing on the first texture feature and the second texture feature, respectively, to obtain a first contour feature and a second contour feature; and
performing enhancement processing on the first semantic feature and the second semantic feature to obtain a first enhancement feature and a second enhancement feature.
5. The method according to any one of claims 2 to 4, wherein the training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature to determine the semantic segmentation model comprises:
performing loss calculation based on a preset first loss function, the first contour feature and the second contour feature to determine a first loss;
determining a second loss based on a preset second loss function, the first enhancement feature and the second enhancement feature;
training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and determining the semantic segmentation model.
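Claim 5 leaves the two "preset" loss functions open. A hedged sketch of one plausible instantiation, using plain L2 feature matching for both the contour features and the enhancement features; the choice of MSE and the weights are assumptions:

```python
import torch.nn.functional as F

def first_loss(first_contour_feature, second_contour_feature):
    # Preset first loss function over the contour features (assumed: L2).
    return F.mse_loss(second_contour_feature, first_contour_feature)

def second_loss(first_enhancement_feature, second_enhancement_feature):
    # Preset second loss function over the enhancement features (assumed: L2).
    return F.mse_loss(second_enhancement_feature, first_enhancement_feature)

def distillation_loss(first_contour, second_contour, first_enhanced, second_enhanced,
                      w1=1.0, w2=1.0):
    # Claim 5 trains on at least one of the two losses; the weights are illustrative.
    return (w1 * first_loss(first_contour, second_contour)
            + w2 * second_loss(first_enhanced, second_enhanced))
```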
6. The method according to any one of claims 2 to 5, wherein the training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature to determine the semantic segmentation model comprises:
performing semantic segmentation prediction on the first enhancement feature and the second enhancement feature, respectively, to obtain a first semantic segmentation feature and a second semantic segmentation feature;
performing loss calculation based on a preset third loss function, the first semantic segmentation feature and the second semantic segmentation feature to determine a third loss;
training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss and the third loss, and determining the semantic segmentation model.
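The third loss of claim 6 compares the teacher's and student's semantic segmentation features. A common realization of such a prediction-level distillation loss is a temperature-scaled, pixel-wise KL divergence; the claim only requires a preset third loss function, so the specific form below is an assumption:

```python
import torch.nn.functional as F

def third_loss(first_seg_feature, second_seg_feature, temperature=1.0):
    """KL-divergence sketch of the third loss (claim 6).

    Both inputs are (N, C, H, W) class logits: the first from the reference
    (teacher) branch, the second from the model being trained.
    """
    teacher_prob = F.softmax(first_seg_feature / temperature, dim=1)
    student_logp = F.log_softmax(second_seg_feature / temperature, dim=1)
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean") * temperature ** 2
```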
7. The method of claim 6, wherein the first texture feature comprises at least one first sub-texture feature, and the second texture feature comprises at least one second sub-texture feature;
after the performing semantic segmentation prediction on the first enhancement feature and the second enhancement feature, respectively, to obtain the first semantic segmentation feature and the second semantic segmentation feature, the method further comprises:
determining a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature;
determining a second graph inference relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature;
performing loss calculation based on a preset fourth loss function, the first graph inference relationship and the second graph inference relationship to determine a fourth loss;
training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss and the third loss and the fourth loss, and determining the semantic segmentation model.
8. The method according to claim 7, wherein determining a first graph inference relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature comprises:
determining at least two difference features between the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on an output order;
performing correlation processing on the at least two difference features to obtain correlation degrees between the difference features;
and constructing the first graph inference relationship based on the at least two difference features and the correlation degree between the difference features.
9. The method of claim 8, wherein constructing the first graph inference relationship based on the at least two difference features and a degree of correlation between the difference features comprises:
in a case where target inter-difference correlations smaller than or equal to a preset correlation threshold exist among the correlation degrees between the difference features, constructing the first graph inference relationship based on the target inter-difference correlations and the at least two difference features.
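Claims 7 to 9 build, for each branch, a graph whose nodes are difference features between successive outputs (sub-texture features, semantic feature, semantic segmentation feature, in output order) and whose edges are correlation degrees between those differences; the fourth loss then matches the student's graph to the teacher's. A sketch under stated assumptions: each feature map is pooled to a vector, differences between adjacent stages form the nodes, cosine similarity supplies the correlations, and L2 matching stands in for the preset fourth loss:

```python
import torch
import torch.nn.functional as F

def graph_inference_relation(features):
    """Build (nodes, edges) from features listed in output order, e.g.
    [sub_texture_1, ..., semantic_feature, semantic_segmentation_feature].
    Pooling, channel truncation, and cosine similarity are assumptions.
    """
    # Pool every feature map to a vector so stages of different sizes compare.
    pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in features]
    dim = min(p.shape[1] for p in pooled)
    pooled = [p[:, :dim] for p in pooled]

    # Difference features between adjacent outputs: the graph nodes (claim 8).
    nodes = torch.stack([pooled[i + 1] - pooled[i]
                         for i in range(len(pooled) - 1)], dim=1)   # (N, K, dim)

    # Correlation degrees between difference features: the graph edges.
    nodes_n = F.normalize(nodes, dim=-1)
    edges = nodes_n @ nodes_n.transpose(1, 2)                       # (N, K, K)
    # Claim 9 would keep only target correlations <= a preset threshold here.
    return nodes, edges

def fourth_loss(teacher_graph, student_graph):
    # Preset fourth loss (claim 7): match the student's graph relation to the
    # teacher's; L2 over nodes and edges is an assumption.
    t_nodes, t_edges = teacher_graph
    s_nodes, s_edges = student_graph
    return F.mse_loss(s_nodes, t_nodes) + F.mse_loss(s_edges, t_edges)
```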
10. The method of claim 4, wherein the performing contour decomposition processing on the first texture feature and the second texture feature to obtain the first contour feature and the second contour feature comprises:
filtering the first texture feature and the second texture feature based on an interlaced scanning factor to obtain a respective high-pass sub-band and low-pass sub-band for each;
performing directional filtering on each high-pass sub-band to obtain a directional sub-band; and
fusing the low-pass sub-band and the directional sub-band corresponding to the first texture feature and to the second texture feature, respectively, to obtain the first contour feature and the second contour feature, thereby completing the contour decomposition processing.
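Structurally, claim 10 reads like a contourlet-style decomposition: a low-pass/high-pass split governed by an interlaced scanning (subsampling) factor, directional filtering of the high-pass sub-band, and fusion of the low-pass and directional sub-bands. The sketch below fills in the unspecified parts with common choices: a Gaussian low-pass kernel and four Sobel-like directional kernels, all of which are assumptions rather than the claimed filters:

```python
import torch
import torch.nn.functional as F

def contour_decompose(feature, scan_factor=2):
    """Contourlet-style sketch of claim 10 for one feature map (N, C, H, W)."""
    n, c, h, w = feature.shape

    # Low-pass sub-band: depthwise Gaussian blur, subsampled by the scanning factor.
    g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    g = g.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(feature)
    low = F.conv2d(feature, g, padding=1, groups=c)[:, :, ::scan_factor, ::scan_factor]

    # High-pass sub-band: residual against the upsampled low-pass sub-band.
    low_up = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
    high = feature - low_up

    # Directional filtering of the high-pass sub-band (four orientations, assumed).
    d = torch.stack([
        torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]),  # vertical edges
        torch.tensor([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]),  # horizontal edges
        torch.tensor([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]]),  # one diagonal
        torch.tensor([[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]]),  # other diagonal
    ])
    d = d.view(4, 1, 3, 3).repeat(c, 1, 1, 1).to(feature)
    directional = F.conv2d(high, d, padding=1, groups=c)              # (N, 4C, H, W)

    # Fuse the low-pass and directional sub-bands into the contour feature.
    return torch.cat([low_up, directional], dim=1)
```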
11. The method of claim 4, wherein the performing enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhancement feature and the second enhancement feature comprises:
performing at least two conversions on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic conversion features for each;
performing self-enhancement processing between different ones of the at least two semantic conversion features to obtain a correlation matrix;
performing enhancement processing on the correlation matrix and one of the at least two semantic conversion features to obtain a self-enhanced feature; and
determining the respective self-enhanced features of the first semantic feature and the second semantic feature as the first enhancement feature and the second enhancement feature; or fusing the first semantic feature and the second semantic feature with their respective self-enhanced features to obtain the first enhancement feature and the second enhancement feature.
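The enhancement of claim 11 has the shape of a non-local (self-attention) block: learned conversions of the semantic feature, a correlation matrix between two of the conversions, enhancement of a third conversion by that matrix, and an optional residual fusion with the input. A minimal sketch under assumptions (1x1-convolution conversions, softmax-normalized correlation, an assumed channel reduction):

```python
import torch
import torch.nn as nn

class SelfEnhancement(nn.Module):
    """Non-local-style sketch of claim 11; channel sizes are assumptions."""

    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or max(1, channels // 2)
        self.query = nn.Conv2d(channels, reduced, 1)   # conversion 1
        self.key = nn.Conv2d(channels, reduced, 1)     # conversion 2
        self.value = nn.Conv2d(channels, channels, 1)  # conversion 3

    def forward(self, semantic_feature, fuse=True):
        n, c, h, w = semantic_feature.shape
        q = self.query(semantic_feature).flatten(2).transpose(1, 2)  # (N, HW, r)
        k = self.key(semantic_feature).flatten(2)                    # (N, r, HW)
        v = self.value(semantic_feature).flatten(2)                  # (N, C, HW)

        # Correlation matrix between different conversion features.
        corr = torch.softmax(q @ k, dim=-1)                          # (N, HW, HW)

        # Enhance one conversion feature with the correlation matrix.
        self_enhanced = (v @ corr.transpose(1, 2)).view(n, c, h, w)

        # Either take the self-enhanced feature directly, or fuse it with the
        # input semantic feature (residual), per the two options of claim 11.
        return semantic_feature + self_enhanced if fuse else self_enhanced
```

Applied to the first and second semantic features in turn, such a block would yield the first and second enhancement features compared by the second loss.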
12. A semantic segmentation apparatus, comprising:
a feature acquisition module configured to acquire an image to be processed;
a semantic segmentation module configured to perform semantic segmentation processing on the image to be processed by using a semantic segmentation model to obtain a semantic segmentation result of the image to be processed; the semantic segmentation model being obtained by training with a first transformation feature as a reference in combination with a second transformation feature, the first transformation feature being obtained by performing contour decomposition or enhancement processing on a first intermediate feature output by a reference semantic model, and the second transformation feature being obtained by performing contour decomposition or enhancement processing on a second intermediate feature output by the semantic segmentation model to be trained;
wherein the first intermediate feature and the second intermediate feature comprise at least one of the following: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature; and the first transformation feature and the second transformation feature comprise at least one of the following: a first contour feature and a second contour feature; a first enhancement feature and a second enhancement feature.
13. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method of any one of claims 1 to 11 when executing the computer program stored in the memory.
14. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 11.
CN202110725811.8A 2021-06-29 2021-06-29 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium Active CN113470057B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110725811.8A CN113470057B (en) 2021-06-29 2021-06-29 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium
PCT/CN2021/125073 WO2023273026A1 (en) 2021-06-29 2021-10-20 Semantic segmentation method and apparatus, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110725811.8A CN113470057B (en) 2021-06-29 2021-06-29 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113470057A true CN113470057A (en) 2021-10-01
CN113470057B CN113470057B (en) 2024-04-16

Family

ID=77873679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110725811.8A Active CN113470057B (en) 2021-06-29 2021-06-29 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113470057B (en)
WO (1) WO2023273026A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721420B (en) * 2023-08-10 2023-10-20 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment
CN116863279B (en) * 2023-09-01 2023-11-21 南京理工大学 Model distillation method for mobile terminal model light weight based on interpretable guidance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470057B (en) * 2021-06-29 2024-04-16 上海商汤智能科技有限公司 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280451A (en) * 2018-01-19 2018-07-13 北京市商汤科技开发有限公司 Semantic segmentation and network training method and device, equipment, medium, program
US20200167546A1 (en) * 2018-11-28 2020-05-28 Toyota Research Institute, Inc. Systems and methods for predicting semantics of a particle using semantic segmentation
CN111062951A (en) * 2019-12-11 2020-04-24 华中科技大学 Knowledge distillation method based on semantic segmentation intra-class feature difference
CN112308862A (en) * 2020-06-04 2021-02-02 北京京东尚科信息技术有限公司 Image semantic segmentation model training method, image semantic segmentation model training device, image semantic segmentation model segmentation method, image semantic segmentation model segmentation device and storage medium
CN113011425A (en) * 2021-03-05 2021-06-22 上海商汤智能科技有限公司 Image segmentation method and device, electronic equipment and computer readable storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023273026A1 (en) * 2021-06-29 2023-01-05 上海商汤智能科技有限公司 Semantic segmentation method and apparatus, electronic device and computer-readable storage medium
CN113888567A (en) * 2021-10-21 2022-01-04 中国科学院上海微系统与信息技术研究所 Training method of image segmentation model, image segmentation method and device
CN113888567B (en) * 2021-10-21 2024-05-14 中国科学院上海微系统与信息技术研究所 Training method of image segmentation model, image segmentation method and device
CN113744164A (en) * 2021-11-05 2021-12-03 深圳市安软慧视科技有限公司 Method, system and related equipment for enhancing low-illumination image at night quickly
CN116342884A (en) * 2023-03-28 2023-06-27 阿里云计算有限公司 Image segmentation and model training method and server
CN116342884B (en) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method and server

Also Published As

Publication number Publication date
CN113470057B (en) 2024-04-16
WO2023273026A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
CN113470057B (en) Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
Karani Introduction to word embedding and word2vec
CN111930992A (en) Neural network training method and device and electronic equipment
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN113095254B (en) Method and system for positioning key points of human body part
CN114049332A (en) Abnormality detection method and apparatus, electronic device, and storage medium
CN113111842A (en) Action recognition method, device, equipment and computer readable storage medium
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN116958163B (en) Multi-organ and/or focus medical image segmentation method and device
CN114978189A (en) Data coding method and related equipment
CN114419351A (en) Image-text pre-training model training method and device and image-text prediction model training method and device
Singhal et al. Multi-class blind steganalysis using deep residual networks
CN113627163A (en) Attention model, feature extraction method and related device
CN111046738B (en) Precision improvement method of light u-net for finger vein segmentation
CN115393633A (en) Data processing method, electronic device, storage medium, and program product
CN116524307A (en) Self-supervision pre-training method based on diffusion model
CN113569735B (en) Complex input feature graph processing method and system based on complex coordinate attention module
CN117377952A (en) Article recommendation method, article knowledge graph and model training method and device
US20180114109A1 (en) Deep convolutional neural networks with squashed filters
CN114820755B (en) Depth map estimation method and system
CN114360032B (en) Polymorphic invariance face recognition method and system
CN112532251A (en) Data processing method and device
Su et al. Chinese microblog sentiment analysis by adding emoticons to attention-based CNN
CN114627293A (en) Image matting method based on multi-task learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40053443)
GR01 Patent grant