WO2023273026A1 - Semantic segmentation method, apparatus, electronic device and computer-readable storage medium - Google Patents

Semantic segmentation method, apparatus, electronic device and computer-readable storage medium

Info

Publication number: WO2023273026A1
Authority: WIPO (PCT)
Prior art keywords: feature, semantic, semantic segmentation, loss, features
Application number: PCT/CN2021/125073
Other languages: English (en), French (fr)
Inventors: 纪德益, 王浩然
Original Assignee: 上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023273026A1

Classifications

    • G06T 7/136: Image analysis; Segmentation; Edge detection involving thresholding
    • G06N 3/045: Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 2207/20081: Indexing scheme for image analysis; Special algorithmic details; Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to image processing technology, and in particular to a semantic segmentation method, apparatus, electronic device, and computer-readable storage medium.
  • knowledge distillation has been introduced into semantic segmentation technology; knowledge distillation can transfer the knowledge learned by a complex model to a simple model, so that in practical applications the simple model can conveniently be used for semantic segmentation; however, in the knowledge transfer process, the semantic segmentation result of the complex model is usually used as response-based knowledge to guide the learning of the simple model, so the knowledge transferred to the simple model is not rich enough, resulting in low semantic segmentation accuracy of the simple model after learning.
  • Embodiments of the present disclosure provide a semantic segmentation method, device, electronic device, and computer-readable storage medium, which improve the accuracy of semantic segmentation.
  • An embodiment of the present disclosure provides a semantic segmentation method, including:
  • the semantic segmentation model is trained by taking, as a reference, the first transformation feature obtained by performing contour decomposition or enhancement processing on the first intermediate feature output by a reference semantic model, in combination with the second transformation feature obtained by performing contour decomposition or enhancement processing on the second intermediate feature output by the semantic segmentation model to be trained;
  • the first intermediate feature and the second intermediate feature comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature;
  • the first transformation feature and the second transformation feature comprise at least one of the following groups: a first contour feature and a second contour feature;
  • a first enhancement feature and a second enhancement feature.
  • the reference semantic model is a pre-trained semantic segmentation network;
  • the semantic segmentation model to be trained is a network with the same function as the reference semantic model; the method also includes:
  • the semantic segmentation model to be trained is trained based on at least the first transformation feature and the second transformation feature, and the semantic segmentation model is determined.
  • said using said reference semantic model and said semantic segmentation model to be trained to perform feature extraction on image samples respectively, to obtain said first intermediate feature and said second intermediate feature, comprises: using the reference semantic model and the semantic segmentation model to be trained to respectively perform feature extraction on the image sample to obtain the first texture feature and the second texture feature; and performing feature extraction on the first texture feature and the second texture feature to obtain the first semantic feature and the second semantic feature.
  • the contour decomposition or enhancement processing is performed on the first intermediate feature and the second intermediate feature to obtain the first transformation feature and the second transformation feature, including at least one of the following:
  • the training of the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determining the semantic segmentation model, includes: performing loss calculation based on the preset first loss function, the first contour feature, and the second contour feature to determine a first loss; determining a second loss based on the preset second loss function, the first enhanced feature, and the second enhanced feature; and training the semantic segmentation model to be trained based on at least one of the first loss and the second loss to determine the semantic segmentation model.
  • the training of the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determining the semantic segmentation model, includes: performing semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature; performing loss calculation based on the preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature to determine a third loss; training the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, and the third loss, to determine the semantic segmentation model.
  • the first texture feature includes: at least one first sub-texture feature; the second texture feature includes: at least one second sub-texture feature; after the semantic segmentation prediction is performed on the first enhanced feature and the second enhanced feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature,
  • the method further includes: determining a first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; determining a second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; performing loss calculation on the first graph reasoning relationship and the second graph reasoning relationship based on the preset fourth loss function to determine a fourth loss; training the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or training the semantic segmentation model to be trained based on at least one of the first loss, the second loss, the third loss, and the fourth loss, and determining the semantic segmentation model.
  • determining the first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature includes:
  • the first graph reasoning relationship is formed based on the at least two difference features and the correlation between the difference features.
  • the forming of the first graph reasoning relationship based on the at least two difference features and the correlation between the difference features includes:
  • in the case that the correlation between the difference features contains a target inter-difference association feature that is less than or equal to a preset association threshold, forming the first graph reasoning relationship based on the target inter-difference association feature and the at least two difference features.
  • performing contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature including:
  • the enhancement processing of the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature includes:
  • determining the respective self-enhancement features as the first enhancement feature and the second enhancement feature; or fusing the first semantic feature and the second semantic feature with their respective self-enhancement features to obtain the first enhancement feature and the second enhancement feature.
  • An embodiment of the present disclosure provides a semantic segmentation device, including:
  • the feature acquisition part is configured to acquire the image to be processed;
  • the semantic segmentation part is configured to use the semantic segmentation model to perform semantic segmentation processing on the image to be processed to obtain the semantic segmentation result of the image to be processed;
  • the semantic segmentation model is trained by taking, as a reference, the first transformation feature obtained by performing contour decomposition or enhancement processing on the first intermediate feature output by the reference semantic model, in combination with the second transformation feature obtained by performing contour decomposition or enhancement processing on the second intermediate feature output by the semantic segmentation model to be trained;
  • the first intermediate feature and the second intermediate feature comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature;
  • the first transformation feature and the second transformation feature comprise at least one of the following groups: a first contour feature and a second contour feature;
  • a first enhancement feature and a second enhancement feature.
  • the semantic segmentation device also includes:
  • the feature extraction part is configured to use the reference semantic model and the semantic segmentation model to be trained to perform feature extraction on image samples, respectively, to obtain the first intermediate feature and the second intermediate feature;
  • the reference semantic model is a pre-trained semantic segmentation network;
  • the semantic segmentation model to be trained is a network consistent with the function of the reference semantic model;
  • the feature processing part is configured to perform contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature, respectively, to obtain the first transformed feature and the second transformed feature;
  • the training part is configured to train the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determine the semantic segmentation model.
  • the feature extraction part is further configured to use the reference semantic model and the semantic segmentation model to be trained to perform feature extraction on the image sample respectively to obtain the first texture feature and the second texture feature, and to perform feature extraction on the first texture feature and the second texture feature to obtain the first semantic feature and the second semantic feature.
  • the feature processing part is further configured to perform contour decomposition processing on the first texture feature and the second texture feature to obtain the first contour feature and the second contour feature; or to perform enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature.
  • the feature processing part is further configured to perform contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature; and Perform enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature.
  • the training part is further configured to perform loss calculation based on a preset first loss function, the first contour feature, and the second contour feature to determine the first loss; determine a second loss based on the preset second loss function, the first enhanced feature, and the second enhanced feature; and train the semantic segmentation model to be trained based on at least one of the first loss and the second loss to determine the semantic segmentation model.
  • the training part is further configured to perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature to obtain the first semantic segmentation feature and the second semantic segmentation feature; perform loss calculation based on the preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature to determine the third loss; train the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model; or train the semantic segmentation model to be trained based on at least one of the first loss, the second loss, and the third loss to determine the semantic segmentation model.
  • the first texture feature includes: at least one first sub-texture feature; the second texture feature includes: at least one second sub-texture feature; the training part is further configured to, after performing semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature, determine a first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; determine a second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; perform loss calculation on the first graph reasoning relationship and the second graph reasoning relationship based on a preset fourth loss function to determine a fourth loss; train the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model; or train the semantic segmentation model to be trained based on at least one of the first loss, the second loss, the third loss, and the fourth loss to determine the semantic segmentation model.
  • the training part is further configured to determine at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature based on the output order; perform correlation processing on the at least two difference features to obtain the correlation between the difference features; and form the first graph reasoning relationship based on the at least two difference features and the correlation between the difference features.
  • the training part is further configured to, in the case that the correlation between the difference features contains a target inter-difference association feature that is less than or equal to a preset association threshold, form the first graph reasoning relationship based on the target inter-difference association feature and the at least two difference features.
  • the feature processing part is further configured to filter both the first texture feature and the second texture feature based on an interlacing factor to obtain respective high-pass subbands and low-pass sub-bands; performing direction filtering on the high-pass sub-bands to obtain direction sub-bands; respectively fusing the low-pass sub-bands and the direction sub-bands corresponding to the first texture feature and the second texture feature, The first contour feature and the second contour feature are obtained, and contour decomposition processing is completed.
  • the feature processing part is further configured to perform at least two transformations on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic transformation features; perform correlation processing on different semantic transformation features among the at least two semantic transformation features to obtain a correlation matrix; perform enhancement processing on the correlation matrix and one of the at least two semantic transformation features to obtain a self-enhancement feature; and, based on the self-enhancement features corresponding to the first semantic feature and the second semantic feature, determine the respective self-enhancement features as the first enhancement feature and the second enhancement feature, or fuse the first semantic feature and the second semantic feature with their respective self-enhancement features to obtain the first enhancement feature and the second enhancement feature.
  • An embodiment of the present disclosure provides an electronic device, including:
  • a memory configured to store a computer program
  • a processor configured to implement the above semantic segmentation method when executing the computer program stored in the memory.
  • An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program configured to implement the above semantic segmentation method when executed by a processor.
  • An embodiment of the present disclosure provides a computer program product, where the computer program product includes a computer program or instructions, and when the computer program or instructions are run on a computer, the computer is caused to execute the above semantic segmentation method.
  • Embodiments of the present disclosure provide a semantic segmentation method, device, electronic device, and computer-readable storage medium; the semantic segmentation device can migrate knowledge, based on multiple features obtained during semantic segmentation, from the reference semantic model into the semantic segmentation model; the semantic segmentation model can thus learn richer knowledge, thereby improving semantic segmentation accuracy when the semantic segmentation model is used to perform semantic segmentation on the image to be processed.
  • Fig. 1a is a schematic flowchart of an optional semantic segmentation method provided by an embodiment of the present disclosure
  • FIG. 1b is a schematic diagram of a training process of an optional semantic segmentation model provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an optional semantic segmentation process provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an optional contour decomposition method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of an optional low-pass filtering method provided by an embodiment of the present disclosure.
  • Fig. 5a is a schematic diagram of an optional enhanced processing provided by an embodiment of the present disclosure.
  • Fig. 5b is a schematic diagram of an optional enhanced processing provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an optional texture knowledge learning effect provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a semantic segmentation method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an optional semantic segmentation result of a student network provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of the composition and structure of a semantic segmentation device provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of the composition and structure of an electronic device provided by an embodiment of the present disclosure.
  • the term "comprises", "includes", or any other variation thereof is intended to cover a non-exclusive inclusion, so that a method or device comprising a series of elements not only includes the explicitly stated elements, but also includes other elements not explicitly listed, or elements inherent to implementing the method or device.
  • an element defined by the phrase "comprising a ..." does not exclude the presence of other related elements in the method or device that comprises it (such as a step in a method or a unit in a device; for example, a unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • the method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the steps described.
  • likewise, the device provided by the embodiments of the present disclosure includes a series of parts, but is not limited to the parts explicitly recorded, and may also include parts that need to be set up to obtain relevant information or to perform processing based on such information.
  • Knowledge distillation transfers the knowledge learned by a complex model to a simple model, so that the semantic segmentation accuracy of the simple model approaches that of the complex model; that is, a trained complex model (such as the reference semantic model) is used as the teacher network, a simple model (such as the semantic segmentation model to be trained) is used as the student network, and the teacher network guides the student network in learning knowledge, thereby obtaining a trained simple model.
  • the complex model has a large structure and high accuracy, while the simple model is small, and there is an accuracy gap between it and the complex model.
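  • as a baseline for the knowledge transfer described above, the following is a minimal sketch of a response-based distillation step in PyTorch; the loss weight alpha, the KL-divergence formulation, and the model interfaces are illustrative assumptions, not the patent's implementation (the patent's contribution is precisely to transfer richer, feature-level knowledge in addition to the response).

```python
# Minimal response-based distillation sketch (illustrative assumptions, not
# the patent's exact method). `teacher` is a trained reference semantic model
# and `student` is the semantic segmentation model to be trained; both map
# images (N, 3, H, W) to per-pixel class logits (N, K, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher: nn.Module, student: nn.Module,
                 images: torch.Tensor, labels: torch.Tensor,
                 optimizer: torch.optim.Optimizer, alpha: float = 0.5) -> float:
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)                 # teacher's segmentation response
    s_logits = student(images)                     # student's segmentation response
    ce = F.cross_entropy(s_logits, labels)         # training loss on ground truth
    kd = F.kl_div(F.log_softmax(s_logits, dim=1),  # response loss: match teacher's
                  F.softmax(t_logits, dim=1),      # per-pixel class distribution
                  reduction="batchmean")
    loss = ce + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```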
  • Embodiments of the present disclosure provide a semantic segmentation method, device, electronic device, and computer-readable storage medium, which can improve the accuracy of semantic segmentation.
  • the semantic segmentation method provided by the embodiment of the present disclosure is applied to an electronic device, and an exemplary application of the electronic device provided by the embodiment of the present disclosure will be described below.
  • the electronic device provided by the embodiments of the present disclosure can be implemented as various types of user terminals, such as AR glasses, a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), and can also be implemented as a server, which is not limited in the embodiments of the present disclosure.
  • FIG. 1a is a schematic flowchart of an optional semantic segmentation method provided by an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 1a.
  • the semantic segmentation device uses the semantic segmentation model to perform semantic segmentation processing on the image to be processed, and obtains the semantic segmentation result of the image to be processed; wherein the semantic segmentation model is obtained by training with the first transformation feature, obtained by performing contour decomposition or enhancement processing on the first intermediate feature output by the reference semantic model, as a reference, in combination with the second transformation feature obtained by performing contour decomposition or enhancement processing on the second intermediate feature output by the semantic segmentation model to be trained.
  • the first intermediate feature and the second intermediate feature include at least one of the following groups: the first texture feature and the second texture feature; the first semantic feature and the second semantic feature. The first transformation feature and the second transformation feature include at least one of the following groups: the first contour feature and the second contour feature; the first enhancement feature and the second enhancement feature.
  • the semantic segmentation model may be used to perform semantic segmentation on the acquired image to be processed to obtain a semantic segmentation result.
  • the reference semantic model is a pre-trained semantic segmentation network; the semantic segmentation model to be trained is a network with the same function as the reference semantic model.
  • the semantic segmentation device can realize the training process through S01-S03, as follows:
  • S02. Perform contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature, respectively, to obtain the first transformed feature and the second transformed feature;
  • the reference semantic model and the semantic segmentation model to be trained have the same function, and both are used for semantic segmentation;
  • the reference semantic model is a successfully trained complex model, and the semantic segmentation model is a simple model; the reference semantic model is used to guide the training of the semantic segmentation model, so that the knowledge learned by the reference semantic model is transferred to the semantic segmentation model.
  • the semantic segmentation device uses the reference semantic model and the semantic segmentation model to be trained to perform feature extraction on image samples, and can obtain the first intermediate feature and the second intermediate feature.
  • a feature can be embodied as a feature map, where the feature map can be represented by a C × H × W matrix; H × W represents the pixels of the feature map, and C represents the number of channels of the feature map; that is, a feature map can be viewed as H × W C-dimensional deep descriptors.
  • both the reference semantic model and the semantic segmentation model to be trained include multiple convolutional layers, through which multiple corresponding intermediate features can be obtained in sequence; the multiple intermediate features include low-level features and high-level features; the low-level features contain texture information and can be used as texture features, and the high-level features contain semantic information and can be used as semantic features.
  • the first intermediate feature may include at least one of the first texture feature and the first semantic feature; the first texture feature is a low-level feature extracted by the reference semantic model, and the first semantic feature is a high-level feature extracted by the reference semantic model.
  • the second intermediate feature may include at least one of the second texture feature and the second semantic feature; the second texture feature is a low-level feature extracted by the semantic segmentation model to be trained, and the second semantic feature is a high-level feature extracted by the semantic segmentation model to be trained.
  • when the first intermediate feature includes the first texture feature, the second intermediate feature includes the second texture feature; when the first intermediate feature includes the first semantic feature, the second intermediate feature includes the second semantic feature.
  • the reference semantic model and the semantic segmentation model to be trained can be: ResNet, ENet, ESPNet, BiSeNet, SegNet, RefineNet, etc.
  • the reference semantic model and the semantic segmentation model may be the same model or different models; this is not limited in this embodiment of the present disclosure.
  • the reference semantic model includes 4 convolutional layers and a decoder.
  • the original image 20 is semantically segmented through the reference semantic model to obtain a semantic segmentation result 23.
  • the features extracted by the first three convolutional layers are low-level features, as shown in 21, which contain a large amount of texture information
  • the features extracted by the fourth convolutional layer are high-level features, as shown in 22, which contain semantic information.
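  • as a concrete illustration of capturing such intermediate features, the following sketch registers forward hooks on a torchvision ResNet backbone; using ResNet-18, and treating the layer1-layer3 outputs as low-level texture features and the layer4 output as the high-level semantic feature, is an assumption for illustration rather than the patent's prescribed architecture.

```python
# Sketch: capturing low-level (texture) and high-level (semantic) intermediate
# features with forward hooks. ResNet-18 stands in for the reference semantic
# model; the layer choice is an illustrative assumption.
import torch
from torchvision.models import resnet18

model = resnet18(weights=None)
features = {}

def save(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# layer1-layer3 outputs ~ low-level texture features; layer4 ~ high-level semantic feature.
for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(save(name))

x = torch.randn(1, 3, 224, 224)        # a dummy image sample
_ = model(x)
texture_feats = [features[n] for n in ["layer1", "layer2", "layer3"]]
semantic_feat = features["layer4"]     # C x H x W deep descriptors
print([f.shape for f in texture_feats], semantic_feat.shape)
```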
  • the semantic segmentation device may perform contour decomposition on the first intermediate feature to obtain the first contour feature, and perform contour decomposition on the second intermediate feature to obtain the second contour feature; the first transformation feature includes the first contour feature, and the second transformation feature includes the second contour feature.
  • the semantic segmentation device may perform contour decomposition processing on the first texture feature and the second texture feature: decompose the first texture feature into at least one first band-pass subband and a first low-pass subband, and obtain the first contour feature based on the at least one first band-pass subband and the first low-pass subband; and decompose the second texture feature into at least one second band-pass subband and a second low-pass subband, and obtain the second contour feature based on the at least one second band-pass subband and the second low-pass subband.
  • the semantic segmentation device may perform fusion processing on at least one first bandpass subband and the first lowpass subband to obtain the first contour feature; the semantic segmentation device may perform fusion processing on at least one second bandpass subband Perform fusion processing with the second low-pass sub-band to obtain the second contour feature.
  • the semantic segmentation device may include a low-pass filter and a directional filter, and perform contour decomposition on the first texture feature and the second texture feature through the low-pass filter and the directional filter.
  • the semantic segmentation device may decompose the first texture feature and the second texture feature at least one level by means of Laplacian pyramid decomposition.
  • the semantic segmentation device may perform enhancement processing on the first intermediate feature to obtain the first enhanced feature, and perform enhancement processing on the second intermediate feature to obtain the second enhanced feature; the first transformation feature includes the first enhanced feature, and the second transformation feature includes the second enhanced feature.
  • the semantic segmentation device may perform enhancement processing on the first semantic feature and the second semantic feature to obtain a first enhanced feature that reflects the correlation of pixels in the first semantic feature and a second enhanced feature that reflects the correlation of pixels in the second semantic feature.
  • the semantic segmentation device can pre-train an attention model, and implement enhancement processing through the attention model;
  • the attention model can be a joint attention model, a multi-level attention model, or a self-attention model, etc., which may be set as required and is not limited in this embodiment of the present disclosure.
  • the semantic segmentation device may also determine the feature matrix of the first enhanced feature based on the feature matrix of the first semantic feature; and determine the feature matrix of the second enhanced feature based on the feature matrix of the second semantic feature .
  • the semantic segmentation device can train the semantic segmentation model to be trained based on the first intermediate feature and the second intermediate feature, and obtain the semantic segmentation model after the training succeeds.
  • the semantic segmentation device can determine the feature loss between the first intermediate feature and the second intermediate feature; the feature loss is used to characterize the difference between the first intermediate feature and the second intermediate feature; the semantic segmentation device can train the semantic segmentation model to be trained according to the feature loss, and stop the training when the feature loss is less than the feature loss threshold to obtain the semantic segmentation model.
  • when the first transformation feature and the second transformation feature include the first contour feature and the second contour feature, the feature loss includes the first loss; the first loss is used to characterize the difference between the first contour feature and the second contour feature. When the first intermediate feature and the second intermediate feature include the first semantic feature and the second semantic feature, the feature loss includes the second loss; the second loss is used to characterize the difference between the first enhanced feature and the second enhanced feature.
  • the difference between features may be characterized by a vector distance; here, the vector distance may be a cosine distance or a Euclidean distance, which is not limited in this embodiment of the present disclosure.
  • the semantic segmentation device may train the semantic segmentation model to be trained based on at least one of the third loss, the fourth loss, the response loss, the training loss, and the feature loss.
  • the response loss represents the difference between the first semantic segmentation result and the second semantic segmentation result
  • the first semantic segmentation result and the second semantic segmentation result are the results obtained by the reference semantic model and the semantic segmentation model to be trained, respectively, performing semantic segmentation on the image sample;
  • the third loss characterizes the difference between the first semantic segmentation feature and the second semantic segmentation feature
  • the first semantic segmentation feature and the second semantic segmentation feature are the features extracted by the pooling layers of the reference semantic model and the semantic segmentation model to be trained, respectively;
  • the pooling layer is the feature extraction layer after the high-level convolutional layer;
  • the fourth loss is used to represent the loss between the first relational feature and the second relational feature, where the first relational feature represents the relationship among multiple features in the first intermediate feature and the first semantic segmentation feature, and the second relational feature represents the relationship among multiple features in the second intermediate feature and the second semantic segmentation feature.
  • the semantic segmentation device can set the response loss threshold, the third loss threshold, the fourth loss threshold, and the training loss threshold respectively; thus, when at least one of the third loss, the fourth loss, the response loss, the training loss, and the feature loss is less than the corresponding loss threshold, the semantic segmentation device stops the training to obtain the semantic segmentation model.
  • the semantic segmentation device may perform weighted summation on at least one of the third loss, the fourth loss, the response loss, the training loss, and the feature loss to obtain the semantic loss; when the semantic loss is less than the semantic loss threshold, the training is stopped and the semantic segmentation model is obtained.
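  • a minimal sketch of this weighted-summation stopping criterion; the weights and the threshold value are illustrative assumptions, since the patent leaves them configurable.

```python
# Sketch: combine whichever losses are available into a single semantic loss
# by weighted summation, and stop training once it falls below a threshold.
import torch

SEMANTIC_LOSS_THRESHOLD = 0.05   # illustrative value

def semantic_loss(losses: dict, weights: dict) -> torch.Tensor:
    # `losses` may contain any subset of: "first", "second", "third",
    # "fourth", "response", "training" (each a scalar tensor).
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())

def should_stop(losses: dict, weights: dict) -> bool:
    return semantic_loss(losses, weights).item() < SEMANTIC_LOSS_THRESHOLD
```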
  • the semantic segmentation device can extract the first intermediate feature of the reference semantic model and the second intermediate feature of the semantic segmentation model, and perform contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature to obtain the first transformation feature and the second transformation feature; the first transformation feature and the second transformation feature include at least one of texture knowledge and semantic knowledge, so that after the semantic segmentation device trains the semantic segmentation model to be trained based on the first transformation feature and the second transformation feature, the obtained semantic segmentation model can learn the texture knowledge and semantic knowledge in the reference semantic model, thereby improving the accuracy when the semantic segmentation model is used to perform semantic segmentation on the image to be processed.
  • using the reference semantic model and the semantic segmentation model to be trained to respectively extract the features of the image sample to obtain the first intermediate feature and the second intermediate feature may be implemented as follows:
  • S202. Perform feature extraction on the first texture feature and the second texture feature to obtain the first semantic feature and the second semantic feature.
  • after the semantic segmentation device extracts the first texture feature through the reference semantic model, it can continue to perform feature extraction on the first texture feature to obtain the first semantic feature; after extracting the second texture feature through the semantic segmentation model to be trained, it can continue to perform feature extraction on the second texture feature to obtain the second semantic feature.
  • the reference semantic model and the semantic segmentation model to be trained include multi-layer convolutional layers, through which multiple intermediate features can be obtained: the first convolutional layer performs feature extraction on the image sample to obtain the first-layer intermediate feature, the second convolutional layer performs feature extraction on the first-layer intermediate feature to obtain the second-layer intermediate feature, and so on, yielding multiple intermediate features.
  • the multi-layer convolutional layers may include at least one low-level convolutional layer and one high-level convolutional layer; the at least one intermediate feature obtained through the at least one low-level convolutional layer is a low-level feature, that is, the first texture feature, and the intermediate feature obtained through the high-level convolutional layer is a high-level feature, that is, the first semantic feature. That is to say, the model needs to acquire the first texture feature before acquiring the first semantic feature.
  • the semantic segmentation device extracts features from the image sample through the multi-layer convolutional layers in the reference semantic model and in the semantic segmentation model to be trained, and can sequentially obtain the first texture feature and the first semantic feature.
  • performing contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature respectively to obtain the first transformation feature and the second transformation feature may be implemented as at least one of the following:
  • the first texture feature and the second texture feature are subjected to contour decomposition processing to obtain the first contour feature and the second contour feature;
  • the first semantic feature and the second semantic feature are enhanced to obtain the first enhanced feature and the second enhanced feature.
  • when the first intermediate feature includes the first texture feature, the second intermediate feature includes the second texture feature, and the semantic segmentation device can perform contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature, taking the first contour feature as the first transformation feature and the second contour feature as the second transformation feature; when the first intermediate feature includes the first semantic feature, the second intermediate feature includes the second semantic feature, and the semantic segmentation device can respectively enhance the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature; when the first intermediate feature includes the first texture feature and the first semantic feature, the second intermediate feature includes the second texture feature and the second semantic feature, and the semantic segmentation device can perform contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature, and perform enhancement processing on the first semantic feature and the second semantic feature respectively to obtain the first enhanced feature and the second enhanced feature.
  • the implementation of the semantic segmentation device performing contour decomposition processing on the first texture feature and the second texture feature to obtain the first contour feature and the second contour feature may include: S301-S302.
  • the semantic segmentation device may perform contour decomposition on the first texture feature through a contourlet decomposition part (Contourlet Decomposition Module, CDM) to obtain the first contour feature.
  • the contourlet decomposition part includes at least one combination of a low-pass filter (LP) and a directional filter bank (DFB); the low-pass filter filters the input feature to decompose it into a high-pass subband and a low-pass subband, and the directional filter bank performs directional filtering on the high-pass subband to obtain directional subbands; thus, each combination of a low-pass filter and a directional filter bank implements one level of Laplacian pyramid decomposition.
  • the respective high-pass subbands and low-pass subbands are obtained; directional filtering is performed on the high-pass subbands to obtain the directional subbands; the low-pass subbands and the directional subbands corresponding to the first texture feature and the second texture feature are then fused respectively to obtain the first contour feature and the second contour feature, thereby completing the contour decomposition process.
  • each group of LP and DFB can perform one level of contour decomposition; the number of decomposition levels is not limited in the embodiment of the present disclosure.
  • performing contour decomposition processing on the first texture feature to obtain the first contour feature may be implemented as: S3011-S3012.
  • S3011. Perform at least one level of contour decomposition on the first texture feature through at least one combination of LP and DFB to obtain at least one directional subband and one low-pass subband.
  • at least one directional subband and one low-pass subband can be obtained; the at least one directional subband comprises the directional subbands obtained by the corresponding at least one level of decomposition, and the low-pass subband is the one obtained by the last level of decomposition.
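  • the following sketch illustrates the structure of this multi-level decomposition (one LP split per level, followed by directional filtering of the high-pass part); a Gaussian kernel stands in for the low-pass analysis/synthesis filter and four fixed oriented kernels stand in for a real directional filter bank, both of which are toy assumptions rather than the patent's filters.

```python
# Sketch of multi-level contourlet-style decomposition: each level splits the
# current low-pass subband into low- and high-pass parts (Laplacian pyramid
# level), then filters the high-pass part into directional subbands.
import torch
import torch.nn.functional as F

def gaussian_blur(x: torch.Tensor) -> torch.Tensor:
    k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    k = k.expand(x.shape[1], 1, 3, 3)              # one kernel per channel
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

def lp_level(x: torch.Tensor):
    """One Laplacian-pyramid level: returns (low-pass subband, high-pass subband)."""
    low = gaussian_blur(x)[:, :, ::2, ::2]         # analysis filter + (2, 2) downsampling
    up = F.interpolate(low, size=x.shape[-2:], mode="bilinear", align_corners=False)
    high = x - gaussian_blur(up)                   # element-wise difference with synthesis result
    return low, high

def dfb(high: torch.Tensor) -> list:
    """Toy directional filtering with 4 oriented difference kernels."""
    kernels = [
        torch.tensor([[-1., 0., 1.]] * 3),                           # horizontal
        torch.tensor([[-1.] * 3, [0.] * 3, [1.] * 3]),               # vertical
        torch.tensor([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]]),   # diagonal
        torch.tensor([[1., 0., 0.], [0., 0., 0.], [0., 0., -1.]]),   # anti-diagonal
    ]
    out = []
    for k in kernels:
        k = k.expand(high.shape[1], 1, 3, 3)
        out.append(F.conv2d(high, k, padding=1, groups=high.shape[1]))
    return out

def contourlet_decompose(feature: torch.Tensor, levels: int = 2):
    directional, low = [], feature
    for _ in range(levels):
        low, high = lp_level(low)
        directional.extend(dfb(high))              # directional subbands F_bds
    return directional, low                        # plus last-level low-pass subband
```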
  • for the way the CDM decomposes the first texture feature, refer to formula (1).
  • in formula (1), the sampling operator downsamples by the interlaced scanning factor p;
  • F_l,n represents the nth-level low-pass subband (feature), that is, the low-pass subband obtained by n levels of decomposition;
  • m is the number of LP and DFB combinations in the contourlet decomposition part.
  • the first-level decomposition decomposes the first texture feature through the LP to obtain the first-level low-pass subband F_l,1 and the first-level high-pass subband F_h,1.
  • the number of intermediate features in the first texture feature is the same as the number of CDMs in the contour decomposition part; that is, each intermediate feature in the first texture feature needs to use one CDM.
  • the CDM comprises two groups of low-pass filter LP and directional filter bank DFB combinations: the first group comprises LP1 and DFB1, and the second group comprises LP2 and DFB2; after the texture feature F is input into the CDM, one low-pass subband F_l,2 and two directional subbands F_bds,1 and F_bds,2 can be obtained.
  • through LP1, the first-level high-pass subband F_h,1 and the first-level low-pass subband F_l,1 can be obtained; through DFB1, directional filtering is performed on the first-level high-pass subband F_h,1 to obtain the first-level directional subband F_bds,1; the first-level low-pass subband F_l,1 is downsampled according to (2, 2), and the length and width of the downsampled first-level low-pass subband are 1/2 of those of F_l,1.
  • DFB1 includes a 4-level binary tree, so F_bds,1 includes 2^4 = 16 directional subbands; DFB2 includes a 3-level binary tree, so F_bds,2 includes 2^3 = 8 directional subbands.
  • the semantic segmentation device implements the LP decomposition of the nth-level low-pass subband as follows: perform low-pass analysis filtering on the nth-level low-pass subband through a low-pass analysis filter to obtain the (n+1)th-level low-pass result; downsample the (n+1)th-level low-pass result to obtain the (n+1)th-level low-pass subband; then upsample the (n+1)th-level low-pass subband and pass it through the synthesis filter to obtain the synthesized (n+1)th-level low-pass result; and, based on the nth-level low-pass subband and the (n+1)th-level low-pass result, obtain the (n+1)th-level high-pass subband.
  • the semantic segmentation device may subtract the (n+1)th-level low-pass result from the nth-level low-pass subband, that is, calculate their element-wise difference, to obtain the (n+1)th-level high-pass subband.
  • the low-pass filter comprises a low-pass analysis filter 41, a downsampling part 42, an upsampling part 43, a synthesis filter 44, and a subtraction part 45: the nth-level low-pass subband F_l,n is input into the low-pass analysis filter 41 to obtain the (n+1)th-level low-pass result; the downsampling part 42 downsamples this result, and the downsampled result F_l,n+1 serves as the (n+1)th-level low-pass subband; the upsampling part 43 upsamples F_l,n+1, and the upsampled result is input into the synthesis filter 44 to obtain the synthesized (n+1)th-level low-pass result; finally, the subtraction part 45 subtracts the synthesized (n+1)th-level low-pass result from F_l,n to obtain the (n+1)th-level high-pass subband.
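  • using the lp_level sketch above, this decomposition can be checked to be invertible, because the high-pass subband is defined as the element-wise difference between the input and the synthesized (upsampled and filtered) low-pass result:

```python
# Usage sketch (relies on gaussian_blur and lp_level from the sketch above).
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 32, 32)                      # a dummy texture feature
low, high = lp_level(x)
up = F.interpolate(low, size=x.shape[-2:], mode="bilinear", align_corners=False)
reconstructed = gaussian_blur(up) + high           # synthesis result + high-pass subband
print(torch.allclose(reconstructed, x))            # True by construction
```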
  • at least one level of directional subbands and the last-level low-pass subband can be obtained; since the feature dimensions of the at least one level of directional subbands and the last-level low-pass subband differ, the semantic segmentation device needs to transform their dimensions through a pooling layer to obtain at least one transformed directional subband and a transformed last-level low-pass subband, and then perform the first fusion process on them to obtain the first contour feature F_te.
  • the first fusion process may include: adding the at least one transformed directional subband and the transformed last-level low-pass subband, or concatenating them, etc., which is not limited by the embodiment of the present disclosure.
  • the more levels of CDM decomposition, the richer the extracted first contour features and the higher the accuracy achieved by the trained student network, but also the higher the computational complexity.
  • the number of stages of CDM decomposition can be set as required.
  • the manner in which the semantic segmentation device performs contour decomposition processing on the second texture feature is the same as the manner in which contour decomposition processing is performed on the first texture feature in S301; see the description in S301 for details, which will not be repeated here.
  • the manner in which the semantic segmentation apparatus performs enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature may include: S401-S402.
  • the semantic segmentation device may perform enhancement processing on the first semantic feature through a Semantic Attention Module (SAM) to obtain the first enhanced feature.
  • performing enhancement processing on the first semantic feature and the second semantic feature to obtain the first enhanced feature and the second enhanced feature may include: performing at least two transformations on each to obtain at least two corresponding semantic transformation features; performing correlation processing on different semantic transformation features among the at least two semantic transformation features to obtain a correlation matrix; performing enhancement processing on the correlation matrix and one of the at least two semantic transformation features to obtain a self-enhancement feature; and, based on the self-enhancement features corresponding to the first semantic feature and the second semantic feature, determining the respective self-enhancement features as the first enhanced feature and the second enhanced feature, or fusing the first semantic feature and the second semantic feature with their respective self-enhancement features to obtain the first enhanced feature and the second enhanced feature.
  • performing enhancement processing on the first semantic feature in S401 to obtain the first enhanced feature may be implemented as: S4011-S4014.
  • the semantic segmentation device can respectively perform the first transformation, the second transformation, and the third transformation on the first semantic feature to obtain the first semantic transformation feature, the second semantic transformation feature, and the third semantic transformation feature; the number of vectors included in the first semantic transformation feature is equal to the number of channels C, and the number of vectors included in the second semantic transformation feature is equal to the number of pixels (H × W).
  • the first semantic transformation feature and the second semantic transformation feature are mutual transposition matrices.
  • the first semantic transformation feature and the third semantic transformation feature are the same matrix feature.
  • the first semantic transformation feature and the second semantic transformation feature are multiplied by matrix multiplication, and the resulting matrix is the correlation feature; the elements of the correlation feature matrix can represent the correlation between pixels: the greater the correlation, the greater the element value, and the smaller the correlation, the smaller the element value.
  • the correlation feature is multiplied with the third semantic transformation feature, and the obtained matrix is the self-enhancement feature; that is, the correlation feature is used to enhance the third semantic transformation feature, so that the self-enhancement matrix contains the correlations between pixels.
  • the semantic segmentation device may determine the first enhancement feature according to the self-enhancement feature.
  • the semantic segmentation device may use the self-enhancement feature as the first enhanced feature.
  • for example, the first semantic feature matrix is an H × W × C matrix MF; by transformation, the first semantic transformation matrix MF1 of size C × (H × W) can be obtained; multiplying the first semantic transformation matrix MF1 by the second semantic transformation matrix MF2 yields the C × C correlation matrix MFC; multiplying MFC with the third semantic transformation matrix and reshaping yields the H × W × C self-enhancement matrix MFp1; thus, the self-enhancement matrix MFp1 contains the correlation between elements, and the semantic segmentation device can use MFp1 as the first enhanced feature matrix.
  • the semantic segmentation device may perform a second fusion process on the self-enhanced feature and the first semantic feature to obtain the first enhanced feature.
  • the second fusion process may include: matrix addition of the self-enhancement feature and the first semantic feature, or weighted addition of the self-enhancement feature and the first semantic feature; the weighting coefficient may be set as required, which is not limited by the embodiments of the present disclosure.
  • after obtaining the H × W × C self-enhancement matrix MFp1, the semantic segmentation device weights the self-enhancement matrix according to a weighting coefficient and adds it element-wise to the first semantic feature matrix MF to obtain the first enhanced feature matrix MFp2.
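  • a minimal sketch of this channel-attention style enhancement in PyTorch; the softmax normalization of the correlation matrix and the learnable, zero-initialized weighting coefficient are common attention-design choices assumed here for stability, not details confirmed by the patent.

```python
# Sketch of the semantic attention enhancement: MF is reshaped into MF1
# (C x HW) and its transpose MF2 (HW x C); their product is the C x C
# correlation matrix MFC; MFC re-weights the third transform (same matrix
# as MF1); the result is scaled and added element-wise back to MF.
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))   # weighting coefficient (assumed learnable)

    def forward(self, mf: torch.Tensor) -> torch.Tensor:   # mf: (N, C, H, W)
        n, c, h, w = mf.shape
        mf1 = mf.view(n, c, h * w)                  # first transform:  C x (H*W)
        mf2 = mf1.transpose(1, 2)                   # second transform: transpose of MF1
        mf3 = mf1                                   # third transform: same matrix as MF1
        mfc = torch.softmax(mf1 @ mf2, dim=-1)      # C x C correlation matrix MFC (softmax assumed)
        mfp1 = (mfc @ mf3).view(n, c, h, w)         # self-enhancement matrix MFp1
        return self.gamma * mfp1 + mf               # MFp2: weighted element-wise fusion with MF
```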
  • the method by which the semantic segmentation device enhances the second semantic feature is the same in method and principle as the enhancement of the first semantic feature in S401; see the description in S401 for details, which will not be repeated here.
  • training the semantic segmentation model to be trained and determining the semantic segmentation model may be implemented as: S501-S503.
  • the preset first loss function may be the mean square error function
  • the semantic segmentation device may calculate the first variance mean for the first contour feature and the second contour feature, and use the first variance mean as the first loss; the difference between the first contour feature and the second contour feature is characterized by the first loss.
  • the semantic segmentation device may calculate the variance between the contour feature corresponding to the ith pixel in the first contour feature and the contour feature corresponding to the ith pixel in the second contour feature to obtain R first variances; after summing the R first variances to obtain the sum of the first variances, it divides the sum of the first variances by the total number of pixels to obtain the first variance mean.
  • the preset second loss function may be a mean square error function
  • the semantic segmentation device may calculate the second variance mean value for the first enhanced feature and the second enhanced feature, and use the second variance mean value as the second loss;
  • the difference between the first augmented feature and the second augmented feature is characterized by a second loss.
  • the semantic segmentation device may calculate the variance between the semantic feature corresponding to the ith pixel in the first semantic feature and the semantic feature corresponding to the ith pixel in the second semantic feature to obtain R second variances; after summing the R second variances to obtain the sum of the second variances, it divides the sum of the second variances by the total number of pixels to obtain the second variance mean.
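  • both the first loss and the second loss thus reduce to a mean of squared element-wise differences between the teacher's and the student's features; a sketch follows, with the caveat that F.mse_loss averages over all elements (channels included), which is used here as a stand-in for dividing the summed variances by the pixel count.

```python
# Sketch: first loss (contour features) and second loss (enhanced features)
# as mean-squared differences between teacher and student features.
import torch
import torch.nn.functional as F

def contour_loss(t_contour: torch.Tensor, s_contour: torch.Tensor) -> torch.Tensor:
    """First loss: characterizes the difference between the contour features."""
    return F.mse_loss(s_contour, t_contour)

def enhance_loss(t_enhanced: torch.Tensor, s_enhanced: torch.Tensor) -> torch.Tensor:
    """Second loss: characterizes the difference between the enhanced features."""
    return F.mse_loss(s_enhanced, t_enhanced)
```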
  • the semantic segmentation device may perform training on the semantic segmentation model to be trained according to at least one of the first loss and the second loss, and determine the semantic segmentation model.
  • the semantic segmentation device may perform training on the semantic segmentation model to be trained according to the first loss to determine the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the first loss is less than the first loss threshold.
  • the semantic segmentation device may perform training on the semantic segmentation model to be trained according to the second loss to determine the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the second loss is less than the second loss threshold.
  • the semantic segmentation device may perform training on the semantic segmentation model to be trained according to the first loss and the second loss, and determine the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the first loss is less than the first loss threshold and the second loss is less than the second loss threshold, to obtain the semantic segmentation model; alternatively, the first loss and the second loss may be weighted and summed to obtain a first semantic loss, and when the first semantic loss is smaller than the first semantic loss threshold, the semantic segmentation model is determined.
  • Referring to Figure 6, a shows two images to be processed; b shows the feature maps obtained when a semantic segmentation model that has not learned texture knowledge extracts features from the two images to be processed; and c shows the feature maps obtained when a semantic segmentation model that has learned texture knowledge extracts features from the two images to be processed. It can be seen from Figure 6 that after the semantic segmentation model learns texture knowledge, the feature maps contain rich texture information and the contours are clearer.
  • In some embodiments, training the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature to determine the semantic segmentation model may also include: S601-S603.
  • S601. Perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature.
  • The reference semantic model and the semantic segmentation model to be trained include a pooling layer, which follows the last convolutional layer; after obtaining the first enhanced feature, the reference semantic model can perform semantic segmentation prediction on it through the pooling layer to obtain the first semantic segmentation feature; after obtaining the second enhanced feature, the semantic segmentation model to be trained can perform semantic segmentation prediction on it to obtain the second semantic segmentation feature.
  • S602. Perform loss calculation based on the preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature, and determine the third loss;
  • The preset third loss function may be a mean squared error function; the semantic segmentation device may calculate the third variance mean over the first semantic segmentation feature and the second semantic segmentation feature and use it as the third loss; the third loss characterizes the difference between the first semantic segmentation feature and the second semantic segmentation feature. For an example, refer to formula (4), which follows the same mean squared error form as formula (2).
  • The semantic segmentation device may calculate the variance between the semantic segmentation feature corresponding to the i-th pixel in the first semantic segmentation feature and that in the second semantic segmentation feature to obtain R third variances; after summing the R third variances to obtain the third variance sum, it divides the third variance sum by the total number of pixels to obtain the third variance mean.
  • S603. Train the semantic segmentation model to be trained based on the third loss and determine the semantic segmentation model; or train it based on at least one of the first loss and the second loss, together with the third loss, and determine the semantic segmentation model.
  • the semantic segmentation device may perform training on the semantic segmentation model to be trained according to the third loss to determine the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the third loss is less than the third loss threshold.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the first loss is less than the first loss threshold and the third loss is less than the third loss threshold, to obtain the semantic segmentation model; alternatively, the first loss and the third loss may be weighted and summed to obtain a second semantic loss, and when the second semantic loss is smaller than the second semantic loss threshold, the semantic segmentation model is determined.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the second loss is less than the second loss threshold and the third loss is less than the third loss threshold, to obtain the semantic segmentation model; alternatively, the second loss and the third loss may be weighted and summed to obtain a third semantic loss, and when the third semantic loss is less than the third semantic loss threshold, the semantic segmentation model is determined.
  • In some embodiments, the first texture feature includes at least one first sub-texture feature, and the second texture feature includes at least one second sub-texture feature; after semantic segmentation prediction is performed on the first enhanced feature and the second enhanced feature in S601 to obtain the first semantic segmentation feature and the second semantic segmentation feature, the implementation may further include: S701-S704.
  • S701. Determine a first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature.
  • After obtaining the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature, the semantic segmentation device can perform first graph inference on them to obtain the first graph reasoning relationship.
  • Determining the first graph reasoning relationship includes: based on the output order, determining at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; performing correlation processing on the at least two difference features to obtain the correlation between the difference features; and forming the first graph reasoning relationship based on the at least two difference features and the correlation between the difference features.
  • The at least one first sub-texture feature corresponds to at least one intermediate feature obtained by at least one low-level convolutional layer in the reference semantic model; the semantic segmentation device can take the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature in back-to-front order of the convolutional and pooling layers (that is, the output order of the features, from back to front), determine the feature change between each two adjacent layers, and obtain multiple first relation features (i.e., the at least two difference features).
  • The semantic segmentation device may use the multiple first relation features as multiple first nodes and, according to the correlation between the multiple first relation features (that is, the correlation degrees), connect the multiple first nodes by edges to construct a first relation graph $G^T$, as in formula (5); the first graph reasoning relationship is represented by the first relation graph:

$$G^T=(\nu^T,\varepsilon^T)=\left(F_i^{va,T},\,A_{ij}^T\right)\qquad\text{(5)}$$

where $G^T$ denotes the first relation graph; $\nu^T$ the nodes and $\varepsilon^T$ the edges of the first relation graph; $F_i^{va,T}$ the $i$-th of the $N$ first relation features; $A_{ij}^T$ the edge between $F_i^{va,T}$ and $F_j^{va,T}$; and $N$ the number of first relation features, with $i,j\in[1,N-1]$ and $i\neq j$.
  • $F_i^{va,T}$ can be characterized by the similarity between the $(i{+}1)$-th layer feature $F_{i+1}^T$ and the $i$-th layer feature $F_i^T$ of the reference semantic model, as in formula (6), e.g. $F_i^{va,T}=f_{si}\!\left(F_i^T,F_{i+1}^T\right)$.
  • Based on the at least two difference features and the correlation between the difference features, the first graph reasoning relationship is formed.
  • In the case where every pair of nodes can be connected by an edge, $A_{ij}$ can be obtained by formula (7-1), for example as

$$A_{ij}=f_{si}\!\left(F_i^{va},F_j^{va}\right)\qquad\text{(7-1)}$$

where $f_{si}$ denotes the similarity between vectors.
  • In the case where only node pairs satisfying a condition are connected, $A_{ij}$ can be obtained by formula (7-2), for example as

$$A_{ij}=\mathbb{1}\!\left[f_{si}(F_i^{va},F_j^{va})\geq\mu\right]\cdot f_{si}\!\left(F_i^{va},F_j^{va}\right)\qquad\text{(7-2)}$$

where $f_{si}$ denotes the similarity between vectors, $\mathbb{1}[\cdot]$ is the indicator function, and $\mu$ is the similarity threshold. Edges are thus connected between nodes with high similarity; when $\mu=0$, any two nodes can be connected.
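A minimal sketch of this graph construction, assuming cosine similarity for $f_{si}$ and equally-shaped adjacent layer features; the per-pixel similarity map used as the relation feature is one plausible reading of formula (6), and `build_relation_graph`, `relation_feature`, and `cos_sim` are illustrative names.

```python
import numpy as np

def cos_sim(u, v):
    """f_si: cosine similarity between two flattened feature vectors."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def relation_feature(f_a, f_b):
    """Relation feature F^va between two adjacent layer outputs: here a
    per-pixel cosine-similarity map (assumes equal H x W x C shapes)."""
    num = np.sum(f_a * f_b, axis=-1)
    den = np.linalg.norm(f_a, axis=-1) * np.linalg.norm(f_b, axis=-1) + 1e-12
    return num / den

def build_relation_graph(layer_feats, mu=0.0):
    """layer_feats: features ordered back to front (segmentation feature,
    semantic feature, then sub-texture features). Nodes are the relation
    features between adjacent layers; A follows formula (7-2)."""
    nodes = [relation_feature(layer_feats[k], layer_feats[k + 1])
             for k in range(len(layer_feats) - 1)]
    n = len(nodes)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cos_sim(nodes[i], nodes[j])
                A[i, j] = s if s >= mu else 0.0   # mu = 0 connects every pair
    return nodes, A
```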
  • S702. Determine a second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and at least one second sub-texture feature.
  • After obtaining the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, the semantic segmentation device can perform second graph inference on them to obtain the second graph reasoning relationship.
  • The at least one second sub-texture feature corresponds to at least one intermediate feature obtained by at least one low-level convolutional layer in the semantic segmentation model to be trained; the semantic segmentation device can take the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature in back-to-front order of the convolutional and pooling layers, determine the feature change between each two adjacent layers, and obtain multiple second relation features.
  • After determining the multiple second relation features, the semantic segmentation device can use them as multiple second nodes, connect edges in the same way as in the first relation graph, and construct a second relation graph $G^S$, as in formula (5); the second graph reasoning relationship is represented by the second relation graph.
  • S703. Perform loss calculation based on the preset fourth loss function, the first graph reasoning relationship, and the second graph reasoning relationship to determine the fourth loss.
  • The first graph reasoning relationship includes nodes and edges, and so does the second graph reasoning relationship; the preset fourth loss function characterizes the vector distance between the first relation graph and the second relation graph as the fourth loss, which characterizes the difference between the two graphs, see formula (8):

$$L_{va}(S)=\mathrm{Dist}\!\left(G^T,G^S\right)\qquad\text{(8)}$$

where $L_{va}(S)$ denotes the fourth loss, $G^T$ the first relation graph, $G^S$ the second relation graph, and $\mathrm{Dist}$ the vector distance.
  • The first relation graph $G^T$ includes the first nodes $\nu^T$ and the first edges $\varepsilon^T$, and the second relation graph $G^S$ includes the second nodes $\nu^S$ and the second edges $\varepsilon^S$; the semantic segmentation device can first determine the node vector distance between $\nu^T$ and $\nu^S$ and the edge vector distance between $\varepsilon^T$ and $\varepsilon^S$, and then take a weighted sum of the node vector distance and the edge vector distance to obtain the fourth loss, see formula (9):

$$L_{va}(S)=\mathrm{Dist}\!\left(\nu^T,\nu^S\right)+\lambda\,\mathrm{Dist}\!\left(\varepsilon^T,\varepsilon^S\right)\qquad\text{(9)}$$
  • $\lambda$ is a weighting coefficient, which can be set as required and is not limited in the embodiments of the present disclosure.
  • Formula (9) can also be expressed in expanded form as formula (10), with the node distance taken over the relation features $F_i^{va}$ and the edge distance over the adjacency entries $A_{ij}$; the expanded form follows by substituting formula (5) into formula (9).
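The fourth loss of formula (9) can then be sketched as follows, assuming mean squared distance for Dist (the description leaves the distance open) and arrays of stacked node features and adjacency matrices:

```python
import numpy as np

def graph_distance_loss(nodes_t, edges_t, nodes_s, edges_s, lam=1.0):
    """Fourth loss of formula (9): Dist(nu_T, nu_S) + lam * Dist(eps_T, eps_S).

    nodes_*: stacked node (relation) features of the teacher/student graphs;
    edges_*: the adjacency matrices A^T and A^S.
    """
    node_dist = np.mean((np.asarray(nodes_t) - np.asarray(nodes_s)) ** 2)
    edge_dist = np.mean((edges_t - edges_s) ** 2)
    return node_dist + lam * edge_dist
```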
  • S704. Train the semantic segmentation model to be trained based on the fourth loss and determine the semantic segmentation model; or train it based on at least one of the first loss, the second loss, and the third loss, together with the fourth loss, and determine the semantic segmentation model.
  • After determining the fourth loss, the semantic segmentation device may train the semantic segmentation model to be trained according to the fourth loss and determine the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained to obtain the semantic segmentation model when the fourth loss is less than the fourth loss threshold.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the fourth loss is less than the fourth loss threshold and the first loss is less than the first loss threshold, to obtain the semantic segmentation model; alternatively, the fourth loss and the first loss may be weighted and summed to obtain a fourth semantic loss, and when the fourth semantic loss is less than the fourth semantic loss threshold, training of the semantic segmentation model to be trained is stopped to obtain the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the fourth loss is less than the fourth loss threshold and the second loss is less than the second loss threshold, to obtain the semantic segmentation model; alternatively, the fourth loss and the second loss may be weighted and summed to obtain a fifth semantic loss, and when the fifth semantic loss is less than the fifth semantic loss threshold, training of the semantic segmentation model to be trained is stopped to obtain the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the fourth loss is less than the fourth loss threshold and the third loss is less than the third loss threshold, to obtain the semantic segmentation model; alternatively, the fourth loss and the third loss may be weighted and summed to obtain a sixth semantic loss, and when the sixth semantic loss is less than the sixth semantic loss threshold, training of the semantic segmentation model to be trained is stopped to obtain the semantic segmentation model.
  • the semantic segmentation device may stop training the semantic segmentation model to be trained when the fourth loss is less than the fourth loss threshold, the third loss is less than the third loss threshold, and the second loss is less than the second loss threshold, to obtain the semantic segmentation model; alternatively, the fourth loss, the third loss, the second loss, and the first loss may be weighted and summed to obtain a seventh semantic loss, and when the seventh semantic loss is less than the seventh semantic loss threshold, training of the semantic segmentation model to be trained is stopped to obtain the semantic segmentation model.
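A compact sketch of this combined stopping criterion follows; the weights and the threshold are assumptions, since the embodiments only state that the losses are weighted, summed, and compared with a threshold:

```python
def seventh_semantic_loss(l1, l2, l3, l4, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the first through fourth losses."""
    return sum(w * l for w, l in zip(weights, (l1, l2, l3, l4)))

def should_stop(l1, l2, l3, l4, threshold=0.05):
    """Stop training once the combined loss falls below the threshold."""
    return seventh_semantic_loss(l1, l2, l3, l4) < threshold
```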
  • the semantic segmentation device can determine a response loss based on the first semantic segmentation result and the second semantic segmentation result, and train the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss, and the response loss to obtain the semantic segmentation model.
  • the response loss $L_r(S)$ can be obtained according to formula (11), where $F_i^{r;T}$ and $F_i^{r;S}$ are the features corresponding to the $i$-th pixel in the first and second semantic segmentation results, respectively.
  • the semantic segmentation device can determine a training loss based on the second semantic segmentation result and the image sample, and train the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss, the response loss, and the training loss to obtain the semantic segmentation model.
  • the training loss $L_{sa}(S)$ can be obtained according to formula (12), where $F_i^{sa}$ is the feature corresponding to the $i$-th pixel in the image sample.
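Formulas (11) and (12) are likewise not reproduced legibly here; following the same mean-squared-error pattern as formulas (2)-(4), plausible reconstructions are:

$$L_{r}(S)=\frac{1}{R}\sum_{i\in R}\left(F_i^{r;T}-F_i^{r;S}\right)^2,\qquad L_{sa}(S)=\frac{1}{R}\sum_{i\in R}\left(F_i^{r;S}-F_i^{sa}\right)^2$$

where $F_i^{r;T}$ and $F_i^{r;S}$ are the features of the $i$-th pixel in the first and second semantic segmentation results and $F_i^{sa}$ is the feature of the $i$-th pixel in the image sample.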
  • Exemplarily, referring to Figure 7, an embodiment of the present disclosure provides a schematic diagram of knowledge distillation. As shown in Figure 7, both the teacher network and the student network include four convolutional layers and one pooling layer, the pooling layer being implemented by a Pyramid Pooling Module (PPM). The first three convolutional layers are low-level convolutional layers, the fourth convolutional layer is a high-level convolutional layer, and the high-level convolutional layer is connected to a SAM. Through its four convolutional layers and one pooling layer, the teacher network sequentially extracts three first sub-texture features, one first semantic feature, and one first semantic segmentation feature; three CDMs perform contour decomposition on the three texture features to obtain three first sub-contour features; based on the attention mechanism, the SAM enhances the first semantic feature to obtain one first enhanced feature; and the first semantic segmentation result is obtained from the first semantic segmentation feature. Likewise, through four convolutional layers, three CDMs, and one SAM, the student network obtains three second sub-texture features, three second sub-contour features, one second semantic feature, one second enhanced feature, one second semantic segmentation feature, and the second semantic segmentation result. In this way, the student network can learn the texture knowledge of the teacher network from the three first and three second sub-contour features; learn the semantic knowledge of the teacher network from the first and second enhanced features and the first and second semantic segmentation features, texture knowledge and semantic knowledge serving as feature knowledge; learn the relational knowledge of the teacher network from the first relation features among the three first sub-texture features, the first semantic feature, and the first semantic segmentation feature and the second relation features among their student-network counterparts; and learn the response knowledge of the teacher network from the first and second semantic segmentation results. The student network thus learns rich knowledge from the teacher network, which improves its semantic segmentation accuracy.
  • Figure 8 shows a schematic diagram of the semantic segmentation results of the student network. As shown in Figure 8, a is the original image in an urban scene, b is the semantic segmentation result of the student network in the related art, c is the semantic segmentation result of the student network of the present case, and d is the image sample for semantic segmentation of the original image in a; it can be seen that the semantic segmentation result of the student network of the present case contains richer information and is closer to the image sample.
  • In an example, the knowledge distillation method in Figure 7 is applied to an urban scene, and Table 1 shows a comparison of the mean intersection-over-union of the student network and the teacher network in the urban scene. As shown in Table 1, the mean intersection-over-union of the student network by itself is the lowest; after using Structure Knowledge Distillation (SKD), the mean intersection-over-union improves; using Intra-class Feature Variation Distillation (IFKD), it improves further; and the mean intersection-over-union achieved with the method of the present case is the highest.
  • As shown in Table 1, ResNet18 by itself achieves a mean intersection-over-union of 69.1 on the val set, 9.46% below the teacher network, and 67.6 on the test set, 9.18% below the teacher network. With the method of the present case, the mean intersection-over-union on the val set is 75.82, 6.72% higher than ResNet18 alone, and on the test set it is 73.78, 6.18% higher; the method of the present case is closest to the teacher network in mean intersection-over-union. Here, val is the validation set used during training to judge the learning state in time from the training results, and test is the test set used to evaluate the model after training is finished. It can be seen from Table 1 that the accuracy of the student network trained by the method of the present case is significantly improved and is closest to that of the teacher network.
  • FIG. 9 is a schematic diagram of an optional composition structure of the semantic segmentation device provided by the embodiments of the present disclosure. As shown in Figure 9, the semantic segmentation device 20 includes:
  • the feature acquisition part 2000 is configured to acquire the image to be processed
  • the semantic segmentation part 2004 is configured to use the semantic segmentation model to perform semantic segmentation processing on the image to be processed to obtain the semantic segmentation result of the image to be processed;
  • the semantic segmentation model is obtained by training with, as reference, the first transformation feature obtained by contour decomposition or enhancement processing of the first intermediate feature output by the reference semantic model, in combination with the second transformation feature obtained by contour decomposition or enhancement processing of the second intermediate feature output by the semantic segmentation model to be trained;
  • the first intermediate feature and the second intermediate feature comprise at least one of the following groups: the first texture feature and the second texture feature; the first semantic feature and the second semantic feature;
  • the first transformation feature and the second transformation feature comprise at least one of the following groups: the first contour feature and the second contour feature; the first enhancement feature and the second enhancement feature.
  • the semantic segmentation device 20 also includes:
  • the feature extraction part 2001 is configured to use the reference semantic model and the semantic segmentation model to be trained to perform feature extraction on image samples, respectively, to obtain the first intermediate feature and the second intermediate feature; the reference semantic model is a pre-trained semantic segmentation network, and the semantic segmentation model to be trained is a network with the same function as the reference semantic model;
  • the feature processing part 2002 is configured to perform contour decomposition or enhancement processing on the first intermediate feature and the second intermediate feature to obtain the first transformed feature and the second transformed feature;
  • the training part 2003 is configured to train the semantic segmentation model to be trained based on at least the first transformation feature and the second transformation feature, and determine the semantic segmentation model.
  • the feature extraction part 2001 is further configured to use the reference semantic model and the semantic segmentation model to be trained to perform feature extraction on image samples, respectively, to obtain the first texture feature and the second texture feature; and to perform feature extraction on the first texture feature and the second texture feature to obtain the first semantic feature and the second semantic feature.
  • the feature processing part 2002 is further configured to perform contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature; or to perform enhancement processing on the first semantic feature and the second semantic feature to obtain a first enhanced feature and a second enhanced feature.
  • the feature processing part 2002 is further configured to perform contour decomposition processing on the first texture feature and the second texture feature respectively to obtain the first contour feature and the second contour feature, and to perform enhancement processing on the first semantic feature and the second semantic feature to obtain a first enhanced feature and a second enhanced feature.
  • the training part 2003 is further configured to perform loss calculation based on a preset first loss function, the first contour feature, and the second contour feature to determine a first loss; to determine a second loss based on a preset second loss function, the first enhanced feature, and the second enhanced feature; and to train the semantic segmentation model to be trained based on at least one of the first loss and the second loss to determine the semantic segmentation model.
  • the training part 2003 is further configured to perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature to obtain the first semantic segmentation feature and the second semantic segmentation feature; to perform loss calculation based on a preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature to determine a third loss; and to train the semantic segmentation model to be trained based on the third loss to determine the semantic segmentation model, or to train it based on at least one of the first loss and the second loss, together with the third loss, to determine the semantic segmentation model.
  • the first texture feature includes: at least one first sub-texture feature; the second texture feature includes: at least one second sub-texture feature; the training part 2003 is further configured to, after performing semantic segmentation prediction on the first enhanced feature and the second enhanced feature to obtain the first semantic segmentation feature and the second semantic segmentation feature, determine the first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; determine the second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; perform loss calculation based on a preset fourth loss function, the first graph reasoning relationship, and the second graph reasoning relationship to determine a fourth loss; and train the semantic segmentation model to be trained based on the fourth loss to determine the semantic segmentation model, or train it based on at least one of the first loss, the second loss, and the third loss, together with the fourth loss, to determine the semantic segmentation model.
  • the training part 2003 is further configured to determine, based on the output order, at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; perform correlation processing on the at least two difference features to obtain the correlation between the difference features; and form the first graph reasoning relationship based on the at least two difference features and the correlation between them.
  • the training part 2003 is further configured to, if the correlations between the difference features include association features between target differences that are less than or equal to a preset association threshold, form the first graph reasoning relationship based on the association features between the target differences and the at least two difference features.
  • the feature processing part 2002 is further configured to filter both the first texture feature and the second texture feature based on an interlacing factor to obtain respective high-pass sub-bands and low-pass sub-bands; to perform directional filtering on the high-pass sub-bands to obtain directional sub-bands; and to fuse the low-pass sub-bands and the directional sub-bands corresponding to the first texture feature and the second texture feature, respectively, to obtain the first contour feature and the second contour feature, completing the contour decomposition processing (see the sketch at the end of this section).
  • the feature processing part 2002 is further configured to perform at least two kinds of transformation on each of the first semantic feature and the second semantic feature to obtain at least two corresponding semantic transformation features; to perform self-enhancement processing on different semantic transformation features among the at least two semantic transformation features to obtain a correlation matrix; to perform enhancement processing on the correlation matrix and one of the at least two semantic transformation features to obtain a self-enhancement feature; and, based on the self-enhancement matrices corresponding to the first semantic feature and the second semantic feature, to determine the respective self-enhancement features as the first enhanced feature and the second enhanced feature, or to fuse the first semantic feature and the second semantic feature with their respective self-enhancement features to obtain the first enhanced feature and the second enhanced feature.
  • In embodiments of the present disclosure, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.
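As a rough sketch of the contour decomposition performed by the feature processing part 2002, the following assumes a Gaussian low-pass filter, a two-direction toy filter bank, an interlacing factor p=2, element-wise summation for the fusion, and spatial dimensions divisible by p; none of these specifics are fixed by the description above.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter, zoom

def contour_decompose(feature, p=2):
    """One-level contour decomposition of a 2-D feature map.

    Low-pass filter the input, downsample by the interlacing factor p,
    take the high-pass sub-band as input minus the upsampled low-pass
    band, then apply a directional filter bank to the high-pass band
    and fuse the sub-bands.
    """
    low = gaussian_filter(feature, sigma=1.0)     # low-pass analysis filter
    low_ds = low[::p, ::p]                        # low-pass sub-band
    low_up = zoom(low_ds, p, order=1)             # upsample back to input size
    high = feature - low_up                       # high-pass sub-band

    # toy two-direction filter bank (vertical / horizontal edges)
    k = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
    directional = [convolve(high, k), convolve(high, k.T)]

    # fuse the low-pass sub-band with the directional sub-bands
    contour = low_up + sum(directional)
    return contour, low_ds, directional
```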
  • FIG. 10 is a schematic diagram of an optional composition structure of the electronic device provided by the embodiment of the present disclosure.
  • the electronic device 21 includes: a processor 2101 and a memory 2102,
  • the memory 2102 stores a computer program that can run on the processor 2101, and the processor 2101 implements the steps of any semantic segmentation method in the embodiments of the present disclosure when executing the computer program; the processor 2101 and the memory 2102 are connected and communicate through a bus 2103.
  • the memory 2102 is configured to store computer programs and applications to be executed by the processor 2101, and can also cache data to be processed or already processed by the processor 2101 and by various parts of the electronic device (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM).
  • When the processor 2101 executes the program, the steps of any semantic segmentation method described above are realized.
  • the processor 2101 generally controls the overall operation of the electronic device 21 .
  • the above-mentioned processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor. Understandably, the electronic device implementing the above processor function may also be otherwise, which is not limited in the embodiments of the present disclosure.
  • An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program configured to implement the above semantic segmentation method when executed by a processor.
  • a computer readable storage medium may be a tangible device that holds and stores instructions for use by an instruction execution device, and may be a volatile storage medium or a nonvolatile storage medium.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Examples include USB flash drives, magnetic disks, optical disks, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or Flash), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards with instructions stored thereon or raised structures in grooves, and any suitable combination of the foregoing.
  • a computer-readable storage medium is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission media (e.g., light pulses through a fiber-optic cable), or electrical signals transmitted through wires.
  • the above-mentioned memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any of various terminals including one or any combination of the above-mentioned memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
  • An embodiment of the present disclosure provides a computer program product, where the computer program product includes a computer program or instructions; when the computer program or instructions are run on a computer, the computer executes the above semantic segmentation method.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK), and the like.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present disclosure.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may serve as a single unit, or two or more units may be integrated into one unit; the above integrated unit may be realized in the form of hardware or in the form of hardware plus a software functional unit.
  • If the above integrated units of the present disclosure are implemented in the form of software functional parts and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the related art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a device to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.
  • Embodiments of the present disclosure provide a semantic segmentation method, a device, an electronic device, and a computer-readable storage medium. The method includes: acquiring an image to be processed; and using a semantic segmentation model to perform semantic segmentation processing on the image to be processed to obtain a semantic segmentation result of the image to be processed. The semantic segmentation model is obtained by training with, as reference, the first transformation feature obtained by contour decomposition or enhancement processing of the first intermediate feature output by the reference semantic model, in combination with the second transformation feature obtained by contour decomposition or enhancement processing of the second intermediate feature output by the semantic segmentation model to be trained. The first intermediate feature and the second intermediate feature include at least one of the following groups: the first texture feature and the second texture feature; the first semantic feature and the second semantic feature. The first transformation feature and the second transformation feature include at least one of the following groups: the first contour feature and the second contour feature; the first enhancement feature and the second enhancement feature.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

本公开实施例提供了一种语义分割方法、装置、电子设备和计算机可读存储介质,所述方法包括:获取待处理图像;采用语义分割模型,对待处理图像进行语义分割处理,得到待处理图像的语义分割结果;其中,语义分割模型是以参考语义模型输出的第一中间特征进行轮廓分解或增强处理的第一变换特征为参考,结合待训练的语义分割模型输出的第二中间特征进行轮廓分解或增强处理的第二变换特征训练得到的;第一中间特征和第二中间特征包括以下至少一组:第一纹理特征和第二纹理特征;第一语义特征和第二语义特征;第一变换特征和第二变换特征包括以下至少一组:第一轮廓特征和第二轮廓特征;第一增强特征和第二增强特征。

Description

语义分割方法、装置、电子设备和计算机可读存储介质
相关申请的交叉引用
本公开基于申请号为202110725811.8、申请日为2021年06月29日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及图像处理技术,尤其涉及一种语义分割方法、装置、电子设备和计算机可读存储介质。
背景技术
随着语义分割技术的发展,知识蒸馏被引入语义分割技术;知识蒸馏可以将复杂模型学到的知识转移到简单模型中,从而在实际应用中,可以方便的采用简单模型进行语义分割;然而,在进行知识转移的过程中,通常采用复杂模型的语义分割的结果作为基于响应的知识来指导简单模型进行学习,如此,转移到简单模型的知识不够丰富,导致学习完成的简单模型的语义分割的精度低。
发明内容
本公开实施例提供一种语义分割方法、装置、电子设备和计算机可读存储介质,提高了语义分割的精度。
本公开的技术方案是这样实现的:
本公开实施例提供一种语义分割方法,包括:
获取待处理图像;
采用语义分割模型,对所述待处理图像进行语义分割处理,得到所述待处理图像的语义分割结果;其中,
所述语义分割模型是以参考语义模型输出的第一中间特征进行轮廓分解或增强处理的第一变换特征为参考,结合待训练的语义分割模型输出的第二中间特征进行轮廓分解或增强处理的第二变换特征训练得到的;
所述第一中间特征和所述第二中间特征包括以下至少一组:
第一纹理特征和第二纹理特征;
第一语义特征和第二语义特征;
所述第一变换特征和所述第二变换特征包括以下至少一组:
第一轮廓特征和第二轮廓特征;
第一增强特征和第二增强特征。
上述方法中,所述参考语义模型为预先训练的语义分割网络;所述待训练的语义分割模型为与所述参考语义模型功能一致的网络;所述方法还包括:
采用所述参考语义模型和所述待训练的语义分割模型分别对图像样本进行特征提取,得到所述第一中间特征和所述第二中间特征;
对所述第一中间特征和所述第二中间特征分别进行轮廓分解或增强处理,得到所述第一变换特征和所述第二变换特征;
至少基于所述第一变换特征和所述第二变换特征,对所述待训练的语义分割模型进行训练,确定的出所述语义分割模型。
上述方法中,所述采用所述参考语义模型和所述待训练的语义分割模型分别对图像样本进行特征提取,得到所述第一中间特征和所述第二中间特征,包括:采用所述参考语义模型和所述待训练的语义分割模型分别对图像样本进行特征提取,得到第一纹理特征和第二纹理特征;对所述第一纹理特征和第二纹理特征进行特征提取,得到第一语义特征和第二语义特征。
上述方法中,所述对所述第一中间特征和所述第二中间特征分别进行轮廓分解或增强处理,得到所述第一变换特征和所述第二变换特征,包括以下至少一个:
对所述第一纹理特征和所述第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征;
对所述第一语义特征和所述第二语义特征进行增强处理,得到第一增强特征和第二增强特征。
上述方法中,所述至少基于所述第一变换特征和所述第二变换特征,对所述待训练的语义分割模型进行训练,确定出所述语义分割模型,包括:基于预设第一损失函数、所述第一轮廓特征和所述第二轮廓特征进行损失计算,确定第一损失;基于预设第二损失函数、所述第一增强特征和所述第二增强特征,确定第二损失;基于所述第一损失和所述第二损失中的至少一个对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
上述方法中,所述至少基于所述第一变换特征和所述第二变换特征,对所述待训练的语义分割模型进行训练,确定出所述语义分割模型,包括:对所述第一增强特征和所述第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征;基于预设第三损失函数、所述第一语义分割特征和所述第二语义分割特征进行损失计算,确定第三损失;基于所述第三损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型;或者,基于第一损失、第二损失中的至少一个,以及所述第三损失,对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
上述方法中,所述第一纹理特征包括:至少一个第一子纹理特征;所述第二纹理特征包括:至少一个第二子纹理特征;所述对所述第一增强特征和所述第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征之后,所述方法还包括:基于所述第一语义分割特征、第一语义特征、所述至少一个第一子纹理特征,确定第一图推理关系;基于所述第二语义分割特征、第二语义特征、所述至少一个第二子纹理特征,确定第二图推理关系;基于预设第四损失函数、所述第一图推理关系和所述第二图推理关系进行损失计算,确定第四损失;基于所述第四损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型;或者,基于第一损失、第二损失、第三损失中的至少一个,以及所述第四损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
上述方法中,所述基于所述第一语义分割特征、第一语义特征、所述至少一个第一子纹理特征,确定第一图推理关系,包括:
基于输出顺序,确定所述第一语义分割特征、所述第一语义特征、所述至少一个第一子纹理特征之间的至少两个差异特征;
对所述至少两个差异特征,进行相关处理,得到差异特征之间的相关度;
基于所述至少两个差异特征和所述差异特征之间的相关度,构成所述第一图推理关系。
上述方法中,所述基于所述至少两个差异特征和所述差异特征之间的相关度,构成所述第一图推理关系,包括:
在所述差异特征之间的相关度中存在小于等于预设关联阈值的目标差异之间的关联特征情况下,基于所述目标差异之间的关联特征和所述至少两个差异特征,构成所述第一图推理关系。
上述方法中,所述对所述第一纹理特征和所述第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征,包括:
基于隔行扫描因子,对所述第一纹理特征和所述第二纹理特征均进行滤波处理后,得到各自的高通子带和低通子带;
对所述高通子带进行方向滤波,得到方向子带;
分别对所述第一纹理特征和所述第二纹理特征对应的所述低通子带和所述方向子带进行融合,得到所述第一轮廓特征和所述第二轮廓特征,完成轮廓分解处理。
上述方法中,所述对所述第一语义特征和所述第二语义特征进行增强处理,得到第一增强特征和第二增强特征,包括:
对所述第一语义特征和所述第二语义特征均进行至少两种转换,得到各自对应的至少两种语义变换特征;
对所述至少两种语义变换特征中不同的语义变换特征进行自增强处理,得到相关矩阵;
将所述相关矩阵与所述至少两种语义变换特征中的一个语义变换特征进行增强处理,得到自增强特征;
基于所述第一语义特征和所述第二语义特征各自对应的自增强矩阵,将各自的自增强特征确定为所述第一增强特征和所述第二增强特征;或者,将所述第一语义特征和所述第二语义特征,分别与各自的自增强特征进行融合,得到所述第一增强特征和所述第二增强特征。
本公开实施例提供一种语义分割装置,包括:
特征获取部分,被被配置为获取待处理图像;
语义分割部分,被配置为采用所述语义分割模型,对所述待处理图像进行语义分割处理,得到所述待处理图像的语义分割结果;所述语义分割模型是以参考语义模型输出的第一中间特征进行轮廓分解或增强处理的第一变换特征为参考,结合待训练的语义分割模型输出的第二中间特征进行轮廓分解或增强处理的第二变换特征训练得到的;
所述第一中间特征和所述第二中间特征包括以下至少一组:
第一纹理特征和第二纹理特征;
第一语义特征和第二语义特征;
所述第一变换特征和所述第二变换特征包括以下至少一组:
第一轮廓特征和第二轮廓特征;
第一增强特征和第二增强特征。
在一些实施例中,语义分割装置还包括:
特征提取部分,被配置为采用所述参考语义模型和所述待训练的语义分割模型分别对图像样本进行特征提取,得到所述第一中间特征和所述第二中间特征;所述参考语义模型为预先训练的语义分割网络;所述待训练的语义分割模型为与所述参考语义模型功能一致的网络;
特征处理部分,被配置为对所述第一中间特征和所述第二中间特征分别进行轮廓分解或增强处理,得到所述第一变换特征和所述第二变换特征;
训练部分,被配置为至少基于所述第一变换特征和所述第二变换特征,对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
在一些实施例中,所述特征提取部分,还被配置为采用所述参考语义模型和所述待训练的语义分割模型分别对图像样本进行特征提取,得到第一纹理特征和第二纹理特征;对所述第一纹理特征和第二纹理特征进行特征提取,得到第一语义特征和第二语义特征。
在一些实施例中,所述特征处理部分,还被配置为对所述第一纹理特征和所述第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征;或者,对所述第一语义特征和所述第二语义特征进行增强处理,得到第一增强特征和第二增强特征。
在一些实施例中,所述特征处理部分,还被配置为对所述第一纹理特征和所述第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征;以及对所述第一语义特征和所述第二语义特征进行增强处理,得到第一增强特征和第二增强特征。
在一些实施例中,所述训练部分,还被配置为基于预设第一损失函数、所述第一轮廓特征和所述第二轮廓特征进行损失计算,确定第一损失;基于预设第二损失函数、所述第一增强特征和所述第二增强特征,确定第二损失;基于所述第一损失和所述第二损失中的至少一个对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
在一些实施例中,所述训练部分,还被配置为对所述第一增强特征和所述第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征;基于预设第三损失函数、所述第一语义分割特征和所述第二语义分割特征进行损失计算,确定第三损失;基于所述第三损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型;或者,基于第一损失、第二损失中的至少一个,以及所述第三损失,对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
在一些实施例中,所述第一纹理特征包括:至少一个第一子纹理特征;所述第二纹理特征包括:至少一个第二子纹理特征;所述训练部分,还被配置为对所述第一增强特征和所述第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征之后,基于所述第一语义分割特征、第一语义特征、所述至少一个第一子纹理特征,确定第一图推理关系;基于所述第二语义分割特征、第二语义特征、所述至少一个第二子纹理特征,确定第二图推理关系;基于预设第四损失函数、所述第一图推理关系和所述第二图推理关系进行损失计算,确定第四损失;基于所述第四损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型;或者,基于第一损失、第二损失、第三损失中的至少一个,以及所述第四损失对所述待训练的语义分割模型进行训练,确定出所述语义分割模型。
在一些实施例中,所述训练部分,还被配置为基于输出顺序,确定所述第一语义分割特征、所述第一语义特征、所述至少一个第一子纹理特征之间的至少两个差异特征;对所述至少两个差异特征,进行相关处理,得到差异特征之间的相关度;基于所述至少两个差异特征和所述差异特征之间的相关度,构成所述第一图推理关系。
在一些实施例中,所述训练部分,还被配置为在所述差异特征之间的相关度中存在小于等于预设关联阈值的目标差异之间的关联特征情况下,基于所述目标差异之间的关联特征和所述至少两个差异特征,构成所述第一图推理关系。
在一些实施例中,所述特征处理部分,还被配置为基于隔行扫描因子,对所述第一纹理特征和所述第二纹理特征均进行滤波处理后,得到各自的高通子带和低通子带;对所述高通子带进行方向滤波,得到方向子带;分别对所述第一纹理特征和所述第二纹理特征对应的所述低通子带和所述方向子带进行融合,得到所述第一轮廓特征和所述第二轮廓特征,完成轮廓分解处理。
在一些实施例中,所述特征处理部分,还被配置为对所述第一语义特征和所述第二语义特征均进行 至少两种转换,得到各自对应的至少两种语义变换特征;对所述至少两种语义变换特征中不同的语义变换特征进行自增强处理,得到相关矩阵;将所述相关矩阵与所述至少两种语义变换特征中的一个语义变换特征进行增强处理,得到自增强特征;基于所述第一语义特征和所述第二语义特征各自对应的自增强矩阵,将各自的自增强特征确定为所述第一增强特征和所述第二增强特征;或者,将所述第一语义特征和所述第二语义特征,分别与各自的自增强特征进行融合,得到所述第一增强特征和所述第二增强特征。
本公开实施例提供一种电子设备,包括:
存储器,被配置为存储计算机程序;
处理器,被配置为执行所述存储器中存储的计算机程序时,实现上述语义分割方法。
本公开实施例提供一种计算机可读存储介质,存储有计算机程序,被配置为被处理器执行时,实现上述语义分割方法。
本公开实施例提供一种计算机程序,所述计算机程序产品包括计算机程序或指令,在所述计算机程序或指令在计算机上运行的情况下,所述计算机执行上述语义分割方法。
本公开实施例具有以下有益效果:
本公开实施例提供了一种语义分割方法、装置、电子设备和计算机可读存储介质;语义分割装置可以将参考语义模型在进行语义分割的过程中得到的基于多个特征的知识迁移到语义分割模型中;使语义分割模型学习到更加丰富的知识,从而提高了在使用语义分割模型对待处理图像进行语义分割时的语义分割精度。
附图说明
图1a是本公开实施例提供的一种可选的语义分割方法流程示意图;
图1b是本公开实施例提供的一种可选的语义分割模型的训练流程示意图;
图2为本公开实施例提供的一种可选的语义分割过程示意图;
图3为本公开实施例提供的一种可选的轮廓分解的方法示意图;
图4为本公开实施例提供的一种可选的低通滤波的方法示意图;
图5a为本公开实施例提供的一种可选的增强处理示意图;
图5b为本公开实施例提供的一种可选的增强处理示意图;
图6为本公开实施例提供的一种可选的纹理知识学习的效果示意图;
图7为本公开实施例提供的一种语义分割方法示意图;
图8为本公开实施例提供的一种可选的学生网络的语义分割结果示意图;
图9为本公开实施例提供的一种语义分割装置的组成结构示意图;
图10为本公开实施例提供的一种电子设备的组成结构示意图。
具体实施方式
为了使本公开的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本公开进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用于解释本公开,并不用于限定本公开。
以下结合附图及实施例,对本公开进行进一步详细说明。应当理解,此处所提供的实施例仅仅用以解释本公开,并不用于限定本公开。另外,以下所提供的实施例是用于实施本公开的部分实施例,而非提供实施本公开的全部实施例,在不冲突的情况下,本公开实施例记载的技术方案可以任意组合的方式实施。
需要说明的是,在本公开实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的方法或者装置不仅包括所明确记载的要素,而且还包括没有明确列出的其他要素,或者是还包括为实施方法或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括该要素的方法或者装置中还存在另外的相关要素(例如方法中的步骤或者装置中的单元,例如的单元可以是部分电路、部分处理器、部分程序或软件等等)。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,U和/或W,可以表示:单独存在U,同时存在U和W,单独存在W这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括U、W、V中的至少一种,可以表示包括从U、W和V构成的集合中选择的任意一个或多个元素。
例如,本公开实施例提供的展示方法包含了一系列的步骤,但是本公开实施例提供的展示方法不限于所记载的步骤,同样地,本公开实施例提供的展示装置包括了一系列部分,但是本公开实施例提供的展示装置不限于包括所明确记载的部分,还可以包括为获取相关信息、或基于信息进行处理时所需要设置的部分。
除非另有定义,本文所使用的所有的技术和科学术语与属于本公开的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本公开实施例的目的,不是旨在限制本公开。
对本公开实施例进行进一步详细说明之前,对本公开实施例中涉及的名词和术语进行说明,本公开实施例中涉及的名词和术语适用于如下的解释。
知识蒸馏:将复杂模型学习到的知识迁移到简单模型中去,从而使简单模型语义分割的精度趋近复杂模型;也就是说,将已训练好的复杂模型(例如参考语义模型)作为教师网络,简单模型(例如待训练的语义分割模型)作为学生网络,由教师网络指导学生网络学习知识,从而得到训练好的简单模型。其中,复杂模型结构庞大、精度高,而简单模型体量很小、精度与复杂模型存在差距。
需要说明的是,相关技术中的知识蒸馏,迁移到学生网络中的知识通常被视为基于响应的知识,通常被广泛的应用在目标检测、人体姿态估计等计算机视觉应用中。
本公开实施例提供一种语义分割方法、装置、电子设备和计算机可读存储介质,能够提高语义分割的精度。本公开实施例提供的语义分割方法应用于电子设备中,下面说明本公开实施例提供的电子设备的示例性应用。本公开实施例提供的电子设备可以实施为AR眼镜、笔记本电脑,平板电脑,台式计算机,机顶盒,移动设备(例如,移动电话,便携式音乐播放器,个人数字助理,专用消息设备,便携式游戏设备)等各种类型的用户终端,也可以实施为服务器,本公开实施例不作限制。
参见图1a,图1a是本公开实施例提供的语义分割方法的一个可选的流程示意图,将结合图1a示出的步骤进行说明。
S101、获取待处理图像;
S102、采用语义分割模型,对待处理图像进行语义分割处理,得到待处理图像的语义分割结果;其中,语义分割模型是以参考语义模型输出的第一中间特征进行轮廓分解或增强处理的第一变换特征为参考,结合待训练的语义分割模型输出的第二中间特征进行轮廓分解或增强处理的第二变换特征训练得到的。
第一中间特征和第二中间特征包括以下至少一组:第一纹理特征和第二纹理特征;第一语义特征和第二语义特征;第一变换特征和第二变换特征包括以下至少一组:第一轮廓特征和第二轮廓特征;第一增强特征和第二增强特征。
在本公开实施例中,语义分割装置对语义分割模型训练完成后,可以采用语义分割模型对获取的待处理图像进行语义分割,得到语义分割结果。其中,参考语义模型为预先训练的语义分割网络;待训练的语义分割模型为与参考语义模型功能一致的网络。
在本公开的一些实施例中,如图1b所示,在进行语义分割之前,需要先进行语义分割模型的确定。语义分割装置可以通过S01-S03实现训练过程,如下:
S01、采用参考语义模型和待训练的语义分割模型分别对图像样本进行特征提取,得到第一中间特征和第二中间特征;
S02、对第一中间特征和第二中间特征分别进行轮廓分解或增强处理,得到第一变换特征和第二变换特征;
S03、至少基于第一变换特征和第二变换特征,对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,参考语义模型和待训练的语义分割模型的功能一致,均用于语义分割;参考语义模型为训练成功的复杂模型,语义分割模型为简单模型,采用参考语义模型指导语义分割模型进行训练,将参考语义模型学习到的知识迁移到语义分割模型中。
在本公开实施例中,在对待训练的语义分割模型进行训练的过程中,语义分割装置采用参考语义模型和待训练的语义分割模型对图像样本分别进行特征提取,可以得到第一中间特征和第二中间特征。
需要说明的是,本公开实施例中涉及的特征可以通过特征图体现,其中,特征图可以用C×H×M的矩阵表示;H×M表示特征图的像素,C表示特征图的通道数,也就是说,特征图可以看作是C维的深层描述子。
在本公开实施例中,参考语义模型和待训练的语义分割模型均包括多个卷积层,其中,通过多个卷积层可以依次得到对应的多个中间特征;多个中间特征包括低层特征和高层特征;其中,低层特征包含纹理信息,可以作为纹理特征;高层特征包含语义信息,可以作为语义特征。
在本公开实施例中,第一中间特征可以包括:第一纹理特征和第一语义特征中的至少一种;第一纹理特征为参考语义模型提取出的低层特征,第一语义特征为参考语义模型提取出的高层特征。
在本公开实施例中,第二中间特征可以包括:第二纹理特征和第二语义特征中的至少一种;第二纹理特征为待训练的语义分割模型提取出的低层特征,第二语义特征为语义分割模型提取出的高层特征。
在本公开实施例中,在第一中间特征包括第一纹理特征的情况下,第二中间特征包括第二纹理特征; 在第一中间特征包括第一语义特征的情况下,第二中间特征包括第二语义特征。
在本公开实施例中,参考语义模型和待训练的语义分割模型可以为:ResNet、ENet、ESPNet、BiSeNet、SegNet、ESPNet、RefineNet、ENet等,这里,参考语义模型和语义分割模型可以为相同的模型,也可以为不同的模型;对此,本公开实施例不作限制。
示例性的,参考图2,参考语义模型包括4个卷积层和一个解码器,通过参考语义模型对原始图像20进行语义分割,得到语义分割结果23。其中,前三个卷积层提取的特征为低层特征,如21所示,包含大量纹理信息,第4个卷积层提取的特征为高层特征,如22所示,包含语义信息。
在本公开实施例中,若第一中间特征包括第一纹理特征,第二中间特征包括第二纹理特征,则语义分割装置可以对第一中间特征进行轮廓分解,得到第一轮廓特征;以及对第二中间特征进行轮廓分解,得到第二轮廓特征;第一变换特征包括第一轮廓特征,第二变换特征包括第二轮廓特征。
在本公开实施例中,语义分割装置可以对第一纹理特征和第二纹理特征进行轮廓分解处理,将第一纹理特征分解为至少一个第一带通子带和第一低通子带,基于至少一个第一带通子带和第一低通子带,得到第一轮廓特征;以及将第二纹理特征分解为至少一个第二带通子带和第二低通子带,基于至少一个第二带通子带和第二低通子带,得到第二轮廓特征。
在本公开实施例中,语义分割装置可以对至少一个第一带通子带和第一低通子带进行融合处理,得到第一轮廓特征;语义分割装置可以对至少一个第二带通子带和第二低通子带进行融合处理,得到第二轮廓特征。
在本公开实施例中,语义分割装置可以包括低通滤波器和方向滤波器,通过低通滤波器和方向滤波器对第一纹理特征和第二纹理特征进行轮廓分解。
在本公开实施例中,语义分割装置可以通过拉普拉斯金字塔分解的方式对第一纹理特征和第二纹理特征进行至少一级分解。
在本公开实施例中,若第一中间特征包括第一语义特征,第二中间特征包括第二语义特征,则语义分割装置可以对第一中间特征进行增强处理,得到第一增强特征;以及,对第二中间特征进行增强处理,得到第二增强特征;第一变换特征包括第一增强特征,第二变换特征包括第二增强特征。
在本公开实施例中,语义分割装置可以对第一语义特征和第二语义特征进行增强处理,得到能够体现第一语义特征中像素的相关性的第一增强特征和能够体现第二语义特征中像素的相关性的第二增强特征。
在本公开的一些实施例中,语义分割装置可以预先训练一个注意力模型,通过注意力模型来实现增强处理;这里,注意力模型可以为共同注意力模型,也可以为多层次注意力模型,还可以为内在注意力模型等,对此,可以根据需要设置,本公开实施例不作限制。
在本公开的一些实施例中,语义分割装置也可以基于第一语义特征的特征矩阵,确定第一增强特征的特征矩阵;以及基于第二语义特征的特征矩阵,确定第二增强特征的特征矩阵。
在本公开实施例中,语义分割装置在得到第一中间特征和第二中间特征后,可以基于第一中间特征和第二中间特征对待训练的语义分割模型进行训练,训练成功后,得到语义分割模型。
在本公开的一些实施例中,语义分割装置可以确定第一中间特征和第二中间特征之间的特征损失;特征损失用于表征第一中间特征和第二中间特征之间的差异;语义分割装置可以根据特征损失,对待训练的语义分割模型进行训练,在特征损失小于特征损失阈值的情况下,停止训练,得到语义分割模型。
在本公开实施例中,在第一中间特征和第二中间特征包括第一轮廓特征和第二轮廓特征的情况下,特征损失包括第一损失;第一损失用于表征第一轮廓特征和第二轮廓特征之间的差异;在第一中间特征和第二中间特征包括第一语义特征和第二语义特征的情况下,特征损失包括第二损失;第二损失用于表征第一增强特征和第二增强特征之间的差异。
在本公开实施例中,特征之间的差异可以用向量距离来表征;这里,向量距离可以为余弦距离,也可以为欧式距离,对此,本公开实施例不作限制。
在本公开的一些实施例中,语义分割装置可以基于第三损失、第四损失、响应损失和训练损失中的至少一个,以及特征损失,对待训练的语义分割模型进行训练。其中,响应损失表征第一语义分割结果和第二语义分割结果之间的差异;第一语义分割结果和第二语义分割结果分别为参考语义模型和待训练的语义分割模型对图像样本进行语义分割得到的结果;第三损失表征第一语义分割特征和第二语义分割特征之间的差异;第一语义分割特征和第二语义分割特征分别为参考语义模型和待训练的语义分割模型的池化层提取的特征;池化层为在高层卷积层之后的特征提取层;第四损失用于表征第一关系特征和第二关系特征之间的损失,其中第一关系特征表征第一中间特征和第一语义分割特征中,多个特征的关系,第二关系特征表征第二中间特征和第二语义分割特征中,多个特征的关系;多个特征的关系可以通过向量相似度表征;训练损失表征第二语义分割结果与图像样本之间的差异。
在本公开的一些实施例中,语义分割装置可以分别设置响应损失阈值、第三损失阈值、第四损失阈值和训练损失阈值;如此,语义分割装置可以在第三损失、第四损失、响应损失和训练损失中的至少一个以及特征损失均小于对应的损失阈值的情况下,停止训练,得到语义分割模型。
在本公开的一些实施例中,语义分割装置可以对第三损失、第四损失、响应损失和训练损失中的至少一个,以及特征损失进行加权求和,得到语义损失;在语义损失小于语义损失阈值的情况下,停止训练,得到语义分割模型。
可以理解的是,语义分割装置可以提取参考语义模型的第一中间特征,以及语义分割模型的第二中间特征;对第一中间特征和第二中间特征进行轮廓分解或增强处理,得到第一变换特征和第二变换特征,第一变换特征和第二变换特征包括纹理知识和语义知识中的至少一种,如此,语义分割装置基于第一变换特征和第二变换特征对待训练的语义分割装置进行训练,得到的语义分割模型,可以学习到参考语义模型中的纹理知识和语义知识,从而提高了采用语义分割模型对待处理图像进行语义分割时的精度。
在本公开的一些实施例中,S01中采用参考语义模型和待训练的语义分割模型分别对图像样本进行特征提取,得到第一中间特征和第二中间特征的实现,可以包括:
S201、采用参考语义模型和待训练的语义分割模型分别对图像样本进行特征提取,得到第一纹理特征和第二纹理特征。
S202、对第一纹理特征和第二纹理特征进行特征提取,得到第一语义特征和第二语义特征。
在本公开实施例中,语义分割装置通过参考语义模型提取了第一纹理特征后,可以继续对第一纹理特征进行特征提取,得到第一语义特征;通过待训练的语义分割模型提取了第二纹理特征之后,可以继续对第二纹理特征进行特征提取,得到第二语义特征。
在本公开实施例中,参考语义模型和待训练的语义分割模型包括多层卷积层,多层卷积层可以得到多个中间特征,其中,第一层卷积层对图像样本进行特征提取得到第一层中间特征,第二层卷积层对第一层中间特征进行特征提取,得到第二层中间特征,以此类推,得到多个中间特征。
在本公开实施例中,多层卷积层可以包括至少一个低层卷积层和一个高层卷积层;通过至少一个低层卷积层得到的至少一个中间特征为低层特征,即第一纹理特征,通过一个高层卷积层得到的一个中间特征为高层特征,即第一语义特征。也就是说,语义分割模型在获取第一语义特征之前,需要先获取第一纹理特征。
可以理解的是,语义分割装置通过参考语义模型和待训练的语义分割模型中的多层卷积层,对图像样本进行特征提取,可以依次得到第一纹理特征和第一语义特征。
在本公开的一些实施例中,S02中对第一中间特征和第二中间特征分别进行轮廓分解或增强处理,得到第一变换特征和第二变换特征的实现,可以包括以下至少一个:对第一纹理特征和第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征;对第一语义特征和第二语义特征进行增强处理,得到第一增强特征和第二增强特征。
在本公开实施例中,在第一中间特征包括第一纹理特征的情况下,第二中间特征包括第二纹理特征;语义分割装置可以对第一纹理特征和第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征,将第一轮廓特征作为第一变换特征,第二轮廓特征作为第二变换特征;在第一中间特征包括第一语义特征的情况下,第二中间特征包括第二语义特征;语义分割装置可以对第一语义特征和第二语义特征分别进行增强处理,得到第一增强特征和第二增强特征;在第一中间特征包括第一纹理特征和第一语义特征的情况下,第二中间特征包括第二纹理特征和第二语义特征;语义分割装置可以对第一纹理特征和第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征;以及对第一语义特征和第二语义特征分别进行增强处理,得到第一增强特征和第二增强特征。
在本公开实施例中,语义分割装置对第一纹理特征和第二纹理特征分别进行轮廓分解处理,得到第一轮廓特征和第二轮廓特征的实现可以包括:S301-S302。
S301、对第一纹理特征进行轮廓分解处理,得到第一轮廓特征。
在本公开实施例中,语义分割装置可以通过轮廓波分解部分(Contourlet Decomposition Module,CDM)对第一纹理特征进行轮廓分解,得到第一轮廓特征。
在本公开实施例中,轮廓波分解部分包括至少一组低通滤波器(Low-pass FIilter)和方向滤波器(Drectional Filter Bank,DFB)的组合;其中,低通滤波器用于对输入的特征进行滤波,将输入的特征分解为高通子带和低通子带;方向滤波器用于对高通子带进行方向滤波,得到方向子带;如此,每一组低通滤波器和方向滤波器的组合可以实现一次拉普拉斯金字塔分解。
在本公开的一些实施例中,基于隔行扫描因子,对第一纹理特征和第二纹理特征均进行滤波处理后,得到各自的高通子带和低通子带;对高通子带进行方向滤波,得到方向子带;分别对第一纹理特征和第二纹理特征对应的低通子带和方向子带进行融合,得到第一轮廓特征和第二轮廓特征,从而完成了轮廓 分解处理。
需要说明的是,在本公开实施例中,每组LP和DFB可以进行一级轮廓分解,本公开实施例中不限制轮廓分解中的级别限制。
示例性的,在本公开实施例中,S301中对第一纹理特征进行轮廓分解处理,得到第一轮廓特征的实现,可以包括:S3011-S3012。
S3011、通过至少一级LP和DFB组合,对第一纹理特征进行至少一级轮廓分解,得到至少一个方向子带和一个低通子带。
在本公开实施例中,语义分割装置通过CDM对第一纹理特征进行至少一级分解后,可以得到至少一个方向子带和一个低通子带;其中,至少一个方向子带为对应的至少一级分解得到的方向子带,一个低通子带为最后一级分解得到的低通子带。CDM对第一纹理特征进行分解的方式,参见公式(1)。
Figure PCTCN2021125073-appb-000001
其中,↓为采样运算符,p为隔行扫描因子;F l,n表示第n级低通子带(特征),即第n级分解得到的低通子带(特征);n∈[1,m],m为轮廓波分解部分中LP和DFB组合的组数。从公式(1)中可以看出,通过LP对下采样处理后的F l,n进行分解,可以得到第n+1级低通子带特征F l,n+1和第n+1级高通子带特征F h,n+1;再通过DFB对第n+1级低通子带特征F l,n+1进行方向滤波,可以得到第n+1级方向子带特征F bds,n+1
需要说明的是,第1次分解是对第一纹理特征通过LP进行分解,得到第1级低通子带特征F l,1和第1级高通子带特征F h,1
在本公开实施例中,DFB包括k级二叉树,通过k级二叉树对输入的特征进行分解,得到的方向子带特征包括2 k个方向子带。例如,k=3,则方向子带特征为8个:0,1,……7,其中,0-3为垂直方向特征和4-7为水平方向特征。
在本公开实施例中,第一纹理特征中的中间特征的数量和轮廓分解部分中CDM的数量相同;也就是说,第一纹理特征中的每个中间特征需要使用一个CDM。
示例性的,参考图3,CDM包括2组低通滤波器LP和方向滤波器DFB的组合,第1组包括LP 1和DFB 1,第2组包括LP 2和DFB 2;将纹理特征F输入CDM后,可以得到一个低通子带F l,2,两个方向子带F bds,1和F bds,2。其中,通过LP 1对纹理特征F进行低通滤波后,可以得到第1级高通子带F h,1和第1级低通子带F l,1;通过DFB 1对第1级高通子带F h,1进行方向滤波,得到第1级方向子带F bds,1;按照(2,2)对第1级低通子带F l,1进行下采样,得到下采样后的第1级低通子带F l,1-J的长和宽均为第1级低通子带F l,1的1/2。通过LP 2对下采样后的第1级低通子带F l,1-J进行低通滤波,得到第2级高通子带F h,2和第2级低通子带F l,2;通过DFB 2对第2级高通子带F h,2进行方向滤波,得到第2级方向子带F bds,2。DFB 1包括4级二叉树,F bds,1包括16个方向子带;DFB 2包括3级二叉树,F bds,1包括8个方向子带。
需要说明的是,语义分割装置通过LP对第n级低通子带进行分解实现,可以包括:通过低通分析滤波器对第n级低通子带进行低通分析滤波,得到第n+1级低通结果;之后,对第n+1级低通结果进行下采样,得到第n+1级低通子带;再对第n+1级低通子带进行上采样,将上采样后的第n+1级低通子带通过合成滤波器,得到第n+1级低通结果,基于第n级低通子带和第n+1级低通结果,得到第n+1级高通子带。
在本公开的一些实施例中,语义分割装置可以利用第n级低通子带减去第n+1级低通结果,得到第n+1级高通子带。
在本公开实施例中,语义分割装置可以按照元素,对第n级低通子带和第n+1级低通结果求差值,得到第n+1级高通子带。
示例性的,参考图4,低通滤波器包括低通分析滤波器41、下采样部分42、上采样部分43、合成滤波器44和减法部分45;其中,将第n级低通子带F l,n输入低通分析滤波器41,可以得到第n+1级低通结果F l-n+1;通过下采样部分42对第n+1级低通结果进行下采样处理后,得到下采样后的第n+1级低通结果F l,n+1,作为第n+1级低通子带;通过上采样部分43对下采样后的第n+1级低通结果进行上采样,得到第n+1级低通结果F l-n+1;将第n+1级低通结果F l-n+1输入到合成滤波器44中,得到合成滤波后的 第n+1级低通结果F l-n+1,最后,通过减法部分45,将第n级低通子带F l,n减去合成滤波后的第n+1级低通结果F l-n+1,可以得到第n+1级高通子带F h,n+1
S3012、将至少一个方向子带和一个低通子带进行特征融合,得到第一轮廓特征。
在本公开实施例中,语义分割装置对第一纹理特征进行至少一次分解后,可以得到至少一级方向子带和最后一级低通子带;其中,至少一级方向子带和最后一级低通子带的特征维度不同,需要语义分割装置通过池化层将至少一级方向子带和最后一级低通子带的维度变换一致,得到至少一个变换方向子带和最后一级变换低通子带,再对至少一个变换方向子带和最后一级变换低通子带进行第一融合处理,得到第一轮廓特征F te
这里,第一融合处理可以包括:对至少一个变换方向子带和最后一级变换低通子带进行相加,或者,对至少一个变换方向子带和最后一级变换低通子带进行拼接等,对此,本公开实施例不作限制。
需要说明的是,CDM分解的级数越大,提取的第一轮廓特征越丰富,训练后的学生网络达到的精度也越高,但计算量越高。这里,CDM分解的级数可以根据需要设置。
S302、对第二纹理特征进行轮廓分解处理,得到第二轮廓特征。
在本公开实施例中,语义分割装置对第二纹理特征进行轮廓分解处理的方式,与S301中对第一纹理特征进行轮廓波分解处理的方式相同,详见S301中的说明,在此,不再赘述。
在本公开实施例中,语义分割装置对第一语义特征和第二语义特征进行增强处理,得到第一增强特征和第二增强特征的方式,可以包括:S401-S402。
S401、对第一语义特征进行增强处理,得到第一增强特征。
在本公开实施例中,语义分割装置在得到第一语义特征之后,可以通过语义注意力部分(Semantic Attention Module,SAM)对第一语义特征进行增强处理,得到第一增强特征。
在本公开实施例中,对第一语义特征和第二语义特征进行增强处理,得到第一增强特征和第二增强特征的实现过程可以包括:对第一语义特征和第二语义特征均进行至少两种转换,得到各自对应的至少两种语义变换特征;对至少两种语义变换特征中不同的语义变换特征进行自增强处理,得到相关矩阵;将相关矩阵与至少两种语义变换特征中的一个语义变换特征进行增强处理,得到自增强特征;基于第一语义特征和第二语义特征各自对应的自增强矩阵,将各自的自增强特征确定为第一增强特征和第二增强特征;或者,将第一语义特征和第二语义特征,分别与各自的自增强特征进行融合,得到第一增强特征和第二增强特征。
示例性的,以至少两种语义变换特征为三种为例进行说明。在本公开实施例中,S401中对第一语义特征进行增强处理,得到第一增强特征的实现,可以包括:S4011-S4014。
S4011、将第一语义特征进行三种转换,得到第一语义变换特征、第二语义变换特征和第三语义变换特征。
在本公开实施例中,语义分割装置可以将第一语义特征分别进行第一变换、第二变换和第三变换,得到第一语义变换特征、第二语义变换特征、第三语义变换特征;其中,第一语义变换特征包括的向量的数目等于通道数C;第二语义变换特征包括的向量的数目等于像素数(H×M)。
在本公开的一些实施例中,第一语义变换特征和第二语义变换特征互为转置矩阵。
在本公开的一些实施例中,第一语义变换特征和第三语义变换特征是相同的矩阵特征。
S4012、将第一语义变换特征和第二语义变换特征进行矩阵相乘,得到相关特征;相关特征的矩阵中的元素用于表征像素的相关性系数。
在本公开实施例中,通过矩阵乘法对第一语义变换特征和第二语义变换矩特征相乘,得到的矩阵为相关特征;相关特征的矩阵中的元素可以表征像素之间的相关性;相关性越大,元素值越大;相关性越小,元素值越小。
S4013、将相关特征和第三语义变换特征相乘,得到自增强特征。
在本公开实施例中,相关特征和第三语义变换特征相乘后,得到的矩阵为自增强特征;即通过相关特征对第三语义变换特征进行增强,使自增强矩阵中包含像素的相关性。
S4014、基于自增强特征,确定第一增强特征。
在本公开实施例中,语义分割装置在得到自增强特征之后,可以根据自增强特征确定第一增强特征。
在本公开的一些实施例中,语义分割装置可以将自增强特征作为第一增强矩阵。
示例性的,基于图5a,第一语义特征矩阵为H×W×C矩阵MF,对第一语义特征矩阵进行3类变换后,可以得到C×(H×W)的第一语义变换矩阵MF1;(H×W)×C的第二语义变换矩阵MF2和第三语义变换矩阵MF3;如此,第一语义变换矩阵MF1和第二语义变换矩阵MF2相乘可以得到C×C的相关矩阵MFC,相关矩阵MFC与第三语义变换矩阵MF3相乘后,得到H×W×C的自增强矩阵MFp1;如此,自增强矩阵MFp1包含了元素之间相关关系,语义分割装置可以将MFp1作为第一增强特征矩阵。
在本公开的一些实施例中,语义分割装置可以将自增强特征和第一语义特征进行第二融合处理,得到第一增强特征。
在本公开实施例中,第二融合处理可以包括:对自增强特征和第一语义特征进行矩阵相加,或者,对自增强特征和第一语义特征进行加权相加;加权的权值可以根据需要设置,对此,本公开实施例不作限制。
示例性的,基于图5a,参考图5b,在得到H×W×C的自增强矩阵MFp1后,语义分割装置将自增强矩阵按照权值γ进行加权后,与第一语义特征矩阵MF按照元素相加进行处理,得到第一增强特征矩阵MFp2。
S402、对第二语义特征进行增强处理,得到第二增强特征。
在本公开实施例中,语义分割装置对第二语义特征进行增强处理的方式,与S401中对第一语义特征进行增强处理的方式和原理相同,详见S401中的说明,在此,不再赘述。
在本公开的一些实施例中,S03中至少基于第一变换特征和第二变换特征,对待训练的语义分割模型进行训练,确定出语义分割模型的实现,可以包括:S501-S503。
S501、基于预设第一损失函数、第一轮廓特征和第二轮廓特征进行损失计算,确定第一损失;
在本公开实施例中,预设第一损失函数可以为均方差均值函数,语义分割装置可以对第一轮廓特征和第二轮廓特征,计算第一方差均值,将第一方差均值作为第一损失;通过第一损失表征第一轮廓特征和第二轮廓特征的差异。参考公式(2)
Figure PCTCN2021125073-appb-000002
其中,L te(S)表示第一损失;F i te;T表示第一轮廓特征中第i个元素的轮廓特征,F i te;S表示第二轮廓特征中第i个像素对应的轮廓特征,i∈R=H×W。
在本公开实施例中,语义分割装置可以对第一轮廓特征中第i个像素对应的轮廓特征与第二轮廓特征中的第i个像素对应的轮廓特征计算方差,得到R个第一方差;在对R个第一方差求和,得到第一方差和之后,将第一方差和除以像素总数,得到第一方差均值。
S502、基于预设第二损失函数、第一增强特征和第二增强特征,确定第二损失;
在本公开实施例中,预设第二损失函数可以为均方差函数,语义分割装置可以对第一增强特征和第二增强特征,计算第二方差均值,将第二方差均值作为第二损失;通过第二损失表征第一增强特征和第二增强特征的差异。参考公式(3)。
Figure PCTCN2021125073-appb-000003
其中,L se(S)表示第二损失;F i se;T表示第一语义特征中第i个元素的语义特征,F i se;S表示第二语义特征中第i个元素的语义特征,i∈R=H×W。
在本公开实施例中,语义分割装置可以对第一语义特征中第i个像素对应的语义特征与第二语义特征中的第i个像素对应的语义特征计算方差,得到R个第二方差;在对R个第二方差求和,得到第二方差和之后,将第二方差和除以像素总数,得到第二方差均值。
S503、基于第一损失和第二损失中的至少一个,对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,语义分割装置在确定第一损失和第二损失后,可以根据第一损失和第二损失中的至少一个对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开的一些实施例中,语义分割装置可以根据第一损失对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,语义分割装置可以在第一损失小于第一损失阈值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型。
在本公开的一些实施例中,语义分割装置可以根据第二损失对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,语义分割装置可以在第二损失小于第二损失阈值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型。
在本公开的一些实施例中,语义分割装置可以根据第一损失和第二损失对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,语义分割装置可以在第一损失小于第一损失阈值,且第二损失小于第二损失阈 值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型;也可以对第一损失和第二损失进行加权求和,得到第一语义损失,在第一语义损失小于第一语义损失阈值的情况下,确定出语义分割模型。
参考图6,a为两张待处理图像,b为没有学习纹理知识的语义分割模型对两张待处理图像进行特征提取后,得到的特征图;c为学习了纹理知识的语义分割模型对两张待处理图像进行特征提取,得到的特征图。从图6中可以看出,语义分割模型在学习了纹理知识之后,特征图中包含丰富的纹理知识,轮廓更加清晰。
在本公开的一些实施例中,S03中至少基于第一变换特征和第二变换特征,对待训练的语义分割模型进行训练,确定出语义分割模型的实现,还可以包括:S601-S603。
S601、对第一增强特征和第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征;
在本公开实施例中,参考语义模型和待训练的语义分割模型中包括池化层,池化层在最后一层卷积层之后;参考语义模型在得到第一增强特征之后,可以通过池化层对第一增强特征进行语义分割预测,得到第一语义分割特征;待训练的语义分割模型在得到第二增强特征之后,可以对第二增强特征进行语义分割预测,得到第二语义分割特征。
S602、基于预设第三损失函数、第一语义分割特征和第二语义分割特征进行损失计算,确定第三损失;
在本公开实施例中,预设第三损失函数可以为均方差函数,语义分割装置可以对第一语义分割特征和第二语义分割特征,计算第三方差均值,将第三方差均值作为第三损失;通过第三损失表征第一语义分割特征和第二语义分割特征的差异。
示例性的,可以参考公式(4)。
Figure PCTCN2021125073-appb-000004
其中,L see(S)表示第三损失;F i see;T表示第一语义分割特征中第i个像素对应的语义分割特征,F i see;S表示第二语义分割特征中第i个像素对应的语义分割特征,i∈R=H×W。
在本公开实施例中,语义分割装置可以对第一语义分割特征中第i个元素的语义特征与第二语义分割特征中的第i个像素对应的语义分割特征计算方差,得到R个第三方差;在对R个第三方差求和,得到第三方差和之后,将第三方差和除以像素总数,得到第三方差均值。
S603、基于第三损失对待训练的语义分割模型进行训练,确定出语义分割模型;或者,基于第一损失、第二损失中的至少一个,以及第三损失,对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开的一些实施例中,语义分割装置在确定第三损失后,可以根据第三损失对待训练的语义分割模型进行训练,确定出语义分割模型。
在本公开实施例中,语义分割装置可以在第三损失小于第三损失阈值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型。
在本公开实施例中,语义分割装置可以在第一损失小于第一损失阈值,且第三损失小于第三损失阈值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型;也可以对第一损失和第三损失进行加权求和,得到第二语义损失,在第二语义损失小于第二语义损失阈值的情况下,确定出语义分割模型。
在本公开实施例中,语义分割装置可以在第二损失小于第一损失阈值,且第三损失小于第三损失阈值的情况下,停止对待训练的语义分割模型的训练,得到语义分割模型;也可以对第二损失和第三损失进行加权求和,得到第三语义损失,在第三语义损失小于第三语义损失阈值的情况下,确定出语义分割模型。
在本公开的一些实施例中,第一纹理特征包括:至少一个第一子纹理特征;第二纹理特征包括:至少一个第二子纹理特征;S601中对第一增强特征和第二增强特征分别进行语义分割预测,得到第一语义分割特征和第二语义分割特征之后的实现,还可以包括:S701-S704。
S701、基于第一语义分割特征、第一语义特征、至少一个第一子纹理特征,确定第一图推理关系。
在本公开实施例中,语义分割装置得到第一语义分割特征、第一语义特征、和至少一个第一子纹理特征后,可以基于第一语义分割特征、第一语义特征、至少一个第一子纹理特征进行第一图推理,得到第一图推理关系。
在本公开的一些实施例中,基于第一语义分割特征、第一语义特征、至少一个第一子纹理特征,确 定第一图推理关系,包括:基于输出顺序,确定第一语义分割特征、第一语义特征、至少一个第一子纹理特征之间的至少两个差异特征;对至少两个差异特征,进行相关处理,得到差异特征之间的相关度;基于至少两个差异特征和差异特征之间的相关度,构成第一图推理关系。
在本公开实施例中,至少一个第一子纹理特征对应参考语义模型中的至少一个低层卷积层得到的至少一个中间特征;语义分割装置可以将第一语义分割特征、第一语义特征、至少一个第一子纹理特征,按照卷积层和池化层从后到前的顺序(即特征的输出顺序,从后往前),确定相邻两层之间的特征变化情况,得到多个第一关系特征(即至少两个差异特征)。
在本公开实施例中,语义分割装置在确定多个第一关系特征之后,可以将多个第一关系特征作为多个第一节点、按照多个第一关系特征之间的相关性(即相关度),对多个第一节点进行连边,构造第一关系图G T,参考公式(5);通过第一关系图表征第一图推理关系。
G T=(ν TT)=(F i va,T,A ij T)    公式(5)
其中,G T表示第一关系图;ν T表示第一关系图中的节点,ε T表示第一关系图总中的连边;F i va,T表示N个第一关系特征中的第i个,A ij T表示F i va,T和F j va,T之间的连边;N表示第一关系特征的数量;i,j∈[1,N-1],且,i≠j。
In an embodiment of the present disclosure, F_i^{va,T} may be characterized by the similarity between the feature F_{i+1}^T of the (i+1)-th layer and the feature F_i^T of the i-th layer in the reference semantic model; see formula (6).
F_i^{va,T} = f_{si}\left(F_{i+1}^T, F_i^T\right)    Formula (6)
In some embodiments of the present disclosure, when forming the first graph reasoning relationship based on the at least two difference features and the correlations between the difference features, in the case where an edge can be connected for every pair of features, exemplarily, A_{ij} may be obtained by formula (7-1):
A_{ij} = f_{si}\left(F_i^{va}, F_j^{va}\right)    Formula (7-1)
where f_{si} denotes the similarity between vectors.
In some embodiments of the present disclosure, in a case where, among the correlations between the difference features, there are correlation features between target differences that are less than or equal to a preset correlation threshold, the first graph reasoning relationship is formed based on the correlation features between the target differences and the at least two difference features. That is to say, in the case where edges can be connected only for those features that satisfy a condition, exemplarily, A_{ij} may be obtained by formula (7-2):
A_{ij} = \mathbb{1}\left[f_{si}(F_i^{va}, F_j^{va}) \geq \mu\right] \cdot f_{si}\left(F_i^{va}, F_j^{va}\right)    Formula (7-2)
where f_{si} denotes the similarity between vectors, \mathbb{1}[\cdot] is an indicator function, and \mu is the similarity threshold.
As can be seen from formula (7-2), the semantic segmentation apparatus may connect edges between nodes with high similarity; in the case where \mu = 0, an edge may be connected between any two nodes. A minimal sketch of this graph construction follows.
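Exemplarily, the construction of formulas (5) to (7-2) can be sketched as follows, assuming that f_{si} is instantiated as cosine similarity and that the per-layer features have already been projected to a common shape; both choices are assumptions of this sketch, not specified by the disclosure:

```python
import torch
import torch.nn.functional as F

def build_relation_graph(layer_feats, mu=0.5):
    # Relational features between adjacent layers form the graph's nodes
    # (formula (6)); layer_feats is ordered from shallow to deep.
    nodes = torch.stack([
        F.cosine_similarity(layer_feats[i + 1], layer_feats[i], dim=0)
        for i in range(len(layer_feats) - 1)
    ])
    n = nodes.shape[0]
    flat = nodes.reshape(n, -1)
    # Pairwise similarities between relational features (formula (7-1)).
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)  # (n, n)
    # Keep an edge only when the similarity reaches the threshold mu (formula (7-2)).
    adjacency = torch.where(sim >= mu, sim, torch.zeros_like(sim))
    adjacency.fill_diagonal_(0.0)  # no self-loops, since i != j
    return nodes, adjacency
```

Applying the same routine to the teacher's and the student's features yields G^T and G^S respectively.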
S702. Determine a second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature.
In an embodiment of the present disclosure, after obtaining the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, the semantic segmentation apparatus may perform second graph reasoning based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature, obtaining the second graph reasoning relationship.
In an embodiment of the present disclosure, the at least one second sub-texture feature corresponds to at least one intermediate feature obtained by at least one lower convolutional layer in the semantic segmentation model to be trained; the semantic segmentation apparatus may take the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature in back-to-front order of the convolutional layers and the pooling layer, determine the feature variation between every two adjacent layers, and obtain a plurality of second relational features.
In an embodiment of the present disclosure, after determining the plurality of second relational features, the semantic segmentation apparatus may take the plurality of second relational features as a plurality of second nodes and connect edges in the same manner as in the first relation graph, constructing a second relation graph G^S; see formula (5). The second relation graph characterizes the second graph reasoning relationship.
S703. Perform loss calculation based on a preset fourth loss function, the first graph reasoning relationship, and the second graph reasoning relationship, and determine a fourth loss.
In an embodiment of the present disclosure, the first graph reasoning relationship includes nodes and edges, and the second graph reasoning relationship also includes nodes and edges; the preset fourth loss function characterizes the vector distance between the first relation graph and the second relation graph, which serves as the fourth loss; the fourth loss characterizes the difference between the first relation graph and the second relation graph. See formula (8).
L_{va}(S) = Dist\left(G^T, G^S\right)    Formula (8)
where L_{va}(S) denotes the fourth loss, G^T denotes the first relation graph, G^S denotes the second relation graph, and Dist denotes a vector distance.
In an embodiment of the present disclosure, the first relation graph G^T includes first nodes \nu^T and first edges \varepsilon^T, and the second relation graph G^S includes second nodes \nu^S and second edges \varepsilon^S; accordingly, the semantic segmentation apparatus may first determine the node vector distance between the first nodes \nu^T and the second nodes \nu^S, and the edge vector distance between the first edges \varepsilon^T and the second edges \varepsilon^S, and then compute a weighted sum of the node vector distance and the edge vector distance to obtain the fourth loss; see formula (9).
L va(S)=Dist(ν TS)+λDist(ε TS)    公式(9)
where \lambda is a weighting coefficient, which may be set as needed; the embodiments of the present disclosure do not limit it.
In an embodiment of the present disclosure, formula (9) may also be expressed as formula (10), expanded over the nodes and edges defined in formula (5):
L_{va}(S) = \sum_{i} Dist\left(F_i^{va,T}, F_i^{va,S}\right) + \lambda \sum_{i \neq j} Dist\left(A_{ij}^T, A_{ij}^S\right)    Formula (10)
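Exemplarily, the graph distance of formulas (8) to (10) can be sketched as follows, instantiating Dist as a mean squared difference, which is an assumption of this sketch:

```python
import torch

def graph_distance_loss(nodes_t: torch.Tensor, adj_t: torch.Tensor,
                        nodes_s: torch.Tensor, adj_s: torch.Tensor,
                        lam: float = 1.0) -> torch.Tensor:
    # Node distance between the teacher and student relation graphs.
    node_dist = (nodes_t - nodes_s).pow(2).mean()
    # Edge distance between the two adjacency matrices.
    edge_dist = (adj_t - adj_s).pow(2).mean()
    # Weighted sum per formula (9); lam plays the role of lambda.
    return node_dist + lam * edge_dist
```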
S704. Train the semantic segmentation model to be trained based on the fourth loss, and determine the semantic segmentation model; or, train the semantic segmentation model to be trained based on at least one of the first loss, the second loss, and the third loss, together with the fourth loss, and determine the semantic segmentation model.
In an embodiment of the present disclosure, after determining the fourth loss, the semantic segmentation apparatus may train the semantic segmentation model to be trained according to the fourth loss, and determine the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop the training of the semantic segmentation model to be trained in a case where the fourth loss is less than a fourth loss threshold, obtaining the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop the training of the semantic segmentation model to be trained in a case where the fourth loss is less than the fourth loss threshold and the first loss is less than the first loss threshold, obtaining the semantic segmentation model; or, the fourth loss and the first loss may be weighted and summed to obtain a fourth semantic loss, and the training of the semantic segmentation model to be trained is stopped in a case where the fourth semantic loss is less than a fourth semantic loss threshold, obtaining the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop the training of the semantic segmentation model to be trained in a case where the fourth loss is less than the fourth loss threshold and the second loss is less than the second loss threshold, obtaining the semantic segmentation model; or, the fourth loss and the second loss may be weighted and summed to obtain a fifth semantic loss, and the training of the semantic segmentation model to be trained is stopped in a case where the fifth semantic loss is less than a fifth semantic loss threshold, obtaining the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop the training of the semantic segmentation model to be trained in a case where the fourth loss is less than the fourth loss threshold and the third loss is less than the third loss threshold, obtaining the semantic segmentation model; or, the fourth loss and the third loss may be weighted and summed to obtain a sixth semantic loss, and the training of the semantic segmentation model to be trained is stopped in a case where the sixth semantic loss is less than a sixth semantic loss threshold, obtaining the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may stop the training of the semantic segmentation model to be trained in a case where the fourth loss is less than the fourth loss threshold, the third loss is less than the third loss threshold, and the second loss is less than the second loss threshold, obtaining the semantic segmentation model; or, the fourth loss, the third loss, the second loss, and the first loss may be weighted and summed to obtain a seventh semantic loss, and the training of the semantic segmentation model to be trained is stopped in a case where the seventh semantic loss is less than a seventh semantic loss threshold, obtaining the semantic segmentation model.
In some embodiments of the present disclosure, the semantic segmentation apparatus may determine a response loss based on the first semantic segmentation result and the second semantic segmentation result, and train the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss, and the response loss, obtaining the semantic segmentation model.
In an embodiment of the present disclosure, the response loss L_r(S) may be obtained according to formula (11):
L_{r}(S) = \frac{1}{R}\sum_{i \in R}\left(F_i^{r;T} - F_i^{r;S}\right)^2    Formula (11)
where F_i^{r;T} denotes the feature corresponding to the i-th pixel in the first semantic segmentation result, and F_i^{r;S} denotes the feature corresponding to the i-th pixel in the second semantic segmentation result.
In some embodiments of the present disclosure, the semantic segmentation apparatus may determine a training loss based on the second semantic segmentation result and the image sample, and train the semantic segmentation model to be trained according to the first loss, the second loss, the third loss, the fourth loss, the response loss, and the training loss, obtaining the semantic segmentation model.
In an embodiment of the present disclosure, the training loss L_{sa}(S) may be obtained according to formula (12):
L_{sa}(S) = \frac{1}{R}\sum_{i \in R}\left(F_i^{r;S} - F_i^{sa}\right)^2    Formula (12)
where F_i^{sa} denotes the feature corresponding to the i-th pixel in the image sample.
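Exemplarily, the overall objective combining the above terms can be sketched as follows; the equal default weights are an illustrative assumption, since the disclosure leaves the combination open:

```python
def total_distillation_loss(l_te, l_se, l_see, l_va, l_r, l_sa, weights=None):
    # Weighted sum of the first loss (contours), second loss (enhanced features),
    # third loss (segmentation features), fourth loss (relation graphs),
    # response loss, and training loss.
    terms = [l_te, l_se, l_see, l_va, l_r, l_sa]
    weights = weights if weights is not None else [1.0] * len(terms)
    return sum(w * t for w, t in zip(weights, terms))
```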
Exemplarily, referring to FIG. 7, an embodiment of the present disclosure provides a schematic diagram of knowledge distillation. As shown in FIG. 7, the teacher network and the student network each include 4 convolutional layers and one pooling layer, the pooling layer being implemented by a Pyramid Pooling Module (PPM). The first 3 convolutional layers are lower convolutional layers, the 4th convolutional layer is a higher convolutional layer, and the higher convolutional layer is connected to a SAM. Through the 4 convolutional layers and the 1 pooling layer, the teacher network successively extracts 3 sub-texture features, 1 first semantic feature, and 1 first semantic segmentation feature; contour decomposition is performed on the 3 texture features through 3 CDMs to obtain 3 first sub-contour features; based on an attention mechanism, the first semantic feature is enhanced through the SAM to obtain 1 first enhanced feature; and the first semantic segmentation result is obtained based on the first semantic segmentation feature. Likewise, through 4 convolutional layers, 3 CDMs, and 1 SAM, the student network can obtain 3 second sub-texture features, 3 second sub-contour features, 1 second semantic feature, 1 second enhanced feature, 1 second semantic segmentation feature, and the second semantic segmentation result. In this way, the student network can learn the texture knowledge of the teacher network based on the 3 first sub-contour features and the 3 second sub-contour features, and learn the semantic knowledge of the teacher network based on the 1 first enhanced feature, the 1 second enhanced feature, the 1 first semantic segmentation feature, and the 1 second semantic segmentation feature; the texture knowledge and the semantic knowledge serve as feature knowledge. Further, the student network learns the relational knowledge of the teacher network based on the first relational features among the 3 first sub-texture features, the 1 first semantic feature, and the 1 first semantic segmentation feature, and the second relational features among the 3 second sub-texture features, the 1 second semantic feature, and the 1 second semantic segmentation feature; and it learns the response knowledge of the teacher network based on the first semantic segmentation result and the second semantic segmentation result. The student network can thus learn rich knowledge from the teacher network, which improves the semantic segmentation accuracy of the student network. A minimal sketch of one distillation step follows.
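The flow of FIG. 7 can be sketched as a single training step as follows; the dictionary interface of `teacher` and `student`, the feature keys, and the choice of optimizer are illustrative assumptions of this sketch:

```python
import torch

def distillation_step(teacher, student, optimizer, image, loss_fns):
    # The teacher is frozen; only the student's parameters are updated.
    with torch.no_grad():
        t = teacher(image)  # assumed to return e.g. {'contour': ..., 'enhanced': ..., 'seg': ...}
    s = student(image)
    # Sum the per-feature distillation losses over the matched feature pairs.
    loss = sum(fn(t[k], s[k]) for k, fn in loss_fns.items())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```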
Referring to FIG. 8, FIG. 8 shows a schematic diagram of the semantic segmentation results of the student network. As shown in FIG. 8, a is an original image of an urban scene, b is the semantic segmentation result of the student network in the related art, c is the semantic segmentation result of the student network of the present application, and d is the image sample for the semantic segmentation of the original image in a. It can be seen that the semantic segmentation result of the student network of the present application contains richer information and is closer to the image sample.
Exemplarily, the knowledge distillation method of FIG. 7 is applied to an urban scene; Table 1 shows a comparison of the mean Intersection over Union (mIoU) of the student network and the teacher network in the urban scene. As shown in Table 1, the student network alone has the lowest mIoU; with Structured Knowledge Distillation (SKD), the mIoU improves; with Intra-class Feature Variation Distillation (IFVD), the mIoU improves further; and the method of the present application achieves the highest mIoU.
[Table 1 is provided in the original as images; only the values recoverable from the surrounding text are reproduced below. The teacher values are derived from the gaps stated in the next paragraph; the SKD and IFVD rows of the original table are not recoverable here.]
Network                                  val mIoU    test mIoU
Teacher network                          78.56       76.78
ResNet18 (student alone)                 69.1        67.6
ResNet18 + method of this application    75.82       73.78
Table 1
Taking ResNet18 as the student network as an example: ResNet18 alone achieves a mean IoU of 69.1 on the val set, 9.46 percentage points below the teacher network, and 67.6 on the test set, 9.18 percentage points below the teacher network. With the method of the present application, the mean IoU is 75.82 on val, an improvement of 6.72 percentage points over ResNet18, and 73.78 on test, an improvement of 6.18 percentage points; the method of the present application comes closest to the teacher network's mean IoU. Here, val is the test set used during training to assess the learning state in time, and test is the test set used to evaluate the model after training is finished. As can be seen from Table 1, the accuracy of the student network trained with the method of the present application is significantly improved and is closest to that of the teacher network.
An embodiment of the present disclosure further provides a semantic segmentation apparatus. FIG. 9 is a schematic diagram of an optional composition structure of the semantic segmentation apparatus provided by an embodiment of the present disclosure. As shown in FIG. 9, the semantic segmentation apparatus 20 includes:
a feature acquisition part 2000, configured to acquire an image to be processed; and
a semantic segmentation part 2004, configured to perform semantic segmentation processing on the image to be processed by using the semantic segmentation model, obtaining a semantic segmentation result of the image to be processed; the semantic segmentation model is trained by taking, as a reference, first transformed features obtained by performing contour decomposition or enhancement processing on first intermediate features output by a reference semantic model, in combination with second transformed features obtained by performing contour decomposition or enhancement processing on second intermediate features output by a semantic segmentation model to be trained;
the first intermediate features and the second intermediate features include at least one of the following groups:
a first texture feature and a second texture feature;
a first semantic feature and a second semantic feature;
the first transformed features and the second transformed features include at least one of the following groups:
a first contour feature and a second contour feature;
a first enhanced feature and a second enhanced feature.
In some embodiments, the semantic segmentation apparatus 20 further includes:
a feature extraction part 2001, configured to perform feature extraction on an image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, obtaining the first intermediate features and the second intermediate features, where the reference semantic model is a pre-trained semantic segmentation network and the semantic segmentation model to be trained is a network with the same function as the reference semantic model;
a feature processing part 2002, configured to perform contour decomposition or enhancement processing on the first intermediate features and the second intermediate features respectively, obtaining the first transformed features and the second transformed features; and
a training part 2003, configured to train the semantic segmentation model to be trained based at least on the first transformed features and the second transformed features, determining the semantic segmentation model.
In some embodiments, the feature extraction part 2001 is further configured to perform feature extraction on the image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, obtaining the first texture feature and the second texture feature; and to perform feature extraction on the first texture feature and the second texture feature, obtaining the first semantic feature and the second semantic feature.
In some embodiments, the feature processing part 2002 is further configured to perform contour decomposition processing on the first texture feature and the second texture feature respectively, obtaining the first contour feature and the second contour feature; or, to perform enhancement processing on the first semantic feature and the second semantic feature, obtaining the first enhanced feature and the second enhanced feature.
In some embodiments, the feature processing part 2002 is further configured to perform contour decomposition processing on the first texture feature and the second texture feature respectively, obtaining the first contour feature and the second contour feature; and to perform enhancement processing on the first semantic feature and the second semantic feature, obtaining the first enhanced feature and the second enhanced feature.
In some embodiments, the training part 2003 is further configured to perform loss calculation based on the preset first loss function, the first contour feature, and the second contour feature, determining the first loss; to determine the second loss based on the preset second loss function, the first enhanced feature, and the second enhanced feature; and to train the semantic segmentation model to be trained based on at least one of the first loss and the second loss, determining the semantic segmentation model.
In some embodiments, the training part 2003 is further configured to perform semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively, obtaining the first semantic segmentation feature and the second semantic segmentation feature; to perform loss calculation based on the preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature, determining the third loss; and to train the semantic segmentation model to be trained based on the third loss, determining the semantic segmentation model, or to train the semantic segmentation model to be trained based on at least one of the first loss and the second loss, together with the third loss, determining the semantic segmentation model.
In some embodiments, the first texture feature includes at least one first sub-texture feature, and the second texture feature includes at least one second sub-texture feature; the training part 2003 is further configured, after semantic segmentation prediction is performed on the first enhanced feature and the second enhanced feature respectively to obtain the first semantic segmentation feature and the second semantic segmentation feature, to determine the first graph reasoning relationship based on the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; to determine the second graph reasoning relationship based on the second semantic segmentation feature, the second semantic feature, and the at least one second sub-texture feature; to perform loss calculation based on the preset fourth loss function, the first graph reasoning relationship, and the second graph reasoning relationship, determining the fourth loss; and to train the semantic segmentation model to be trained based on the fourth loss, determining the semantic segmentation model, or to train the semantic segmentation model to be trained based on at least one of the first loss, the second loss, and the third loss, together with the fourth loss, determining the semantic segmentation model.
In some embodiments, the training part 2003 is further configured to determine, based on the output order, the at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature; to perform correlation processing on the at least two difference features, obtaining the correlations between the difference features; and to form the first graph reasoning relationship based on the at least two difference features and the correlations between the difference features.
In some embodiments, the training part 2003 is further configured, in a case where, among the correlations between the difference features, there are correlation features between target differences that are less than or equal to the preset correlation threshold, to form the first graph reasoning relationship based on the correlation features between the target differences and the at least two difference features.
In some embodiments, the feature processing part 2002 is further configured to filter both the first texture feature and the second texture feature based on an interlaced scanning factor, obtaining respective high-pass sub-bands and low-pass sub-bands; to perform directional filtering on the high-pass sub-bands, obtaining directional sub-bands; and to fuse the low-pass sub-band and the directional sub-bands corresponding to the first texture feature and to the second texture feature respectively, obtaining the first contour feature and the second contour feature and completing the contour decomposition processing, as sketched below.
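A minimal sketch of this decomposition follows; the 3x3 box low-pass filter and the two fixed Sobel-style directional kernels are placeholder choices, and the interlaced scanning factor is omitted, so this illustrates only the sub-band split and fusion, not the exact filters of the disclosure:

```python
import torch
import torch.nn.functional as F

def contour_decompose(feat: torch.Tensor) -> torch.Tensor:
    """Split a (C, H, W) feature map into low-pass and high-pass sub-bands,
    directionally filter the high-pass sub-band, then fuse by concatenation."""
    x = feat.unsqueeze(0)                          # (1, C, H, W)
    c = x.shape[1]
    box = torch.ones(c, 1, 3, 3) / 9.0             # per-channel 3x3 box filter
    low = F.conv2d(x, box, padding=1, groups=c)    # low-pass sub-band
    high = x - low                                 # high-pass sub-band
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    # Directional sub-bands of the high-pass sub-band.
    dir_x = F.conv2d(high, sobel_x.repeat(c, 1, 1, 1), padding=1, groups=c)
    dir_y = F.conv2d(high, sobel_y.repeat(c, 1, 1, 1), padding=1, groups=c)
    # Fuse the low-pass sub-band with the directional sub-bands.
    return torch.cat([low, dir_x, dir_y], dim=1).squeeze(0)
```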
In some embodiments, the feature processing part 2002 is further configured to perform at least two kinds of transformation on both the first semantic feature and the second semantic feature, obtaining at least two corresponding semantic transformation features for each; to perform self-enhancement processing on different ones of the at least two semantic transformation features, obtaining a correlation matrix; to perform enhancement processing on the correlation matrix and one of the at least two semantic transformation features, obtaining a self-enhanced feature; and, based on the self-enhancement matrices corresponding to the first semantic feature and to the second semantic feature respectively, to determine the respective self-enhanced features as the first enhanced feature and the second enhanced feature, or to fuse the first semantic feature and the second semantic feature with their respective self-enhanced features, obtaining the first enhanced feature and the second enhanced feature; a sketch follows.
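A minimal self-enhancement sketch in this spirit follows, assuming the transformations are 1x1 convolutions and that fusion is an element-wise addition; the layer sizes and the softmax normalization of the correlation matrix are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class SelfEnhance(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two transformations of the semantic feature build the correlation
        # matrix; a third transformation is re-weighted by it.
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # correlation matrix (b, hw, hw)
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, hw, c)
        enhanced = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Fuse the semantic feature with its self-enhanced feature.
        return x + enhanced
```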
In the embodiments of the present application and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.
An embodiment of the present disclosure further provides an electronic device. FIG. 10 is a schematic diagram of an optional composition structure of the electronic device provided by an embodiment of the present disclosure. As shown in FIG. 10, the electronic device 21 includes a processor 2101 and a memory 2102; the memory 2102 stores a computer program executable on the processor 2101, and when the processor 2101 executes the computer program, the steps of any one of the semantic segmentation methods of the embodiments of the present disclosure are implemented; the processor 2101 and the memory 2102 are connected through a communication bus 2103.
The memory 2102 is configured to store computer programs and applications for the processor 2101, and may also cache data to be processed or already processed by the processor 2101 and by the parts of the electronic device (for example, image data, audio data, voice communication data, and video communication data); it may be implemented by flash memory (FLASH) or random access memory (RAM).
The processor 2101 implements the steps of any one of the above semantic segmentation methods when executing the program. The processor 2101 generally controls the overall operation of the electronic device 21.
The above processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It can be understood that the electronic component implementing the above processor function may also be other components, which the embodiments of the present disclosure do not limit.
An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above semantic segmentation method.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device, and may be a volatile or non-volatile storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a USB flash drive, a magnetic disk, an optical disc, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing. A computer-readable storage medium as used here is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The above memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any of various terminals including one of or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
An embodiment of the present disclosure provides a computer program product, where the computer program product includes a computer program or instructions; when the computer program or instructions run on a computer, the computer performs the above semantic segmentation method.
The above computer program product may be implemented specifically by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
It should be pointed out here that the above descriptions of the storage medium and device embodiments are similar to the description of the above method embodiments and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present disclosure, please refer to the description of the method embodiments of the present disclosure.
It should be understood that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present disclosure. Therefore, appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure. The above sequence numbers of the embodiments of the present disclosure are for description only and do not represent the superiority or inferiority of the embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation, for example: multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present disclosure.
In addition, the functional units in the embodiments of the present disclosure may all be integrated in one processing unit, or each unit may serve as a single unit separately, or two or more units may be integrated in one unit; the above integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Alternatively, if the above integrated units of the present disclosure are implemented in the form of software functional parts and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present disclosure, in essence, or the part contributing to the related art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a device to perform all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in the present disclosure may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several method or device embodiments provided in the present disclosure may be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.
The above are merely implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present disclosure, which should all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a semantic segmentation method, apparatus, electronic device, and computer-readable storage medium. The method includes: acquiring an image to be processed; and performing semantic segmentation processing on the image to be processed by using a semantic segmentation model, obtaining a semantic segmentation result of the image to be processed; where the semantic segmentation model is trained by taking, as a reference, first transformed features obtained by performing contour decomposition or enhancement processing on first intermediate features output by a reference semantic model, in combination with second transformed features obtained by performing contour decomposition or enhancement processing on second intermediate features output by a semantic segmentation model to be trained; the first intermediate features and the second intermediate features include at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature; and the first transformed features and the second transformed features include at least one of the following groups: a first contour feature and a second contour feature; a first enhanced feature and a second enhanced feature. Through the embodiments of the present disclosure, the semantic segmentation model can learn richer knowledge, thereby improving the semantic segmentation accuracy when the semantic segmentation model is used to perform semantic segmentation on the image to be processed.

Claims (15)

  1. A semantic segmentation method, comprising:
    acquiring an image to be processed;
    performing semantic segmentation processing on the image to be processed by using a semantic segmentation model, obtaining a semantic segmentation result of the image to be processed; wherein,
    the semantic segmentation model is trained by taking, as a reference, first transformed features obtained by performing contour decomposition or enhancement processing on first intermediate features output by a reference semantic model, in combination with second transformed features obtained by performing contour decomposition or enhancement processing on second intermediate features output by a semantic segmentation model to be trained;
    the first intermediate features and the second intermediate features comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature;
    the first transformed features and the second transformed features comprise at least one of the following groups: a first contour feature and a second contour feature; a first enhanced feature and a second enhanced feature.
  2. The method according to claim 1, wherein the reference semantic model is a pre-trained semantic segmentation network, and the semantic segmentation model to be trained is a network with the same function as the reference semantic model; the method further comprises:
    performing feature extraction on an image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, obtaining the first intermediate features and the second intermediate features;
    performing contour decomposition or enhancement processing on the first intermediate features and the second intermediate features respectively, obtaining the first transformed features and the second transformed features;
    training the semantic segmentation model to be trained based at least on the first transformed features and the second transformed features, determining the semantic segmentation model.
  3. The method according to claim 2, wherein the performing feature extraction on an image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, obtaining the first intermediate features and the second intermediate features, comprises:
    performing feature extraction on the image sample by using the reference semantic model and the semantic segmentation model to be trained respectively, obtaining the first texture feature and the second texture feature;
    performing feature extraction on the first texture feature and the second texture feature, obtaining the first semantic feature and the second semantic feature.
  4. The method according to claim 2 or 3, wherein the performing contour decomposition or enhancement processing on the first intermediate features and the second intermediate features respectively, obtaining the first transformed features and the second transformed features, comprises at least one of the following:
    performing contour decomposition processing on the first texture feature and the second texture feature respectively, obtaining the first contour feature and the second contour feature;
    performing enhancement processing on the first semantic feature and the second semantic feature, obtaining the first enhanced feature and the second enhanced feature.
  5. The method according to any one of claims 2 to 4, wherein the training the semantic segmentation model to be trained based at least on the first transformed features and the second transformed features, determining the semantic segmentation model, comprises:
    performing loss calculation based on a preset first loss function, the first contour feature, and the second contour feature, determining a first loss;
    determining a second loss based on a preset second loss function, the first enhanced feature, and the second enhanced feature;
    training the semantic segmentation model to be trained based on at least one of the first loss and the second loss, determining the semantic segmentation model.
  6. The method according to any one of claims 2 to 5, wherein the training the semantic segmentation model to be trained based at least on the first transformed features and the second transformed features, determining the semantic segmentation model, comprises:
    performing semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively, obtaining a first semantic segmentation feature and a second semantic segmentation feature;
    performing loss calculation based on a preset third loss function, the first semantic segmentation feature, and the second semantic segmentation feature, determining a third loss;
    training the semantic segmentation model to be trained based on the third loss, determining the semantic segmentation model; or, training the semantic segmentation model to be trained based on at least one of a first loss and a second loss, together with the third loss, determining the semantic segmentation model.
  7. The method according to claim 6, wherein the first texture feature comprises at least one first sub-texture feature, and the second texture feature comprises at least one second sub-texture feature;
    after the performing semantic segmentation prediction on the first enhanced feature and the second enhanced feature respectively, obtaining a first semantic segmentation feature and a second semantic segmentation feature, the method further comprises:
    determining a first graph reasoning relationship based on the first semantic segmentation feature, a first semantic feature, and the at least one first sub-texture feature;
    determining a second graph reasoning relationship based on the second semantic segmentation feature, a second semantic feature, and the at least one second sub-texture feature;
    performing loss calculation based on a preset fourth loss function, the first graph reasoning relationship, and the second graph reasoning relationship, determining a fourth loss;
    training the semantic segmentation model to be trained based on the fourth loss, determining the semantic segmentation model; or, training the semantic segmentation model to be trained based on at least one of a first loss, a second loss, and a third loss, together with the fourth loss, determining the semantic segmentation model.
  8. The method according to claim 7, wherein the determining a first graph reasoning relationship based on the first semantic segmentation feature, a first semantic feature, and the at least one first sub-texture feature comprises:
    determining, based on an output order, at least two difference features among the first semantic segmentation feature, the first semantic feature, and the at least one first sub-texture feature;
    performing correlation processing on the at least two difference features, obtaining correlations between the difference features;
    forming the first graph reasoning relationship based on the at least two difference features and the correlations between the difference features.
  9. The method according to claim 8, wherein the forming the first graph reasoning relationship based on the at least two difference features and the correlations between the difference features comprises:
    in a case where, among the correlations between the difference features, there are correlation features between target differences that are less than or equal to a preset correlation threshold, forming the first graph reasoning relationship based on the correlation features between the target differences and the at least two difference features.
  10. The method according to claim 4, wherein the performing contour decomposition processing on the first texture feature and the second texture feature respectively, obtaining the first contour feature and the second contour feature, comprises:
    filtering both the first texture feature and the second texture feature based on an interlaced scanning factor, obtaining respective high-pass sub-bands and low-pass sub-bands;
    performing directional filtering on the high-pass sub-bands, obtaining directional sub-bands;
    fusing the low-pass sub-band and the directional sub-bands corresponding to the first texture feature and to the second texture feature respectively, obtaining the first contour feature and the second contour feature and completing the contour decomposition processing.
  11. The method according to claim 4, wherein the performing enhancement processing on the first semantic feature and the second semantic feature, obtaining the first enhanced feature and the second enhanced feature, comprises:
    performing at least two kinds of transformation on both the first semantic feature and the second semantic feature, obtaining at least two corresponding semantic transformation features for each;
    performing self-enhancement processing on different ones of the at least two semantic transformation features, obtaining a correlation matrix;
    performing enhancement processing on the correlation matrix and one of the at least two semantic transformation features, obtaining a self-enhanced feature;
    based on the self-enhancement matrices corresponding to the first semantic feature and to the second semantic feature respectively, determining the respective self-enhanced features as the first enhanced feature and the second enhanced feature; or, fusing the first semantic feature and the second semantic feature with their respective self-enhanced features, obtaining the first enhanced feature and the second enhanced feature.
  12. A semantic segmentation apparatus, comprising:
    a feature acquisition part, configured to acquire an image to be processed;
    a semantic segmentation part, configured to perform semantic segmentation processing on the image to be processed by using a semantic segmentation model, obtaining a semantic segmentation result of the image to be processed; the semantic segmentation model being trained by taking, as a reference, first transformed features obtained by performing contour decomposition or enhancement processing on first intermediate features output by a reference semantic model, in combination with second transformed features obtained by performing contour decomposition or enhancement processing on second intermediate features output by a semantic segmentation model to be trained;
    wherein the first intermediate features and the second intermediate features comprise at least one of the following groups: a first texture feature and a second texture feature; a first semantic feature and a second semantic feature; and the first transformed features and the second transformed features comprise at least one of the following groups: a first contour feature and a second contour feature; a first enhanced feature and a second enhanced feature.
  13. An electronic device, comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the method according to any one of claims 1 to 11 when executing the computer program stored in the memory.
  14. A computer-readable storage medium storing a computer program, configured to implement the method according to any one of claims 1 to 11 when executed by a processor.
  15. A computer program product, comprising a computer program or instructions, wherein, when the computer program or instructions run on a computer, the computer performs the method according to any one of claims 1 to 11.
PCT/CN2021/125073 2021-06-29 2021-10-20 Semantic segmentation method and apparatus, electronic device, and computer-readable storage medium WO2023273026A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110725811.8A CN113470057B (zh) 2021-06-29 2021-06-29 Semantic segmentation method and apparatus, electronic device, and computer-readable storage medium
CN202110725811.8 2021-06-29

Publications (1)

Publication Number Publication Date
WO2023273026A1 true WO2023273026A1 (zh) 2023-01-05

Family

ID=77873679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/125073 2021-06-29 2021-10-20 Semantic segmentation method and apparatus, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113470057B (zh)
WO (1) WO2023273026A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470057B (zh) * 2021-06-29 2024-04-16 上海商汤智能科技有限公司 Semantic segmentation method and apparatus, electronic device, and computer-readable storage medium
CN113888567B (zh) * 2021-10-21 2024-05-14 中国科学院上海微系统与信息技术研究所 Training method for an image segmentation model, and image segmentation method and apparatus
CN113744164B (zh) * 2021-11-05 2022-03-15 深圳市安软慧视科技有限公司 Fast nighttime low-illumination image enhancement method, system, and related device
CN116342884B (zh) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method, and server

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280451A (zh) * 2018-01-19 2018-07-13 北京市商汤科技开发有限公司 Semantic segmentation and network training method and apparatus, device, medium, and program
US20200167546A1 (en) * 2018-11-28 2020-05-28 Toyota Research Institute, Inc. Systems and methods for predicting semantics of a particle using semantic segmentation
CN111062951A (zh) * 2019-12-11 2020-04-24 华中科技大学 Knowledge distillation method based on intra-class feature differences in semantic segmentation
CN112308862A (zh) * 2020-06-04 2021-02-02 北京京东尚科信息技术有限公司 Image semantic segmentation model training and segmentation method and apparatus, and storage medium
CN113011425A (zh) * 2021-03-05 2021-06-22 上海商汤智能科技有限公司 Image segmentation method and apparatus, electronic device, and computer-readable storage medium
CN113470057A (zh) * 2021-06-29 2021-10-01 上海商汤智能科技有限公司 Semantic segmentation method and apparatus, electronic device, and computer-readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721420A (zh) * 2023-08-10 2023-09-08 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet images of electrical equipment
CN116721420B (zh) * 2023-08-10 2023-10-20 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet images of electrical equipment
CN116863279A (zh) * 2023-09-01 2023-10-10 南京理工大学 Interpretability-guided model distillation method for lightweight models on mobile devices
CN116863279B (zh) * 2023-09-01 2023-11-21 南京理工大学 Interpretability-guided model distillation method for lightweight models on mobile devices

Also Published As

Publication number Publication date
CN113470057A (zh) 2021-10-01
CN113470057B (zh) 2024-04-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21947953

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE