CN114972749B - Method, apparatus, medium and device for processing semantic segmentation model


Info

Publication number
CN114972749B
CN114972749B (Application CN202210461761.1A)
Authority
CN
China
Prior art keywords
image
content
similarity
style
semantic segmentation
Legal status
Active
Application number
CN202210461761.1A
Other languages
Chinese (zh)
Other versions
CN114972749A
Inventor
高欢
王国利
张骞
黄畅
Current Assignee
Beijing Horizon Information Technology Co Ltd
Original Assignee
Beijing Horizon Information Technology Co Ltd
Application filed by Beijing Horizon Information Technology Co Ltd
Priority to CN202210461761.1A
Publication of CN114972749A
Application granted
Publication of CN114972749B


Classifications

    • G06V10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N3/045: Neural network architectures; combinations of networks
    • G06V10/761: Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/82: Image or video recognition or understanding using neural networks


Abstract

Disclosed are a method, an apparatus, a storage medium, and an electronic device for processing a semantic segmentation model. The method comprises: acquiring the respective intermediate layer features of two bright images output by an intermediate layer of a first semantic segmentation model, and the respective intermediate layer features of two dark images output by an intermediate layer of a second semantic segmentation model; determining the style similarity and the content similarity between the bright image and the dark image corresponding to each of two scenes, to obtain a first style similarity and a second style similarity, and a first content similarity and a second content similarity; determining a style loss function based on the first style similarity and the second style similarity; determining a content loss function based on the first content similarity and the second content similarity; determining a distillation loss function based on the style loss function and the content loss function; and, taking the distillation loss function as supervision, updating parameters of the second semantic segmentation model through distillation training to obtain a processed second semantic segmentation model.

Description

Method, apparatus, medium and device for processing semantic segmentation model
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method, apparatus, storage medium, and electronic device for processing a semantic segmentation model.
Background
As an important branch of computer vision, semantic segmentation has been widely used in many fields, such as medical image analysis, image recognition, and autonomous driving. Since the introduction of convolutional neural networks, they have been widely applied to semantic segmentation tasks.
Convolution-based semantic segmentation models tend to rely on large numbers of pixel-level labels during training. In general, images captured in scenes with good illumination conditions have higher definition and abundant pixel-level labels, whereas images captured in scenes with poor illumination conditions have lower definition and relatively few pixel-level labels. As a result, images with poor illumination conditions are underrepresented in the sample data used to train a semantic segmentation model, which degrades the model's segmentation performance on such images.
In the related art, to improve the segmentation performance of a semantic segmentation model on images with poor illumination conditions when samples are limited, a domain adaptation method is generally adopted to adapt a semantic segmentation model trained on images with good illumination conditions to images with poor illumination conditions, so that segmentation performance on poorly illuminated images is improved without using pixel-level labels for those images.
Disclosure of Invention
The embodiments of the present disclosure provide a method, an apparatus, a storage medium, and an electronic device for processing a semantic segmentation model, which can improve the semantic segmentation performance of the model on dark images without introducing additional data or computational cost.
According to one aspect of the embodiments of the present disclosure, there is provided a method for processing a semantic segmentation model, comprising: processing two bright images by using a first semantic segmentation model, and acquiring the respective intermediate layer features of the two bright images output by an intermediate layer of the first semantic segmentation model, wherein the two bright images are acquired in two scenes; processing two dark images by using a second semantic segmentation model, and acquiring the respective intermediate layer features of the two dark images output by an intermediate layer of the second semantic segmentation model, wherein the second semantic segmentation model has the same structure as the first semantic segmentation model, and the two dark images are acquired in the two scenes; based on the respective intermediate layer features of the two bright images and of the two dark images, determining the content similarity between the bright image and the dark image corresponding to each of the two scenes to obtain a first content similarity and a second content similarity, and determining the style similarity between the bright image and the dark image corresponding to each of the two scenes to obtain a first style similarity and a second style similarity; determining a style loss function based on the first style similarity and the second style similarity; determining a content loss function based on the first content similarity and the second content similarity; determining a distillation loss function based on the style loss function and the content loss function; and, taking the distillation loss function as supervision, updating parameters of the second semantic segmentation model through distillation training to obtain a processed second semantic segmentation model.
According to yet another aspect of the embodiments of the present disclosure, there is provided an apparatus for processing a semantic segmentation model, including: a first processing unit configured to process two bright images by using a first semantic segmentation model and acquire the respective intermediate layer features of the two bright images output by an intermediate layer of the first semantic segmentation model, wherein the two bright images are acquired in two scenes; a second processing unit configured to process two dark images by using a second semantic segmentation model and acquire the respective intermediate layer features of the two dark images output by an intermediate layer of the second semantic segmentation model, wherein the second semantic segmentation model has the same structure as the first semantic segmentation model, and the two dark images are acquired in the two scenes; a third processing unit configured to determine, based on the respective intermediate layer features of the two bright images and of the two dark images, the content similarity between the bright image and the dark image corresponding to each of the two scenes to obtain a first content similarity and a second content similarity, and to determine the style similarity between the bright image and the dark image corresponding to each of the two scenes to obtain a first style similarity and a second style similarity; a style loss unit configured to determine a style loss function based on the first style similarity and the second style similarity; a content loss unit configured to determine a content loss function based on the first content similarity and the second content similarity; a distillation loss unit configured to determine a distillation loss function based on the style loss function and the content loss function; and a model processing unit configured to update parameters of the second semantic segmentation model through distillation training, taking the distillation loss function as supervision, to obtain a processed second semantic segmentation model.
According to yet another aspect of the disclosed embodiments, there is provided a computer readable storage medium storing a computer program for performing the method of any one of the embodiments described above.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; a processor for reading the executable instructions from the memory and executing the instructions to implement the method of any of the embodiments described above.
The method for processing a semantic segmentation model first acquires the respective intermediate layer features of two bright images output by an intermediate layer of a first semantic segmentation model, and the respective intermediate layer features of two dark images output by an intermediate layer of a second semantic segmentation model; then determines the style similarity and the content similarity between the bright image and the dark image corresponding to each of two scenes, obtaining a first style similarity and a second style similarity, and a first content similarity and a second content similarity; then determines a style loss function based on the first style similarity and the second style similarity, and a content loss function based on the first content similarity and the second content similarity; and finally determines a distillation loss function based on the style loss function and the content loss function and, taking the distillation loss function as supervision, updates parameters of the second semantic segmentation model through distillation training. By supervising the distillation process with the consistency of style characterization between different bright images and between different dark images, and with the consistency of content characterization between the bright image and the dark image of the same scene, semantic-level knowledge of the first semantic segmentation model can be migrated to the second semantic segmentation model, so that the intermediate layer features extracted by the second semantic segmentation model when processing dark images have the same or similar feature distribution as those extracted by the first semantic segmentation model when processing bright images. The segmentation performance of the second semantic segmentation model on dark images is thereby improved without introducing additional data or computational cost.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular description of embodiments of the disclosure, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a schematic illustration of one scenario of the method of the present disclosure for processing a semantic segmentation model;
FIG. 2 is a flow diagram of one embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 3 is a flow diagram of acquiring an image in one embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 4 is a flow diagram of determining a first style similarity and a second style similarity in one embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 5 is a flow diagram of determining a first content similarity and a second content similarity in one embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 6 is a flow diagram of yet another embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 7 is a schematic diagram of determining style loss functions and content loss functions in one embodiment of a method of the present disclosure for processing semantic segmentation models;
FIG. 8 is a flow diagram of generating a second image in one embodiment of a method of the present disclosure for processing a semantic segmentation model;
FIG. 9 is a flow diagram of modifying a distillation loss function in one embodiment of a method of processing a semantic segmentation model of the present disclosure;
FIG. 10 is a structural schematic diagram of one embodiment of an apparatus for processing a semantic segmentation model of the present disclosure;
FIG. 11 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. In addition, the character "/" in this disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the various embodiments in this disclosure emphasizes the differences between the various embodiments, and their same or similar parts may be referred to each other, and for brevity, they are not repeated.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the application
Adapting a semantic segmentation model trained on images with good illumination conditions to images with poor illumination conditions by using a domain adaptation method generally involves one of the following two approaches: one is to perform style conversion between images with different illumination conditions (such as daytime scene images with good illumination and nighttime scene images with poor illumination) through a pre-trained image style conversion network to generate a synthetic data set; the other is to use images with intermediate illumination conditions (for example, images taken at dusk) as an intermediate domain, gradually achieving domain adaptation from good to poor illumination conditions.
In the course of implementing the present disclosure, the inventors found that the former does not fully utilize the semantic features of the semantic segmentation task, so the style of the converted images cannot be fully aligned with that of real images, which limits the improvement in segmentation performance; it also introduces additional computational cost and increases the complexity of the training process. The latter does not take into account the inherent differences between different data sets, which likewise limits the improvement in segmentation performance, and it introduces additional data.
It can thus be seen that methods improving the segmentation performance of a semantic segmentation model through domain adaptation have at least the following drawbacks: additional data or computational cost is introduced, and the improvement is limited.
Exemplary System
Knowledge distillation refers to training a student network by introducing soft targets related to a teacher network as part of a distillation loss function, thereby achieving knowledge migration.
The method for processing a semantic segmentation model of the present disclosure is exemplarily described below with reference to FIG. 1, which shows a schematic view of one scenario of the method. As shown in FIG. 1, the bright image 110 and the bright image 130 may be processed by a first semantic segmentation model 150 to obtain a corresponding first intermediate layer feature 111 and third intermediate layer feature 131, respectively; meanwhile, the dark image 120 and the dark image 140 are processed by a second semantic segmentation model 160 to obtain a corresponding second intermediate layer feature 121 and fourth intermediate layer feature 141, respectively. Thereafter, a first style similarity between the first intermediate layer feature 111 and the second intermediate layer feature 121, and a second style similarity between the third intermediate layer feature 131 and the fourth intermediate layer feature 141, may be determined, yielding a style loss function. Likewise, a first content similarity between the first intermediate layer feature 111 and the second intermediate layer feature 121, and a second content similarity between the third intermediate layer feature 131 and the fourth intermediate layer feature 141, may be determined, yielding a content loss function. A distillation loss function is then determined from the style loss function and the content loss function and used to supervise the distillation training of the second semantic segmentation model 160, so as to improve its semantic segmentation performance on dark images.
Exemplary method
The method of the present disclosure for processing a semantic segmentation model is illustrated below in conjunction with FIG. 2, which shows a flow chart of one embodiment of the method. As shown in FIG. 2, the flow comprises the following steps:
step 210, processing the two bright images by using the first semantic segmentation model, and obtaining respective middle layer characteristics of the two bright images output by the middle layer of the first semantic segmentation model.
The two bright images are images acquired in two scenes.
In this embodiment, a scene characterizes the shooting range of a camera in the real world. The two scenes correspond to two different shooting ranges, which may be, for example, two cities, two areas in the same city, different blocks, and so forth. As an example, an execution body (which may be, for example, a terminal device or a server) may acquire daytime street views of two cities from a public dataset through a network as the two bright images; correspondingly, nighttime street views of the two cities may serve as the two dark images.
In this embodiment, a bright image refers to an image of relatively high definition whose exposure is greater than a preset threshold. Examples include images obtained with normal exposure in scenes with good illumination conditions and images obtained with long exposure in scenes with poor illumination conditions; specifically, outdoor images taken in daytime, indoor images with good illumination conditions, images taken with long exposure at dusk, and the like. The two bright images refer to images taken in the two scenes, respectively.
As an example, the preset threshold may be determined by statistically analyzing the exposure of a number of high-definition images. Alternatively, the preset threshold may be determined empirically. After acquiring an image set, the execution body may compute the exposure of each image with a tool (for example, OpenCV), then classify images whose exposure is above the preset threshold as bright images, and images whose exposure is at or below the threshold as dark images.
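For illustration only, such a split might be sketched as follows in Python, assuming mean grayscale intensity as the exposure measure and a purely hypothetical threshold value; the patent does not fix either choice:

```python
# Sketch under stated assumptions: mean grayscale intensity stands in for the
# exposure measure, and the threshold of 90 is hypothetical.
import cv2

def split_bright_dark(image_paths, threshold=90.0):
    bright, dark = [], []
    for path in image_paths:
        img = cv2.imread(path)
        # Mean intensity of the grayscale image serves as a simple exposure proxy.
        exposure = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).mean()
        (bright if exposure > threshold else dark).append(path)
    return bright, dark
```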
The middle layer of the semantic segmentation model refers to the hidden layer of the neural network, for example, when the semantic segmentation model is a full convolution neural network, the middle layer may be the convolution layer between the input layer and the last convolution layer.
The middle layer features are encoded or compressed high-level semantic features of the middle layer output of the semantic segmentation model, and the data form of the middle layer features can be, for example, feature maps. As an example, the intermediate layer feature may be a feature of a certain hidden layer output, or may be a set of features of a plurality of hidden layer outputs.
As an example, when the execution subject (for example, a terminal device or a server) performs semantic segmentation on two kinds of bright images by using the full convolution neural network, the feature output by the penultimate convolution layer in the full convolution neural network may be extracted as the middle layer feature of the two kinds of bright images.
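As a sketch of how such an intermediate feature might be captured in practice (PyTorch is assumed; the torchvision model fcn_resnet50 and its backbone's layer4 are illustrative stand-ins for whichever network and hidden layer are actually used):

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# Capture a hidden-layer feature map with a forward hook; the model and layer
# choices here are assumptions, not prescribed by the text.
model = fcn_resnet50(weights=None, num_classes=19)
features = {}

def save_mid(module, inputs, output):
    features["mid"] = output              # encoded high-level semantic feature map

model.backbone.layer4.register_forward_hook(save_mid)

bright_batch = torch.randn(2, 3, 512, 1024)   # placeholder bright-image batch
with torch.no_grad():
    model(bright_batch)
mid_feature = features["mid"]                  # shape (N, C, H', W')
```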
And 220, processing the two dark images by using the second semantic segmentation model, and acquiring respective middle layer characteristics of the two dark images output by the middle layer of the second semantic segmentation model.
The second semantic segmentation model and the first semantic segmentation model have the same structure, and the two dark images are images acquired in two scenes.
In this embodiment, each scene corresponds to one bright image and one dark image. As an example, images of different periods may be acquired in the same scene to obtain the bright image and the dark image corresponding to that scene; for example, a daytime image and a nighttime image may be acquired at the same place. Changing the place and repeating this operation yields the bright image and the dark image corresponding to the other scene, giving the two bright images and two dark images of the present embodiment.
Step 230, based on the respective intermediate layer characteristics of the two bright images and the respective intermediate layer characteristics of the two dark images, determining the content similarity between the bright image and the dark image corresponding to each of the two scenes respectively to obtain a first content similarity and a second content similarity, and determining the style similarity between the bright image and the dark image corresponding to each of the two scenes respectively to obtain a first style similarity and a second style similarity.
In the present embodiment, content similarity characterizes the degree of similarity between the contents of two images; it generally relates to the shooting scene of the images and not to their style characterization. For example, two images captured in the same scene contain largely the same content, so their content similarity is high. The first content similarity and the second content similarity respectively represent the content similarity between the bright image and the dark image corresponding to each of the two scenes.
As an example, the execution body may first determine the correspondence between the two bright images and the two dark images from the shooting scenes, treating the bright image and the dark image of the same scene as an image pair. The two intermediate layer features corresponding to each image pair are then normalized, for example by mapping them to the same vector space to obtain two normalized feature vectors in which equal feature values represent the same semantics, so that the difference between the contents of the two images is abstracted into the numerical difference between the two feature vectors. The degree of similarity of the two feature vectors (for example, the distance between them or their cosine similarity) then gives the first content similarity or the second content similarity of the image pair. Performing the same steps on the other image pair yields the second content similarity or the first content similarity, respectively.
In this embodiment, the style similarity represents a degree of similarity between style characterizations of a bright image and a dark image captured in the same scene, for example, a first style similarity may represent a degree of style similarity between a bright image and a dark image corresponding to one of the two scenes, and correspondingly, a second style similarity represents a degree of style similarity between a bright image and a dark image corresponding to the other of the two scenes.
Style characterization may be expressed through the illumination of an image, the hue bias of its colors, and the like.
As an example, the execution subject may extract respective style embeddings from the intermediate layer features corresponding to the bright image and the dark image included in the image pair, and then take the degree of similarity of the two style embeddings as the first style similarity or the second style similarity.
In general, the Gram matrix represents the autocorrelation of features in the channel dimension; it reflects the correlations between responses of different filters and can therefore capture the style of the features.
Step 240, determining a style loss function based on the first style similarity and the second style similarity.
In this embodiment, the style loss function may characterize the difference between the degrees of style similarity corresponding to the two image pairs (the bright and dark images of each scene). Since style characterization is independent of image content, the two image pairs should have consistent degrees of style similarity. Based on this principle, forcing the first style similarity and the second style similarity to be equal by means of the style loss function realizes the style conversion from bright images to dark images at the semantic level.
As an example, the style loss function may be the L2 distance between the first style similarity and the second style similarity. The L2 distance, also called the mean square error (MSE), is calculated as shown in formula (1):

$$L_{CDS} = \left\| \mathcal{A}_{style}^{S} - \mathcal{A}_{style}^{T} \right\|_2^2 \tag{1}$$

where $\mathcal{A}_{style}^{S}$ denotes the first style similarity, $\mathcal{A}_{style}^{T}$ denotes the second style similarity, and $L_{CDS}$ denotes the style loss function.
Step 250, determining a content loss function based on the first content similarity and the second content similarity.
In this embodiment, the content loss function may characterize the content difference between the images corresponding to the two scenes. Because the content difference between two images of the same scene is independent of image style, the degrees of content similarity between the bright image and the dark image of each of the two scenes should be consistent. Based on this principle, forcing the first content similarity and the second content similarity to be equal by means of the content loss function realizes semantic-level knowledge transfer in the dimension of content characterization.
As an example, the content loss function may be the L2 distance between the first content similarity and the second content similarity, calculated as shown in formula (2):

$$L_{CDC} = \left\| \mathcal{A}_{content}^{S} - \mathcal{A}_{content}^{T} \right\|_2^2 \tag{2}$$

where $\mathcal{A}_{content}^{S}$ denotes the first content similarity, $\mathcal{A}_{content}^{T}$ denotes the second content similarity, and $L_{CDC}$ denotes the content loss function.
Step 260, determining a distillation loss function based on the style loss function and the content loss function.
In this embodiment, the distillation loss function may characterize the overall loss in the subsequent distillation training. Supervising the distillation training with the distillation loss function constrains the parameters of the second semantic segmentation model from the two dimensions of style and content simultaneously.
As an example, a weighted sum of the style loss function and the content loss function may be used as the distillation loss function, calculated as shown in formula (3):

$$L = x \, L_{CDC} + y \, L_{CDS} \tag{3}$$

where $L$ denotes the distillation loss function, $x$ and $y$ denote weight coefficients, $L_{CDC}$ denotes the content loss function of formula (2), and $L_{CDS}$ denotes the style loss function of formula (1).
And 270, taking the distillation loss function as supervision, and updating parameters of the second semantic segmentation model through distillation training to obtain a processed second semantic segmentation model.
In this embodiment, the distillation training represents a process of processing the second semantic segmentation model based on a principle of knowledge distillation, wherein the teacher network is the first semantic segmentation model, the student network is the second semantic segmentation model, and both have the same structure.
A specific procedure for distillation training is exemplified below. The execution body performs semantic segmentation on the two bright images and the two dark images using the first semantic segmentation model and the second semantic segmentation model, respectively; from the intermediate layer features output by the first semantic segmentation model, in combination with those output by the second semantic segmentation model, it determines the first style similarity and the second style similarity as well as the first content similarity and the second content similarity. The distillation loss function value for this group of training images is then determined, and the parameters of the second semantic segmentation model are optimized according to this value based on the back-propagation principle, so as to migrate the dark knowledge to the student network. This is iterated until the distillation loss function converges, completing the distillation training of the second semantic segmentation model.
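A condensed sketch of one such distillation step follows (PyTorch assumed; extract_mid stands for any hook-based feature extraction such as the one sketched earlier, and style_similarity and content_similarity follow the formula sketches given with the embodiments below; all three names are assumptions, not from the patent):

```python
import torch

# One distillation step: the teacher (first model) is frozen and only supplies
# targets; the student (second model) parameters are updated.
def distill_step(teacher, student, s_d, s_n, t_d, t_n, optimizer, x=1.0, y=1.0):
    with torch.no_grad():
        f_sd = extract_mid(teacher, s_d)   # bright image, scene 1
        f_td = extract_mid(teacher, t_d)   # bright image, scene 2
    f_sn = extract_mid(student, s_n)       # dark image, scene 1
    f_tn = extract_mid(student, t_n)       # dark image, scene 2

    # First/second style and content similarities, one per scene (formulas (1)-(2)).
    l_cds = (style_similarity(f_sd, f_sn) - style_similarity(f_td, f_tn)) ** 2
    l_cdc = (content_similarity(f_sd, f_sn) - content_similarity(f_td, f_tn)) ** 2
    loss = x * l_cdc + y * l_cds           # formula (3)

    optimizer.zero_grad()
    loss.backward()                        # back-propagation migrates the knowledge
    optimizer.step()
    return float(loss)
```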
The method for processing a semantic segmentation model provided by this embodiment first acquires the respective intermediate layer features of the two bright images output by the intermediate layer of the first semantic segmentation model, and of the two dark images output by the intermediate layer of the second semantic segmentation model; then determines the style similarity and the content similarity between the bright image and the dark image corresponding to each of the two scenes, obtaining the first and second style similarities and the first and second content similarities; then determines a style loss function based on the first and second style similarities, and a content loss function based on the first and second content similarities; and finally determines a distillation loss function based on the style loss function and the content loss function and, taking the distillation loss function as supervision, updates the parameters of the second semantic segmentation model through distillation training. By supervising the distillation process with the consistency of style characterization between different bright images and between different dark images, and with the consistency of content characterization between the bright image and the dark image of the same scene, semantic-level knowledge of the first semantic segmentation model can be migrated to the second semantic segmentation model, so that the intermediate layer features extracted by the second semantic segmentation model when processing dark images have the same or similar feature distribution as those extracted by the first semantic segmentation model when processing bright images. The segmentation performance of the second semantic segmentation model on dark images is thereby improved without introducing additional data or computational cost.
Referring next to fig. 3, fig. 3 illustrates a flowchart of acquiring an image in one embodiment of a method of the present disclosure for processing a semantic segmentation model, as illustrated in fig. 3, the flowchart comprising the steps of:
step 310, acquiring a first image set, a second image set, a third image set and a fourth image set.
Here, the first image in the first image set is a labeled bright image collected in a first scene; the second image in the second image set is a labeled dark image collected in the first scene; the third image in the third image set is an unlabeled bright image collected in a second scene; and the fourth image in the fourth image set is an unlabeled dark image collected in the second scene.
Step 320, determining the first image in the first image set and the third image in the third image set as two bright images, and determining the second image in the second image set and the fourth image in the fourth image set as two dark images.
In a specific example, the execution body may acquire public image data through a network and construct the four image sets from it. For example, daytime and nighttime images of two cities may be acquired; the daytime and nighttime images of the first city are then taken as the first image and the second image, respectively, and the daytime and nighttime images of the second city as the third image and the fourth image, respectively.
In this embodiment, four types of images are acquired and combined into two bright images and two dark images for the first semantic segmentation model and the second semantic segmentation model to process. On the one hand, this ensures consistency of content characterization between the bright image and the dark image of the same scene; on the other hand, it ensures consistency of style characterization among the bright images and among the dark images while preserving the content differences between different bright images or different dark images. Semantic-level knowledge can thus be distilled in a more targeted manner, further improving the knowledge distillation effect, so that the second semantic segmentation model obtains better segmentation performance on dark images.
Referring to fig. 4 on the basis of the embodiment shown in fig. 3, fig. 4 shows a flowchart for determining a first style similarity and a second style similarity in one embodiment of the method for processing a semantic segmentation model of the present disclosure, as shown in fig. 4, the flowchart comprising the steps of:
step 410, respectively processing the first image in the first image set and the third image in the third image set by using the first semantic segmentation model, and obtaining a first intermediate layer feature of the first image and a third intermediate layer feature of the third image output by the intermediate layer of the first semantic segmentation model.
Step 420, processing the second image in the second image set and the fourth image in the fourth image set by using the second semantic segmentation model, and obtaining a second intermediate layer feature of the second image and a fourth intermediate layer feature of the fourth image output by an intermediate layer of the second semantic segmentation model;
step 430, determining a first style of embedding corresponding to the first middle layer feature, a second style of embedding corresponding to the second middle layer feature, a third style of embedding corresponding to the third middle layer feature, and a fourth style of embedding corresponding to the fourth middle layer feature, respectively.
Step 440, determining the first style similarity based on the similarity of the first style embedding and the second style embedding.
Step 450, determining the second style similarity based on the similarity of the third style embedding and the fourth style embedding.
In a specific example, $\{S_d\}$, $\{S_n\}$, $\{T_d\}$, and $\{T_n\}$ denote the first image set, the second image set, the third image set, and the fourth image set, respectively. Step 430 may determine the Gram matrices corresponding to the four intermediate layer features through formula (4); these serve as the first style embedding, the second style embedding, the third style embedding, and the fourth style embedding, respectively. Formula (4) is shown below:

$$G_D^{ij} = \sum_{p} F_D^{ip} F_D^{jp}, \quad D \in \{S_d, S_n, T_d, T_n\} \tag{4}$$

where $G_D$ denotes the style embedding of image $D$, $F_D$ denotes the intermediate layer feature of image $D$, $p$ indexes the pixel positions in the intermediate layer feature, and $i$ and $j$ denote channel numbers.
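A minimal sketch of formula (4) in PyTorch (dividing by the feature size at the end is a common normalization convention added here, not stated in the text):

```python
import torch

# Formula (4): channel-wise Gram matrix of an intermediate feature map.
def gram_matrix(feat):                      # feat: (N, C, H, W)
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)           # flatten spatial positions p
    g = torch.bmm(f, f.transpose(1, 2))     # G[i, j] = sum_p F[i, p] * F[j, p]
    return g / (c * h * w)                  # normalization is an added convention
```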
Further, steps 440 and 450 may determine the first style similarity and the second style similarity through formula (5):

$$\mathcal{A}_{style}^{k} = \mathrm{sim}\left(G_{k_d}, G_{k_n}\right), \quad k \in \{S, T\} \tag{5}$$

where, depending on the value of $k$ ($S$ or $T$), $G_{k_d}$ denotes the first style embedding or the third style embedding, $G_{k_n}$ denotes the second style embedding or the fourth style embedding, and correspondingly $\mathcal{A}_{style}^{k}$ denotes the first style similarity or the second style similarity, with $\mathrm{sim}(\cdot,\cdot)$ denoting a similarity measure.
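Assuming cosine similarity as the measure sim(·,·) (the text leaves the exact measure open), formulas (5) and (1) might be sketched as follows, building on gram_matrix above:

```python
import torch.nn.functional as F

# Style similarity between the bright and dark image of one scene (formula (5)),
# and the style loss of formula (1). Cosine similarity is an assumption.
def style_similarity(feat_bright, feat_dark):
    g_b = gram_matrix(feat_bright).flatten(1)
    g_d = gram_matrix(feat_dark).flatten(1)
    return F.cosine_similarity(g_b, g_d, dim=1).mean()

def style_loss(sim_scene1, sim_scene2):     # formula (1): squared L2 distance
    return (sim_scene1 - sim_scene2) ** 2
```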
In this embodiment, the first style similarity and the second style similarity may be determined based on the four image sets acquired in the embodiment shown in FIG. 3, inheriting the targeted construction of the image data in those sets, which helps improve the accuracy with which the first style similarity and the second style similarity describe differences in style characterization.
With further reference next to fig. 5 on the basis of the embodiment shown in fig. 4, fig. 5 shows a flowchart of determining a first content similarity and a second content similarity in one embodiment of the method for processing a semantic segmentation model of the present disclosure, as shown in fig. 5, the flowchart comprising the steps of:
step 510, determining a first content embedding corresponding to the first intermediate layer feature, a second content embedding corresponding to the second intermediate layer feature, a third content embedding corresponding to the third intermediate layer feature, and a fourth content embedding corresponding to the fourth intermediate layer feature, respectively.
Step 520, determining the first content similarity based on the similarity between the first content embedding and the second content embedding.
Step 530, determining the second content similarity based on the similarity degree of the third content embedding and the fourth content embedding.
Continuing with the exemplary image sets of the embodiment shown in FIG. 4, steps 520 and 530 may determine the first content similarity and the second content similarity through formula (6):

$$\mathcal{A}_{content}^{k} = \mathrm{sim}\left(e_{k_d}, e_{k_n}\right), \quad k \in \{S, T\} \tag{6}$$

where, depending on the value of $k$ ($S$ or $T$), $e_{k_d}$ denotes the first content embedding or the third content embedding, $e_{k_n}$ denotes the second content embedding or the fourth content embedding, and correspondingly $\mathcal{A}_{content}^{k}$ denotes the first content similarity or the second content similarity.
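Under the same cosine-similarity assumption, formula (6) might be sketched as:

```python
import torch.nn.functional as F

# Content similarity between the content embeddings of the bright and dark
# image of one scene (formula (6)); cosine similarity is again an assumption.
def content_similarity(embed_bright, embed_dark):
    return F.cosine_similarity(
        embed_bright.flatten(1), embed_dark.flatten(1), dim=1
    ).mean()
```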
In the embodiment shown in FIG. 5, the first content similarity and the second content similarity may be determined based on the four image sets acquired in the embodiment shown in FIG. 3, inheriting the targeted construction of the image data in those sets, which helps improve the accuracy with which the first content similarity and the second content similarity characterize content differences.
In some alternative implementations of the present embodiment, the first content embedding, the second content embedding, the third content embedding, and the fourth content embedding are determined as follows:
The first intermediate layer feature, the second intermediate layer feature, the third intermediate layer feature, and the fourth intermediate layer feature are respectively mapped to a semantic feature space by a preset mapping module, yielding the first content embedding, the second content embedding, the third content embedding, and the fourth content embedding.
The mapping module constrains the mapping process with a mapping loss function, where the mapping loss function represents the difference between a preset product and the sum of a third preset divergence and a fourth preset divergence; the preset product is the product of a preset weight coefficient and the sum of a first preset divergence and a second preset divergence. The first preset divergence is the preset divergence between the first content embedding and the second content embedding, the second preset divergence is that between the third content embedding and the fourth content embedding, the third preset divergence is that between the first content embedding and the third content embedding, and the fourth preset divergence is that between the second content embedding and the fourth content embedding.
As an example, the mapping module may be two 1×1 convolution layers that map each intermediate layer feature to the same semantic feature space through convolution, yielding the content embeddings, so that equal numerical values in the content embeddings characterize the same semantic features. The first, second, third, and fourth preset divergences may be calculated based on the JS divergence (Jensen-Shannon divergence).
Continuing with the four image sets in the example shown in FIG. 4, the mapping loss function in this embodiment may be characterized by formula (7):

$$L_{JS} = \lambda \left( D_{JS}\!\left(e_{S_d}, e_{S_n}\right) + D_{JS}\!\left(e_{T_d}, e_{T_n}\right) \right) - D_{JS}\!\left(e_{S_d}, e_{T_d}\right) - D_{JS}\!\left(e_{S_n}, e_{T_n}\right) \tag{7}$$

where $L_{JS}$ denotes the mapping loss function, $\lambda$ denotes the preset weight coefficient, $e_{S_d}$, $e_{S_n}$, $e_{T_d}$, and $e_{T_n}$ denote the first, second, third, and fourth content embeddings, respectively, and the four $D_{JS}$ terms denote the first, second, third, and fourth preset divergences, respectively.
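A sketch of the mapping module and of formula (7) follows (PyTorch assumed; the ReLU between the two 1×1 convolutions and the softmax normalization of the embeddings before the JS divergence are implementation assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

# Mapping module: two 1x1 convolution layers, as suggested above.
class MappingModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feat):                # intermediate feature -> content embedding
        return self.proj(feat)

def js_divergence(e_a, e_b, eps=1e-8):
    # Treat each embedding as a distribution via softmax (assumption).
    p = F.softmax(e_a.flatten(1), dim=1)
    q = F.softmax(e_b.flatten(1), dim=1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps) / (b + eps)).log()).sum(dim=1).mean()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mapping_loss(e_sd, e_sn, e_td, e_tn, lam=1.0):   # formula (7)
    pull = js_divergence(e_sd, e_sn) + js_divergence(e_td, e_tn)
    push = js_divergence(e_sd, e_td) + js_divergence(e_sn, e_tn)
    return lam * pull - push
```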
In this embodiment, the mapping loss function constrains the mapping of the intermediate features so that content embeddings of images with similar content (for example, the first image and the second image of the same scene) lie closer together in the semantic feature space, while content embeddings of images with different content (for example, the first image and the third image) lie farther apart. The determined content similarities can thus describe the degree of content difference more accurately, enabling more accurate knowledge transfer in the content dimension.
Referring next to fig. 6, fig. 6 shows a flowchart of yet another embodiment of the method of the present disclosure for processing a semantic segmentation model, as shown in fig. 6, the flowchart comprising the steps of:
Step 610, processing the two bright images by using the first semantic segmentation model, and obtaining respective middle layer characteristics of the two bright images output by the middle layer of the first semantic segmentation model.
And 620, processing the two dark images by using the second semantic segmentation model, and acquiring respective middle layer characteristics of the two dark images output by the middle layer of the second semantic segmentation model.
Step 630, based on the respective intermediate layer characteristics of the two bright images and the respective intermediate layer characteristics of the two dark images, determining the content similarity between the bright image and the dark image corresponding to each of the two scenes respectively to obtain a first content similarity and a second content similarity, and determining the style similarity between the bright image and the dark image corresponding to each of the two scenes respectively to obtain a first style similarity and a second style similarity.
In this embodiment, the two bright images and the two dark images are obtained through the process shown in fig. 3, and the steps 610 and 620 correspond to the steps 210 and 220 described above.
Step 630 may be implemented through the flows shown in FIGS. 4 and 5, yielding the first, second, third, and fourth style embeddings, the first and second style similarities, the first, second, third, and fourth content embeddings, and the first and second content similarities. The first, second, third, and fourth content embeddings are obtained using the mapping module of the above-described embodiment.
Step 640, determining a style loss function based on the first style similarity and the second style similarity.
Step 650, determining a content loss function based on the first content similarity and the second content similarity.
Step 660, determining a distillation loss function based on the style loss function and the content loss function.
Step 670, determining a third content similarity based on the similarity between the first content embedding and the third content embedding.
Step 680, determining the fourth content similarity based on the similarity between the second content embedding and the fourth content embedding.
Step 690, correcting the content loss function based on the L2 distance between the third content similarity and the fourth content similarity and the mapping loss function.
Step 691, taking the distillation loss function as a supervision, and updating parameters of the second semantic segmentation model through distillation training to obtain a processed second semantic segmentation model.
Continuing with the image sets shown in FIG. 4 and the foregoing formulas, steps 670 and 680 may determine the third content similarity and the fourth content similarity through formula (8):

$$\mathcal{A}_{content}^{r} = \mathrm{sim}\left(e_{S_r}, e_{T_r}\right), \quad r \in \{d, n\} \tag{8}$$

where, depending on the value of $r$ ($d$ or $n$), $e_{S_r}$ may represent the first content embedding or the second content embedding, $e_{T_r}$ may represent the third content embedding or the fourth content embedding, and correspondingly $\mathcal{A}_{content}^{r}$ may represent the third content similarity or the fourth content similarity.
Step 690 may use weighted addition to modify the content loss function with the L2 distance between the third content similarity and the fourth content similarity, together with the mapping loss function; for example, the modified content loss function is shown in formula (9):

$$L_{CDC} = \left\| \mathcal{A}_{content}^{S} - \mathcal{A}_{content}^{T} \right\|_2^2 + \left\| \mathcal{A}_{content}^{d} - \mathcal{A}_{content}^{n} \right\|_2^2 + L_{JS} \tag{9}$$

where $L_{CDC}$ denotes the content loss function, $\mathcal{A}_{content}^{S}$ denotes the first content similarity, $\mathcal{A}_{content}^{T}$ denotes the second content similarity, $\mathcal{A}_{content}^{d}$ denotes the third content similarity, $\mathcal{A}_{content}^{n}$ denotes the fourth content similarity, and $L_{JS}$ denotes the mapping loss function.
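Carrying the earlier sketches through, the modified content loss of formula (9) reduces to a few lines (unit weights for the added terms are assumed, since the text does not state them):

```python
# Formula (9): scene-wise content term of formula (2), cross-scene term built
# from the third/fourth content similarities of formula (8), plus the mapping
# loss; a_s, a_t, a_d, a_n are the first to fourth content similarities.
def corrected_content_loss(a_s, a_t, a_d, a_n, l_js):
    return (a_s - a_t) ** 2 + (a_d - a_n) ** 2 + l_js
```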
At this time, the distillation loss function in step 690 may be as shown in formula (10):

$$L = x \, L_{CDC} + y \, L_{CDS} \tag{10}$$

where $L_{CDC}$ is the modified content loss function of formula (9), and $x$, $y$, and $L_{CDS}$ are as in formula (3).

Further reference may be made to FIG. 7 on the basis of the above example, where $D \in \{S_d, S_n, T_d, T_n\}$, $G_D$ denotes the style embedding of image $D$, $F_D$ denotes the intermediate layer feature of image $D$, $e_D$ denotes the content embedding of image $D$, Proj denotes the mapping process, and Gram denotes the Gram algorithm. FIG. 7(a) illustrates the calculation flow of the first content similarity, the second content similarity, the third content similarity, the fourth content similarity, and the content loss function in the present embodiment. FIG. 7(b) illustrates the calculation flow of the first style similarity, the second style similarity, and the style loss function in the present embodiment.
The embodiment shown in FIG. 6 adds the step of correcting the content loss function based on the mapping loss function and the L2 distance between the third content similarity and the fourth content similarity. Introducing these terms into the distillation loss function strengthens the constraint on the content characterization dimension during distillation training and improves the accuracy of knowledge transfer in the content dimension, thereby further improving the segmentation performance of the second semantic segmentation model on dark images.
Referring next to fig. 8, fig. 8 illustrates a flow chart of generating a second image in one embodiment of a method of the present disclosure for processing a semantic segmentation model. In some alternative implementations of the embodiment shown in fig. 3-6, the second image may be obtained by a process shown in fig. 8, as shown in fig. 8, comprising the steps of:
step 810, mapping the first image and the fourth image to a first preset color space respectively, so as to obtain a transformed first image and a transformed fourth image.
As an example, the first image, the second image, the third image, and the fourth image are all RGB images. The first preset color space may be the Lab color space, where L represents lightness (luminance), corresponding to brightness; a represents the red-green axis; and b represents the blue-yellow axis.
Step 820, determining a first mean and a first variance of the transformed first image, and a fourth mean and a fourth variance of the transformed fourth image.
Step 830, the transformed first image is adjusted, so that the first mean and the first variance are aligned with the fourth mean and the fourth variance, respectively, to obtain an adjusted first image.
Step 840, mapping the adjusted first image to a second preset color space to obtain a second image.
As an example, the second preset color space may be an RGB space.
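A minimal sketch of steps 810-840, assuming OpenCV's RGB-Lab conversion and interpreting variance alignment as standard-deviation matching (a Reinhard-style color transfer), might look as follows; it is an interpretation for illustration, not the reference implementation of this embodiment.

```python
import cv2
import numpy as np

def bright_to_dark(first_rgb: np.ndarray, fourth_rgb: np.ndarray) -> np.ndarray:
    """Steps 810-840: align the Lab mean/variance of the bright first image
    with those of the dark fourth image, then map back to RGB."""
    first_lab = cv2.cvtColor(first_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    fourth_lab = cv2.cvtColor(fourth_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)

    # Per-channel statistics (steps 820-830).
    mu1, std1 = first_lab.mean(axis=(0, 1)), first_lab.std(axis=(0, 1)) + 1e-6
    mu4, std4 = fourth_lab.mean(axis=(0, 1)), fourth_lab.std(axis=(0, 1))

    adjusted = (first_lab - mu1) / std1 * std4 + mu4  # align mean and variance
    adjusted = np.clip(adjusted, 0, 255).astype(np.uint8)

    # Step 840: map back to the second preset color space (RGB here).
    return cv2.cvtColor(adjusted, cv2.COLOR_LAB2RGB)
```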
In the embodiment shown in FIG. 8, the first image and the fourth image are mapped to the same color space, the mean and variance of the first image are aligned with those of the fourth image, and the adjusted first image is mapped back to the original color space. In this way, a bright image (the first image) can be converted into a dark image (the second image) while keeping its content unchanged, so that the semantic-level labels of the bright image transfer to the dark image. This better preserves the content consistency of the first image and the second image while guaranteeing the difference in their styles, which helps to further improve the pertinence of the first and second images and, in turn, the segmentation performance of the final second semantic segmentation model on dark images.
Reference is made to FIG. 9 on the basis of FIG. 8. FIG. 9 shows a flowchart of correcting the distillation loss function in one embodiment of the method for processing a semantic segmentation model of the present disclosure; the flow comprises the following steps:
step 910, obtaining a first prediction result corresponding to the first image and a third prediction result corresponding to the third image output by the first semantic segmentation model.
Step 920, obtaining a second prediction result corresponding to the second image and a fourth prediction result corresponding to the fourth image, which are output by the second semantic segmentation model.
Step 930, determining the respective cross entropies of the first prediction result and the second prediction result.
Step 940, using the third prediction result as a label of the fourth image, determining a cross entropy of the fourth prediction result.
In this embodiment, the third image and the fourth image have similar semantic labels on static objects, so the third prediction result, corresponding to the bright third image that is easier to segment, is used as the label of the fourth image; the cross entropy of the fourth prediction result determined in this way can more accurately represent the segmentation performance of the second semantic segmentation model.
Step 950, determining a cross entropy loss function based on the cross entropy corresponding to each of the first prediction result, the second prediction result and the fourth prediction result.
As an example, a weighted sum of the cross entropies corresponding to the first prediction result, the second prediction result, and the fourth prediction result may be determined as the cross entropy loss function.
Step 960, correcting the distillation loss function based on the cross entropy loss function.
Continuing with the exemplary description of formula (10) in the embodiment shown in FIG. 6, the distillation loss function corrected based on the cross entropy loss function may be as shown in formula (11):

$$L_{distill} = L_{style} + L_{CDC} + z\,H \tag{11}$$

where z represents a weight coefficient and H represents the cross entropy loss function.
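Putting steps 910-960 together, a hedged PyTorch-style sketch follows; the equal weighting of the three cross-entropy terms and all function and variable names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(pred1, label1, pred2, label2, pred3, pred4,
                       w=(1.0, 1.0, 1.0)):
    """Steps 930-950: cross entropies of the first, second, and fourth
    prediction results; the third prediction result serves as the pseudo
    label of the fourth image (step 940). Weights w are illustrative."""
    ce1 = F.cross_entropy(pred1, label1)   # first image, ground-truth label
    ce2 = F.cross_entropy(pred2, label2)   # second image, ground-truth label
    pseudo = pred3.argmax(dim=1).detach()  # pseudo label from the first model
    ce4 = F.cross_entropy(pred4, pseudo)   # fourth image, pseudo label
    return w[0] * ce1 + w[1] * ce2 + w[2] * ce4

def corrected_distillation_loss(style_loss, content_loss, ce_loss, z=1.0):
    """Formula (11): distillation loss corrected by the cross entropy loss."""
    return style_loss + content_loss + z * ce_loss
```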
In this embodiment, after step 960, the distillation training process for the second semantic segmentation model may be completed through the foregoing step 260 or the foregoing step 690. By introducing the cross entropies of the prediction results of the first semantic segmentation model and the second semantic segmentation model into the distillation loss function, the parameters of the second semantic segmentation model can be constrained based on the accuracy of the prediction results, thereby further improving the segmentation performance of the second semantic segmentation model.
Any of the methods for processing a semantic segmentation model provided by the embodiments of the present disclosure may be performed by any suitable device with data processing capability, including, but not limited to, a terminal device, a server, and the like. Alternatively, any of these methods may be executed by a processor; for example, the processor may execute any of the methods mentioned in the embodiments of the present disclosure by invoking corresponding instructions stored in a memory. Details are not repeated below.
Exemplary apparatus
The apparatus for processing a semantic segmentation model of the present disclosure is exemplarily described below with reference to FIG. 10. FIG. 10 shows a schematic structural diagram of an embodiment of the apparatus for processing a semantic segmentation model of the present disclosure; as shown in FIG. 10, the apparatus includes: a first processing unit 1010 configured to process two kinds of bright images by using a first semantic segmentation model, and acquire the respective middle layer characteristics of the two kinds of bright images output by a middle layer of the first semantic segmentation model, wherein the two kinds of bright images are images acquired in two scenes; a second processing unit 1020 configured to process two kinds of dark images by using a second semantic segmentation model, and acquire the respective middle layer characteristics of the two kinds of dark images output by a middle layer of the second semantic segmentation model, wherein the second semantic segmentation model has the same structure as the first semantic segmentation model, and the two kinds of dark images are images acquired in the two scenes; a third processing unit 1030 configured to determine, based on the respective middle layer characteristics of the two bright images and of the two dark images, the content similarity between the bright image and the dark image corresponding to each of the two scenes, to obtain a first content similarity and a second content similarity, and to determine the style similarity between the bright image and the dark image corresponding to each of the two scenes, to obtain a first style similarity and a second style similarity; a style loss unit 1040 configured to determine a style loss function based on the first style similarity and the second style similarity; a content loss unit 1050 configured to determine a content loss function based on the first content similarity and the second content similarity; a distillation loss unit 1060 configured to determine a distillation loss function based on the style loss function and the content loss function; and a model processing unit 1070 configured to update the parameters of the second semantic segmentation model through distillation training with the distillation loss function as supervision, to obtain a processed second semantic segmentation model.
In one embodiment, the apparatus further comprises: an image acquisition unit configured to acquire a first image set, a second image set, a third image set, and a fourth image set, wherein the first image in the first image set is a marked bright image acquired in a first scene, the second image in the second image set is a marked dark image acquired in the first scene, the third image in the third image set is an unmarked bright image acquired in a second scene, and the fourth image in the fourth image set is an unmarked dark image acquired in the second scene; an image determining unit configured to determine a first image in the first image set and a third image in the third image set as two bright images and a second image in the second image set and a fourth image in the fourth image set as two dark images.
In one embodiment, the third processing unit 1030 includes: a first middle layer feature extraction module configured to respectively process the first image in the first image set and the third image in the third image set by using the first semantic segmentation model, to obtain the first middle layer feature of the first image and the third middle layer feature of the third image output by the middle layer of the first semantic segmentation model; a second middle layer feature extraction module configured to respectively process the second image in the second image set and the fourth image in the fourth image set by using the second semantic segmentation model, to acquire the second middle layer feature of the second image and the fourth middle layer feature of the fourth image output by the middle layer of the second semantic segmentation model; a style embedding module configured to respectively determine a first style embedding corresponding to the first middle layer feature, a second style embedding corresponding to the second middle layer feature, a third style embedding corresponding to the third middle layer feature, and a fourth style embedding corresponding to the fourth middle layer feature; a first style similarity module configured to determine the first style similarity based on the first style embedding and the second style embedding; and a second style similarity module configured to determine the second style similarity based on the third style embedding and the fourth style embedding.
In one embodiment, the third processing unit 1030 further includes: the content embedding module is configured to respectively determine a first content embedding corresponding to the first middle layer feature, a second content embedding corresponding to the second middle layer feature, a third content embedding corresponding to the third middle layer feature and a fourth content embedding corresponding to the fourth middle layer feature; a first content similarity module configured to determine a first content similarity based on a degree of similarity of the first content embedding and the second content embedding; and a second content similarity module configured to determine the second content similarity based on a degree of similarity of the third content embedding and the fourth content embedding.
In one embodiment, the apparatus further comprises a mapping unit configured to: map the first middle layer feature, the second middle layer feature, the third middle layer feature, and the fourth middle layer feature to a semantic feature space respectively by using a preset mapping module, to obtain the first content embedding, the second content embedding, the third content embedding, and the fourth content embedding. The mapping module adopts a mapping loss function to constrain the mapping process. The mapping loss function represents the difference obtained by subtracting a third preset divergence and a fourth preset divergence from a preset product, wherein the preset product is the product of a preset weight coefficient and the sum of a first preset divergence and a second preset divergence; the first preset divergence is the preset divergence between the first content embedding and the second content embedding; the second preset divergence is the preset divergence between the third content embedding and the fourth content embedding; the third preset divergence is the preset divergence between the first content embedding and the third content embedding; and the fourth preset divergence is the preset divergence between the second content embedding and the fourth content embedding.
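Interpreting the preset divergence as a Jensen-Shannon divergence (consistent with the mapping loss L_JS above, though other divergences may equally be preset), a minimal sketch of this mapping loss might be as follows; the softmax normalization of the embeddings is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two embedding distributions."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    return 0.5 * (F.kl_div(m.log(), p, reduction="batchmean")
                  + F.kl_div(m.log(), q, reduction="batchmean"))

def mapping_loss(e1, e2, e3, e4, alpha=1.0):
    """L_JS = alpha * (JS(e1, e2) + JS(e3, e4)) - JS(e1, e3) - JS(e2, e4).

    e1..e4: first..fourth content embeddings; alpha is the preset weight
    coefficient. Same-scene pairs are pulled together, cross-scene pairs
    are pushed apart."""
    return (alpha * (js_divergence(e1, e2) + js_divergence(e3, e4))
            - js_divergence(e1, e3) - js_divergence(e2, e4))
```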
In one embodiment, the apparatus further comprises: a third content similarity unit configured to determine a third content similarity based on a degree of similarity between the first content embedding and the third content embedding; a fourth content similarity unit configured to determine a fourth content similarity based on a degree of similarity between the second content embedding and the fourth content embedding; the first correction unit is configured to correct the content loss function based on the L2 distance between the third content similarity and the fourth content similarity and the mapping loss function.
In one embodiment, the apparatus further comprises an image generation unit configured to: mapping the first image and the fourth image to a first preset color space respectively to obtain a transformed first image and a transformed fourth image; determining a first mean and a first variance of the transformed first image and a fourth mean and a fourth variance of the transformed fourth image; the transformed first image is rectified, so that the first mean value and the first variance are respectively aligned with the fourth mean value and the fourth variance, and an adjusted first image is obtained; and mapping the adjusted first image to a second preset color space to obtain a second image.
In one embodiment, the apparatus further comprises: a first prediction unit configured to acquire a first prediction result corresponding to the first image and a third prediction result corresponding to the third image output by the first semantic segmentation model; a second prediction unit configured to acquire a second prediction result corresponding to the second image and a fourth prediction result corresponding to the fourth image output by the second semantic segmentation model; a first cross entropy unit configured to determine the respective cross entropies of the first prediction result and the second prediction result; a second cross entropy unit configured to determine the cross entropy of the fourth prediction result using the third prediction result as the label of the fourth image; a cross entropy loss unit configured to determine a cross entropy loss function based on the cross entropies corresponding to the first prediction result, the second prediction result, and the fourth prediction result; and a second correction unit configured to correct the distillation loss function based on the cross entropy loss function.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 11. Fig. 11 illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 11, the electronic device 1100 includes one or more processors 1110 and memory 1120.
The processor 1110 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 1100 to perform desired functions.
Memory 1120 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1110 to implement the methods for processing semantic segmentation models of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 1100 may further include: an input device 1130 and an output device 1140, interconnected by a bus system and/or other form of connection mechanism (not shown).
In addition, the input device 1130 may include, for example, a keyboard, mouse, and the like.
The output device 1140 may output various information to the outside, including the determined distance information, direction information, and the like. The output devices 1140 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 1100 that are relevant to the present disclosure are shown in fig. 11, components such as buses, input/output interfaces, etc. are omitted. In addition, the electronic device 1100 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps of the method for processing a semantic segmentation model according to the various embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps of the method for processing a semantic segmentation model according to the various embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the disclosure is not intended to be limited to the specific details set forth.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are mutually referred to. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words and are used interchangeably herein. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled anew. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

1. A method for processing a semantic segmentation model, comprising:
processing two kinds of bright images by using a first semantic segmentation model, and acquiring respective middle layer characteristics of the two kinds of bright images output by a middle layer of the first semantic segmentation model, wherein the two kinds of bright images are images acquired in two scenes;
processing two kinds of dark images by using a second semantic segmentation model, and acquiring respective middle layer characteristics of the two kinds of dark images output by a middle layer of the second semantic segmentation model, wherein the second semantic segmentation model and the first semantic segmentation model have the same structure, and the two kinds of dark images are images acquired in the two scenes;
based on the respective middle layer characteristics of the two bright images and the respective middle layer characteristics of the two dark images, respectively determining the content similarity between the bright image and the dark image corresponding to each scene in the two scenes to obtain a first content similarity and a second content similarity, and respectively determining the style similarity between the bright image and the dark image corresponding to each scene in the two scenes to obtain a first style similarity and a second style similarity;
Determining a style loss function based on the first style similarity and the second style similarity;
determining a content loss function based on the first content similarity and the second content similarity;
determining a distillation loss function based on the style loss function and the content loss function;
and taking the distillation loss function as supervision, and updating parameters of the second semantic segmentation model through distillation training to obtain a processed second semantic segmentation model.
2. The method of claim 1, wherein the method further comprises:
acquiring a first image set, a second image set, a third image set and a fourth image set, wherein the first image in the first image set is a marked bright image acquired in a first scene, the second image in the second image set is a marked dark image acquired in the first scene, the third image in the third image set is an unmarked bright image acquired in a second scene, and the fourth image in the fourth image set is an unmarked dark image acquired in the second scene;
determining a first image in the first image set and a third image in the third image set as the two bright images, and determining a second image in the second image set and a fourth image in the fourth image set as the two dark images.
3. The method of claim 2, wherein the first style similarity and the second style similarity are determined by:
processing a first image in the first image set and a third image in the third image set by using the first semantic segmentation model respectively, and acquiring a first middle layer characteristic of the first image and a third middle layer characteristic of the third image output by a middle layer of the first semantic segmentation model;
processing a second image in a second image set and a fourth image in a fourth image set by using the second semantic segmentation model respectively, and acquiring a second intermediate layer characteristic of the second image and a fourth intermediate layer characteristic of the fourth image output by an intermediate layer of the second semantic segmentation model;
respectively determining a first style embedding corresponding to the first middle layer feature, a second style embedding corresponding to the second middle layer feature, a third style embedding corresponding to the third middle layer feature, and a fourth style embedding corresponding to the fourth middle layer feature;
determining the first style similarity based on the similarity of the first style embedding and the second style embedding;
And determining the second style similarity based on the similarity degree of the third style embedding and the fourth style embedding.
4. The method of claim 3, wherein the first content similarity and the second content similarity are determined by:
respectively determining a first content embedding corresponding to the first intermediate layer feature, a second content embedding corresponding to the second intermediate layer feature, a third content embedding corresponding to the third intermediate layer feature and a fourth content embedding corresponding to the fourth intermediate layer feature;
determining the first content similarity based on the similarity of the first content embedding and the second content embedding;
and determining the second content similarity based on the similarity degree of the third content embedding and the fourth content embedding.
5. The method of claim 4, wherein the first content embedding, the second content embedding, the third content embedding, and the fourth content embedding are determined by:
respectively mapping the first middle layer feature, the second middle layer feature, the third middle layer feature and the fourth middle layer feature to semantic feature spaces by using a preset mapping module to obtain the first content embedding, the second content embedding, the third content embedding and the fourth content embedding;
the mapping module adopts a mapping loss function to constrain the mapping process, the mapping loss function representing the difference obtained by subtracting a third preset divergence and a fourth preset divergence from a preset product, wherein the preset product is the product of a preset weight coefficient and the sum of a first preset divergence and a second preset divergence, the first preset divergence is the preset divergence between the first content embedding and the second content embedding, the second preset divergence is the preset divergence between the third content embedding and the fourth content embedding, the third preset divergence is the preset divergence between the first content embedding and the third content embedding, and the fourth preset divergence is the preset divergence between the second content embedding and the fourth content embedding.
6. The method according to one of claims 2 to 5, wherein the second image is obtained by:
mapping the first image and the fourth image to a first preset color space respectively to obtain a transformed first image and a transformed fourth image;
determining a first mean and a first variance of the transformed first image, and a fourth mean and a fourth variance of the transformed fourth image;
Adjusting the transformed first image so that the first mean and the first variance are aligned with the fourth mean and the fourth variance, respectively, to obtain an adjusted first image;
and mapping the adjusted first image to a second preset color space to obtain the second image.
7. The method of claim 6, wherein the method further comprises:
acquiring a first prediction result corresponding to the first image and a third prediction result corresponding to the third image, which are output by the first semantic segmentation model;
acquiring a second prediction result corresponding to the second image and a fourth prediction result corresponding to the fourth image, which are output by the second semantic segmentation model;
determining the cross entropy of each of the first prediction result and the second prediction result;
taking the third prediction result as a label of the fourth image, and determining cross entropy of the fourth prediction result;
determining a cross entropy loss function based on the cross entropy corresponding to each of the first prediction result, the second prediction result and the fourth prediction result;
and, before said taking the distillation loss function as a supervision, the method further comprises:
And correcting the distillation loss function based on the cross entropy loss function.
8. An apparatus for processing a semantic segmentation model, comprising:
the first processing unit is configured to process two kinds of bright images by using a first semantic segmentation model, and acquire respective middle layer characteristics of the two kinds of bright images output by a middle layer of the first semantic segmentation model, wherein the two kinds of bright images are images acquired in two scenes;
the second processing unit is configured to process two kinds of dark images by using a second semantic segmentation model, and acquire the intermediate layer characteristics of the two kinds of dark images output by an intermediate layer of the second semantic segmentation model, wherein the second semantic segmentation model and the first semantic segmentation model have the same structure, and the two kinds of dark images are images acquired in the two scenes;
the third processing unit is configured to determine content similarity between the bright image and the dark image corresponding to each of the two scenes based on the respective intermediate layer characteristics of the two bright images and the respective intermediate layer characteristics of the two dark images respectively to obtain first content similarity and second content similarity, and determine style similarity between the bright image and the dark image corresponding to each of the two scenes respectively to obtain first style similarity and second style similarity;
A style loss unit configured to determine a style loss function based on the first style similarity and the second style similarity;
a content loss unit configured to determine a content loss function based on the first content similarity and the second content similarity;
a distillation loss unit configured to determine a distillation loss function based on the style loss function and the content loss function;
and the model processing unit is configured to update parameters of the second semantic segmentation model through distillation training by taking the distillation loss function as supervision to obtain a processed second semantic segmentation model.
9. A computer readable storage medium storing a computer program for performing the method of any one of the preceding claims 1-7.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any of the preceding claims 1-7.
CN202210461761.1A 2022-04-28 2022-04-28 Method, apparatus, medium and device for processing semantic segmentation model Active CN114972749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210461761.1A CN114972749B (en) 2022-04-28 2022-04-28 Method, apparatus, medium and device for processing semantic segmentation model

Publications (2)

Publication Number Publication Date
CN114972749A CN114972749A (en) 2022-08-30
CN114972749B true CN114972749B (en) 2024-03-19

Family

ID=82980184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210461761.1A Active CN114972749B (en) 2022-04-28 2022-04-28 Method, apparatus, medium and device for processing semantic segmentation model

Country Status (1)

Country Link
CN (1) CN114972749B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021072886A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for image style transfer, device and storage medium
CN112785493A (en) * 2021-01-22 2021-05-11 北京百度网讯科技有限公司 Model training method, style migration method, device, equipment and storage medium
CN114331031A (en) * 2021-12-08 2022-04-12 北京华清安地建筑设计有限公司 Building traditional feature recognition and evaluation method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-scale adversarial network image semantic segmentation algorithm based on a weighted loss function; 张宏钊; 吕启深; 党晓婧; 李炎裕; 代德宇; Computer Applications and Software; 2020-01-12 (No. 01); full text *

Also Published As

Publication number Publication date
CN114972749A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
CN109754015B (en) Neural networks for drawing multi-label recognition and related methods, media and devices
Hwang et al. Context-based automatic local image enhancement
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN110648375B (en) Image colorization based on reference information
US8666148B2 (en) Image adjustment
CN112308862A (en) Image semantic segmentation model training method, image semantic segmentation model training device, image semantic segmentation model segmentation method, image semantic segmentation model segmentation device and storage medium
CN111489401A (en) Image color constancy processing method, system, equipment and storage medium
CN112614070B (en) defogNet-based single image defogging method
CN112836625A (en) Face living body detection method and device and electronic equipment
CN110782448A (en) Rendered image evaluation method and device
CN112581355A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN107729821B (en) Video summarization method based on one-dimensional sequence learning
CN114187515A (en) Image segmentation method and image segmentation device
CN117474817A (en) Method for content unification of composite continuous images
CN114972749B (en) Method, apparatus, medium and device for processing semantic segmentation model
CN117252778A (en) Color constancy method and system based on semantic preservation
CN111738964A (en) Image data enhancement method based on modeling
CN113627342B (en) Method, system, equipment and storage medium for video depth feature extraction optimization
CN112926552B (en) Remote sensing image vehicle target recognition model and method based on deep neural network
CN115205157A (en) Image processing method and system, electronic device, and storage medium
CN113191376A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN112001301A (en) Building monitoring method and device based on global cross entropy weighting and electronic equipment
CN115082703B (en) Concept-associated color extraction method, device, computer equipment and storage medium
CN117953167B (en) Expressway auxiliary facility modeling method and system based on point cloud data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant