CN116630630A - Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium - Google Patents

Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium

Info

Publication number
CN116630630A
Authority
CN
China
Prior art keywords
target domain
network
domain
loss
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310904970.3A
Other languages
Chinese (zh)
Other versions
CN116630630B (en
Inventor
赖昕
刘枢
吕江波
沈小勇
田倬韬
易振彧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smartmore Technology Co Ltd
Original Assignee
Shenzhen Smartmore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Smartmore Technology Co Ltd
Priority to CN202310904970.3A
Publication of CN116630630A
Application granted
Publication of CN116630630B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a semantic segmentation method, a semantic segmentation device, computer equipment and a computer readable storage medium. The method comprises the following steps: extracting source domain image features from a source domain sample graph through a source domain shallow network; extracting target domain image features from a target domain sample graph through a target domain shallow network; performing semantic segmentation on the source domain image features and the target domain image features respectively, based on a shared deep network, to obtain a source domain segmentation result and a target domain segmentation result; determining a first loss according to the distribution difference between the source domain image features and the target domain image features; determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result; determining a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result; and optimizing the source domain shallow network, the target domain shallow network and the shared deep network according to the first loss, the second loss and the third loss to obtain a semantic segmentation model applicable to the target domain. The method and the device can improve the performance of the semantic segmentation model.

Description

Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to a semantic segmentation method, apparatus, computer device, and computer readable storage medium.
Background
With the development of computer vision technology, semantic segmentation has emerged. Semantic segmentation is pixel-level classification: by assigning a semantic label to each pixel, it partitions an image into a plurality of semantic regions, helping a computer understand the image more finely and accurately.
To alleviate the dependence of traditional semantic segmentation on pixel-level annotation data, semantic segmentation methods based on domain adaptation have been proposed. Such a method must handle both the semantic segmentation task and the task of aligning the features of the source domain and the target domain. The two tasks interfere with each other, which can degrade the performance of the semantic segmentation task.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a semantic segmentation method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the performance of a semantic segmentation model.
In a first aspect, the present application provides a semantic segmentation method, including:
in each round of training of a semantic segmentation model applicable to the target domain, extracting source domain image features from a source domain sample graph through the source domain shallow network to be trained in each round;
extracting target domain image features from a target domain sample graph through the target domain shallow network to be trained in each round;
performing semantic segmentation on the source domain image features and the target domain image features respectively, based on the shared deep network to be trained in each round, to obtain a source domain segmentation result and a target domain segmentation result;
determining a first loss according to the difference between the distribution condition of the source domain image features and the distribution condition of the target domain image features;
determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result;
determining a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result;
and optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
In a second aspect, the present application provides a semantic segmentation apparatus, comprising:
the feature extraction module is used for extracting, in each round of training of a semantic segmentation model applicable to the target domain, source domain image features from a source domain sample graph through the source domain shallow network to be trained in each round, and for extracting target domain image features from a target domain sample graph through the target domain shallow network to be trained in each round;
the semantic segmentation module is used for carrying out semantic segmentation processing on the source domain image features and the target domain image features based on the shared deep network to be trained in each round so as to obtain a source domain segmentation result and a target domain segmentation result;
the loss determination module is used for determining a first loss according to the difference between the distribution condition of the source domain image characteristics and the distribution condition of the target domain image characteristics; determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result; determining a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result;
and the model optimization module is used for optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
In a third aspect, the application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
In the above semantic segmentation method, apparatus, computer device, computer readable storage medium and computer program product, in each round of training of the semantic segmentation model applicable to the target domain, source domain image features are extracted from the source domain sample graph through the source domain shallow network to be trained in each round, target domain image features are extracted from the target domain sample graph through the target domain shallow network to be trained in each round, and the first loss is determined according to the difference between the distribution of the source domain image features and the distribution of the target domain image features. A source domain shallow network is additionally maintained, and the constraint provided by the first loss aligns the source domain image features with the target domain image features. The source domain shallow network therefore completes the feature alignment task, while the target domain shallow network and the shared deep network can concentrate on the semantic segmentation task, which decouples the two tasks. Moreover, the source domain and the target domain differ mainly in shallow image features, whereas a deep network is good at extracting deep semantic features, so a single shared deep network is maintained for both the source domain and the target domain.
Semantic segmentation is performed on the source domain image features and the target domain image features respectively, based on the shared deep network to be trained in each round, to obtain a source domain segmentation result and a target domain segmentation result. A second loss is determined according to the difference between the source domain segmentation result and the target domain segmentation result, and the constraint imposed by the second loss enables the segmentation of the target domain and the segmentation of the source domain to be distinguished. A third loss is determined according to the difference between the label carried by the source domain sample graph and the source domain segmentation result, and the constraint imposed by the third loss enables the model to fully learn the knowledge related to semantic segmentation in the source domain. The source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are then optimized according to the first loss, the second loss and the third loss. In this way, while the feature alignment task and the semantic segmentation task are decoupled, a semantic segmentation model applicable to the target domain is learned from the fully learned source domain knowledge and from the difference between source domain knowledge and target domain knowledge, which improves the performance of the semantic segmentation model.
Drawings
FIG. 1 is a schematic flow chart of a semantic segmentation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a shallow discriminator according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a deep discriminator according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a semantic segmentation model applicable to a target domain according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of a semantic segmentation model applicable to a target domain according to an embodiment of the present application;
FIG. 6 is a block diagram of a semantic segmentation device according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an internal architecture of a computer device according to an embodiment of the present application;
FIG. 8 is an internal block diagram of another computer device according to an embodiment of the present application;
FIG. 9 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As shown in fig. 1, an embodiment of the present application provides a semantic segmentation method, which is described by taking application of the method to a computer device as an example. The method comprises the following steps:
S102, in each round of training of the semantic segmentation model applicable to the target domain, extracting source domain image features from a source domain sample graph through the source domain shallow network to be trained in each round.
The source domain sample graph is a labeled source domain image used as a sample for training the semantic segmentation model. The source domain shallow network is a shallow network additionally maintained for the source domain. Its purpose is to complete the task of aligning the features of the source domain and the target domain, so that the target domain shallow network and the shared deep network can concentrate on the semantic segmentation task. This decouples the feature alignment task from the semantic segmentation task and prevents the feature alignment task from degrading the performance of the semantic segmentation task.
It will be appreciated that the source domain image features are shallow image features of the source domain sample graph. The source domain and the target domain differ in shallow image features, such as the illumination or texture of the image. A shallow network is precisely what is good at extracting shallow image features. Therefore, an additional source domain shallow network is maintained, and a constraint is applied to it so that the distribution of the features it extracts moves closer to the distribution of the features extracted by the target domain shallow network, that is, closer to the feature distribution of the target domain. The source domain shallow network thereby completes the feature alignment task, and the shallow image features are aligned.
Illustratively, the computer device may obtain a semantic segmentation model applicable to the target domain through multiple rounds of training. The computer device acquires source domain sample graphs carrying labels and target domain sample graphs without labels. In each round of training of the semantic segmentation model applicable to the target domain, a source domain sample graph is used as the input of the source domain shallow network to be trained in each round, and the source domain image features are extracted from the source domain sample graph through the source domain shallow network.
In some embodiments, the source domain shallow network to be trained in each round is the source domain shallow network optimized in the previous round. It will be appreciated that the source domain shallow network to be trained in the first round is the initial source domain shallow network.
In some embodiments, the source domain shallow network to be trained in the first round is trained based only on labeled source domain sample graphs.
In some embodiments, the network parameters of the source domain shallow network to be trained for the first round may be randomly initialized.
In some embodiments, the computer device may comprise at least one of a terminal or a server. It can be understood that the method provided by the embodiments of the application can be applied to a terminal, to a server, or to a system comprising a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, an Internet of Things device or a portable wearable device. The Internet of Things device may be a smart speaker, a smart television, a smart air conditioner, a smart vehicle-mounted device or the like. The portable wearable device may be a smart watch, a smart bracelet, a headset or the like. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
S104, extracting target domain image features from a target domain sample graph through the target domain shallow network to be trained in each round.
The target domain image features refer to shallow image features in the target domain sample graph. The target domain sample map refers to an unlabeled target domain image that is used as a sample for training the semantic segmentation model.
It can be understood that the embodiments of the present application do not limit the specific network structure adopted for the target domain shallow network and the source domain shallow network; the network structure only needs to be capable of extracting shallow image features.
In some embodiments, the network structure of the target domain shallow network is matched to the network structure of the source domain shallow network. For example, the target domain shallow network and the source domain shallow network may adopt the same network structure.
In some embodiments, the target domain shallow network to be trained in each round is the target domain shallow network optimized in the previous round. The target domain shallow network to be trained in the first round is the initial target domain shallow network.
In some embodiments, the network parameters of the target domain shallow network to be trained for the first round may be randomly initialized.
S106, semantic segmentation processing is respectively carried out on the source domain image features and the target domain image features based on the shared deep network to be trained in each round, so as to obtain a source domain segmentation result and a target domain segmentation result.
Wherein the shared deep network is used for extracting deeper semantic features from the shallow image features.
For example, the computer device may take the source domain image features as the input of the shared deep network to be trained in each round and extract source domain semantic features from them. Likewise, it may take the target domain image features as the input of the shared deep network to be trained in each round and extract target domain semantic features from them. The source domain semantic features characterize the semantics in the source domain sample graph, and the target domain semantic features characterize the semantics in the target domain sample graph. Because the semantic segmentation task essentially divides an image into different semantic regions, semantic segmentation is performed based on the source domain semantic features to obtain the source domain segmentation result, and based on the target domain semantic features to obtain the target domain segmentation result.
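The data flow of S102 to S106 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the two shallow networks and the shared deep network are replaced by toy scalar functions, and every name and parameter value (`make_affine`, `forward`, the scales and shifts) is hypothetical.

```python
# Toy sketch of the two-branch forward pass (S102 to S106).
# All functions and parameter values are hypothetical stand-ins for real networks.

def make_affine(scale, shift):
    """Return a toy 'network' that maps each input value x to scale * x + shift."""
    def net(values):
        return [scale * v + shift for v in values]
    return net

source_shallow = make_affine(0.5, 0.1)   # additionally maintained for the source domain
target_shallow = make_affine(0.4, 0.0)   # shallow network for the target domain
shared_deep = make_affine(2.0, -1.0)     # one deep network shared by both domains

def forward(source_pixels, target_pixels):
    """Run each domain through its own shallow network, then the shared deep network."""
    source_features = source_shallow(source_pixels)   # S102
    target_features = target_shallow(target_pixels)   # S104
    source_result = shared_deep(source_features)      # S106, source branch
    target_result = shared_deep(target_features)      # S106, target branch
    return source_features, target_features, source_result, target_result

feats_s, feats_t, seg_s, seg_t = forward([1.0, 2.0], [1.0, 2.0])
```

The point of the sketch is structural: the two domains have separate shallow parameters but pass through the same deep function, so any difference between `seg_s` and `seg_t` for identical inputs comes from the shallow stage alone.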
In some embodiments, the shared deep network to be trained in each round is the shared deep network optimized in the previous round. The shared deep network to be trained in the first round is the initial shared deep network.
In some embodiments, the first round of shared deep network to be trained may be trained using only labeled source domain sample graphs.
In some embodiments, the network parameters of the shared deep network to be trained for the first round may be randomly initialized.
In some embodiments, a computer device may obtain an initial model applicable to a source domain. The initial model is trained using only labeled source domain sample graphs. The initial model comprises a shared deep network to be trained in the first round and a source domain shallow network to be trained in the first round.
S108, determining a first loss according to the difference between the distribution condition of the source domain image characteristics and the distribution condition of the target domain image characteristics.
The first loss is used to characterize the degree of feature alignment between the source domain image features and the target domain image features.
For example, the computer device may input the source domain image features to the shallow discriminator and obtain the discrimination result of the source domain image features output by the shallow discriminator. The first loss is determined according to this discrimination result. The smaller the probability, as characterized by the discrimination result, that the source domain image features are true, the smaller the first loss. The shallow discriminator is trained to better distinguish source domain image features from target domain image features; in other words, it is trained from the distribution difference between the source domain image features and the target domain image features. The first loss applied in model training pulls the feature distributions of the target domain and the source domain closer together. The discrimination result of the shallow discriminator for the source domain image features is related to the feature distribution difference between the two domains: the higher the confidence of true in the discrimination result, the closer the feature distributions of the two domains. The first loss thus reflects the distribution difference between the source domain image features and the target domain image features.
In some embodiments, the computer device may calculate the mean square error loss according to the discrimination result of the source domain image feature, to obtain the first loss.
In some embodiments, equation (1) is a calculation equation for the first loss.
(1)
Where the left-hand side of equation (1) denotes the first loss. D_low denotes the shallow discriminator. φ_s denotes the source domain image features. The equation also uses the number of feature vectors in the source domain image features, and the discrimination result of the shallow discriminator for the i-th feature vector in the source domain image features.
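As a sketch of what a mean square error loss over the shallow discriminator's outputs can look like, one form consistent with the quantities defined for equation (1) is given below. The symbols \mathcal{L}_{1} (first loss), N_{s} (number of feature vectors), D_{low}(\phi_{s})_{i} (discrimination result for the i-th feature vector) and t (the label value the discrimination result is pushed towards) are assumptions introduced here for illustration; the text does not state the value of t.

```latex
\mathcal{L}_{1} = \frac{1}{N_{s}} \sum_{i=1}^{N_{s}} \bigl( D_{low}(\phi_{s})_{i} - t \bigr)^{2}
```

Under this form the loss is zero exactly when every discrimination result equals t, and it grows quadratically as the results move away from t.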
In some embodiments, training of the semantic segmentation model and the shallow discriminators may be separated by at least one step.
In some embodiments, during each round of training, the training of the networks and the training of the shallow discriminator may be separated by a preset number of steps. For example, in each round, the first preset number of training steps fix the parameters of the shallow discriminator optimized in the previous round and optimize the networks to be trained in that round the preset number of times, yielding the networks optimized in that round. The next training step fixes the network parameters of the networks optimized in that round and trains the shallow discriminator, yielding the shallow discriminator optimized in that round. It can be understood that the networks and the shallow discriminator are trained alternately: in each round, the networks are first optimized a preset number of times, and the shallow discriminator is then optimized.
In some embodiments, the shallow discriminator is not limited to one training step in each round of training. In each round, the networks are first optimized a first number of times, and the shallow discriminator is then optimized a second number of times.
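The alternating schedule described above can be made concrete with a small sketch. It only produces the order of updates and performs no actual optimization; the function name `alternating_schedule` and its parameters are hypothetical.

```python
def alternating_schedule(network_steps, discriminator_steps, rounds):
    """Return the ordered list of update targets for the given number of rounds."""
    schedule = []
    for _ in range(rounds):
        # The discriminator parameters stay fixed while the networks are optimized.
        schedule.extend(["networks"] * network_steps)
        # The network parameters stay fixed while the discriminator is optimized.
        schedule.extend(["discriminator"] * discriminator_steps)
    return schedule

# Two network updates followed by one discriminator update, repeated for two rounds.
plan = alternating_schedule(network_steps=2, discriminator_steps=1, rounds=2)
```

Setting `discriminator_steps` to 1 corresponds to the embodiment with a single discriminator step per round, and a larger value corresponds to the embodiment with a second number of discriminator optimizations.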
S110, determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result.
For example, the computer device may take the target domain segmentation result as the input of the deep discriminator and obtain the discrimination result of the target domain segmentation result output by the deep discriminator. The larger the probability, as characterized by this discrimination result, that the target domain segmentation result is judged to come from the source domain, the smaller the second loss. The deep discriminator is trained to better distinguish source domain segmentation results from target domain segmentation results. It will be appreciated that the deep discriminator is essentially trained from the difference between the source domain segmentation result and the target domain segmentation result.
In some embodiments, the computer device may calculate the mean square error loss according to the discrimination result of the target domain segmentation result, resulting in the second loss.
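The two sides of this adversarial setup can be sketched with mean square error as follows. This is an illustration under assumptions: the label values `SOURCE_LABEL` and `TARGET_LABEL`, the function names, and the form of the discriminator's own training loss are hypothetical and are not stated in the text.

```python
SOURCE_LABEL = 1.0  # assumed value the discriminator should output for source domain results
TARGET_LABEL = 0.0  # assumed value the discriminator should output for target domain results

def mean_squared_error(outputs, label):
    """Mean of squared differences between each discriminator output and a label."""
    return sum((o - label) ** 2 for o in outputs) / len(outputs)

def discriminator_loss(d_on_source, d_on_target):
    """Loss for the deep discriminator: small when it tells the two domains apart."""
    return (mean_squared_error(d_on_source, SOURCE_LABEL)
            + mean_squared_error(d_on_target, TARGET_LABEL))

def second_loss(d_on_target):
    """Loss for the segmentation networks: small when target results are judged as source."""
    return mean_squared_error(d_on_target, SOURCE_LABEL)
```

The two losses pull in opposite directions on the same discriminator outputs, which is why the networks and the discriminator are optimized in separate steps with the other side's parameters fixed.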
In some embodiments, the target domain segmentation result may be a target domain probability distribution. The target domain probability distribution is used for representing the first probability of each pixel point in the target domain sample graph under each class.
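A per-pixel probability distribution of this kind is commonly obtained by applying softmax to raw class scores. The text does not say how the probabilities are produced, so softmax is an assumption here, and the scores below are made-up values.

```python
import math

def softmax(scores):
    """Convert raw class scores for one pixel into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() from overflowing without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for two pixels over three classes.
pixel_scores = [[2.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
target_probabilities = [softmax(scores) for scores in pixel_scores]
```

Each inner list of `target_probabilities` is the distribution of one pixel over the classes; equal scores give equal probabilities, and a larger score gives a larger probability.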
In some embodiments, equation (2) is a calculation equation for the second loss.
(2)
Where L_adv denotes the second loss. D denotes the deep discriminator. p_t denotes the target domain segmentation result. D(p_t)_i denotes the discrimination result of the deep discriminator for the probability distribution, over the categories, of the i-th pixel in the target domain sample graph. N_d denotes the number of feature vectors in the discrimination result output by the deep discriminator. The discrimination result output by the deep discriminator can be regarded as a feature map.
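Using the symbols defined for equation (2) and the mean square error mentioned above, one consistent form of the second loss is sketched below. The constant s, standing for the label value that the deep discriminator assigns to source domain results, is an assumption introduced for illustration; the text does not give its value.

```latex
L_{adv} = \frac{1}{N_{d}} \sum_{i=1}^{N_{d}} \bigl( D(p_{t})_{i} - s \bigr)^{2}
```

With this form, L_{adv} decreases as the discrimination results for the target domain segmentation move towards the source domain label, which matches the stated behavior of the second loss.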
In some embodiments, the training of the shallow discriminator and the training of the deep discriminator are not separated by any step. That is, the shallow discriminator and the deep discriminator are optimized in the same training step.
S112, determining a third loss according to the difference between the labels carried by the source domain sample graph and the source domain segmentation result.
For example, the computer device may calculate a cross entropy loss for the labels carried by the source domain sample map and the source domain segmentation result, resulting in a third loss.
In some embodiments, the source domain segmentation result may be a source domain probability distribution. The source domain probability distribution is used for representing the probability of each pixel point in the source domain sample graph under each class.
In some embodiments, equation (3) is a calculation equation for the third loss.
\[ L_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \mathbb{1}[y_{s,i} = c] \, \log p_{s,i,c} \tag{3} \]
Where N denotes the number of pixels in the source domain sample graph. C denotes the number of categories. p_{s,i,c} denotes the probability of the i-th pixel in the source domain sample graph under the c-th category. y_{s,i} denotes the label of the i-th pixel in the source domain sample graph. L_{ce} denotes the third loss. The indicator \mathbb{1}[y_{s,i} = c] equals 1 when the label of the i-th pixel is the c-th category and 0 otherwise.
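The cross entropy between the labels and the source domain probability distribution can be computed directly from these quantities. The following is a minimal sketch in plain Python; the function name `cross_entropy_loss` and the sample values are hypothetical, and labels are assumed to be zero-based class indices.

```python
import math

def cross_entropy_loss(probabilities, labels):
    """Average negative log-probability of the labeled class over all pixels.

    probabilities[i][c] is the predicted probability of pixel i under class c;
    labels[i] is the zero-based class index carried by pixel i.
    """
    total = 0.0
    for pixel_probs, label in zip(probabilities, labels):
        # Only the probability of the labeled class contributes for each pixel.
        total -= math.log(pixel_probs[label])
    return total / len(labels)

# Two pixels, two classes: pixel 0 is labeled class 0, pixel 1 is labeled class 1.
probs = [[0.5, 0.5], [0.25, 0.75]]
labels = [0, 1]
loss = cross_entropy_loss(probs, labels)
```

The loss is zero only when every pixel assigns probability 1 to its labeled class, and it grows as the probability of the labeled class shrinks.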
S114, optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss, so as to obtain a semantic segmentation model applicable to the target domain after the multiple rounds of training are finished.
Illustratively, the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized in the direction in which the first loss, the second loss and the third loss decrease. After the multiple rounds of training are finished, the semantic segmentation model applicable to the target domain is obtained. The semantic segmentation model applicable to the target domain comprises the target domain shallow network and the shared deep network.
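A minimal sketch of how the three losses could be combined, and of which networks form the final model, is given below. Combining the losses as a weighted sum is an assumption, since the text only says the networks are optimized so that all three losses decrease; the weights `w_first` and `w_second` and both function names are hypothetical.

```python
def total_loss(first_loss, second_loss, third_loss, w_first=1.0, w_second=1.0):
    """Weighted sum of the three losses (assumed combination; weights are hypothetical)."""
    return third_loss + w_first * first_loss + w_second * second_loss

def build_deployed_model(target_shallow, shared_deep):
    """Compose the two networks that make up the final model.

    The source domain shallow network is used only during training and is
    not part of the returned model.
    """
    def model(image):
        return shared_deep(target_shallow(image))
    return model

# Toy stand-ins for trained networks, used only to show the composition order.
model = build_deployed_model(lambda x: x + 1, lambda features: features * 2)
```

Here `model(3)` first applies the target domain shallow stand-in and then the shared deep stand-in, mirroring the inference path of the semantic segmentation model applicable to the target domain.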
It can be seen that, in the embodiments of the present application, in each training round of the semantic segmentation model applicable to the target domain, source domain image features are extracted from the source domain sample graph through the source domain shallow network to be trained in each round, target domain image features are extracted from the target domain sample graph through the target domain shallow network to be trained in each round, and the first loss is determined according to the difference between the distribution of the source domain image features and the distribution of the target domain image features. Because a source domain shallow network is additionally maintained and the first loss provides a constraint that aligns the source domain image features with the target domain image features, the source domain shallow network can complete the feature alignment task while the target domain shallow network and the shared deep network concentrate on the semantic segmentation task, so that the two tasks are decoupled. Moreover, the source domain and the target domain differ mainly in their shallow image features, while a deep network is good at extracting deep semantic features, so a single shared deep network is maintained jointly for the source domain and the target domain.
Semantic segmentation processing is performed on the source domain image features and the target domain image features respectively, based on the shared deep network to be trained in each round, to obtain a source domain segmentation result and a target domain segmentation result. A second loss is determined according to the difference between the source domain segmentation result and the target domain segmentation result, and the constraint imposed by the second loss enables the segmentation condition of the target domain and the segmentation condition of the source domain to be distinguished. A third loss is determined according to the difference between the labels carried by the source domain sample graph and the source domain segmentation result, and the constraint imposed by the third loss enables the model to fully learn the knowledge related to semantic segmentation in the source domain. Furthermore, the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized according to the first loss, the second loss and the third loss, so that, while the feature alignment task and the semantic segmentation task are decoupled, a semantic segmentation model applicable to the target domain can be learned based on the fully learned source domain knowledge and the difference between the source domain knowledge and the target domain knowledge, thereby improving the performance of the semantic segmentation model.
In some embodiments, semantic segmentation processing is performed on the source domain image feature and the target domain image feature based on each round of shared deep network to be trained to obtain a source domain segmentation result and a target domain segmentation result, including:
extracting a source domain semantic feature and a target domain semantic feature respectively aiming at the source domain image feature and the target domain image feature based on the shared deep network to be trained in each round;
carrying out semantic segmentation according to the semantic features of the source domain and the semantic features of the target domain through each round of sharing classifier to be trained to obtain a source domain segmentation result and a target domain segmentation result;
optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished, wherein the semantic segmentation model comprises the following steps:
and optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the shared classifier to be trained in each round according to the first loss, the second loss and the third loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
For example, the computer device may input the source domain image features into the shared deep network to be trained in each round and output the source domain semantic features. The source domain semantic features are input into the shared classifier to be trained in each round, classification prediction is performed by that shared classifier, and the source domain segmentation result is output.
Similarly, the computer device may input the target domain image features into the shared deep network to be trained in each round and output the target domain semantic features. The target domain semantic features are input into the shared classifier to be trained in each round, classification prediction is performed by that shared classifier, and the target domain segmentation result is output.
In some embodiments, the shared classifier to be trained in each round refers to the shared classifier after the previous round of optimization. The shared classifier to be trained in the first round is the initial shared classifier.
In some embodiments, the parameters of the shared classifier to be trained for the first round may be randomly initialized.
In some embodiments, the initial model includes the shared classifier to be trained in the first round.
In some embodiments, the semantic segmentation model applicable to the target domain includes a target domain shallow network, a shared deep network, and a shared classifier.
It can be seen that, in this embodiment, a source domain shallow network and a target domain shallow network are maintained separately for the source domain and the target domain, and the feature alignment task between the source domain and the target domain is completed through the source domain shallow network. A shared deep network and a shared classifier are maintained jointly for the source domain and the target domain, so that knowledge related to semantic segmentation in the source domain is learned and adapted to the target domain. The semantic segmentation task, on which the target domain shallow network, the shared deep network and the shared classifier concentrate, is thereby decoupled from the feature alignment task, which ensures the model training effect and improves the performance of the semantic segmentation model applicable to the target domain.
In some embodiments, the method further comprises:
carrying out semantic segmentation according to the semantic features of the target domain through each round of auxiliary classifier to be trained to obtain an auxiliary segmentation result;
determining a fourth loss according to the difference between the auxiliary segmentation result and the target domain segmentation result;
optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished, wherein the semantic segmentation model comprises the following steps:
and optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the auxiliary classifier to be trained in each round according to the first loss, the second loss, the third loss and the fourth loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
Illustratively, to provide more constraints that enable the model to learn more discriminative features in the target domain, an auxiliary classifier focused on the semantic segmentation of the target domain is introduced during the training of the model. The computer device may input the target domain semantic features into the auxiliary classifier to be trained in each round, and that auxiliary classifier performs classification prediction to obtain the auxiliary segmentation result. A cross entropy loss is then calculated between the auxiliary segmentation result and the target domain segmentation result to obtain the fourth loss.
The first loss, the second loss, the third loss and the fourth loss are weighted and fused to obtain the target loss, and the source domain shallow network, the target domain shallow network, the shared deep network and the auxiliary classifier to be trained in each round are optimized according to the target loss.
In some embodiments, the computer device may perform one-hot encoding on the target domain segmentation result to obtain a target domain pseudo label result. The target domain pseudo label result is used for representing the pseudo label assigned to each pixel in the target domain sample graph. The computer device may calculate a cross entropy loss between the target domain pseudo label result and the auxiliary segmentation result to obtain the fourth loss.
In some embodiments, the auxiliary segmentation result may be an auxiliary probability distribution. The auxiliary probability distribution is used for representing the second probability of each pixel point in the target domain sample graph under each class.
In some embodiments, equation (4) is a calculation equation for the fourth loss.
(4)
In equation (4), separate symbols characterize the fourth loss, the pseudo label assigned to the i-th pixel participating in the fourth loss calculation in the target domain sample graph, and the first probability, under the c-th category, of the i-th pixel participating in the fourth loss calculation in the target domain sample graph. $N_t$ characterizes the number of pixels participating in the fourth loss calculation in the target domain sample graph. $C$ characterizes the number of categories.
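The pseudo label step and the fourth loss described above can be sketched as follows. This is only an illustration under stated assumptions: every pixel participates in the loss (the text allows a subset of pixels but does not state how it is selected), pseudo labels are kept as class indices rather than explicit one-hot vectors, and the names `pseudo_labels_from` and `fourth_loss` are hypothetical.

```python
import math

def pseudo_labels_from(target_probs):
    """Index form of one-hot encoding: the most probable class for each pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in target_probs]

def fourth_loss(aux_probs, pseudo_labels):
    """Mean cross entropy between pseudo labels and the auxiliary classifier output."""
    eps = 1e-12  # guards against log(0); an implementation choice
    total = 0.0
    for q_i, y_i in zip(aux_probs, pseudo_labels):
        total -= math.log(q_i[y_i] + eps)
    return total / len(pseudo_labels)

# Toy outputs for two target domain pixels and three classes.
target_probs = [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]]  # shared classifier output
aux_probs = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]     # auxiliary classifier output
pseudo = pseudo_labels_from(target_probs)
loss4 = fourth_loss(aux_probs, pseudo)
```

In this sketch the pseudo labels are treated as fixed targets, so the loss only measures how far the auxiliary prediction is from the shared classifier's most confident class.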
Therefore, in this embodiment, an additional auxiliary classifier is maintained for the target domain. The fourth loss, calculated from the difference between the auxiliary segmentation result that the auxiliary classifier outputs for the target domain semantic features and the target domain segmentation result, provides additional supervision information that helps the target domain shallow network learn more discriminative features. Introducing the fourth loss into the training of the model enables the semantic segmentation model applicable to the target domain to identify more discriminative features in target domain images, which improves the accuracy of model inference.
In some embodiments, the first loss is determined from a discrimination result of the source domain image features; the discrimination result of the source domain image features is obtained by inputting the source domain image features into the shallow discriminator after the previous round of optimization;
the method further comprises the steps of:
after the source domain shallow network and the target domain shallow network to be trained in each round are optimized, fixing the network parameters of the source domain shallow network and the target domain shallow network after each round of optimization, and optimizing the shallow discriminator after the previous round of optimization in the direction in which the confidence of discriminating the source domain image features extracted by the optimized source domain shallow network is lower and the confidence of discriminating the target domain image features extracted by the optimized target domain shallow network is higher.
Illustratively, the networks and classifiers are optimized together, and the discriminators are optimized together. It will be appreciated that the networks and classifiers together correspond to a generator, and that the training of the generator and the training of the discriminators are separated by one step. After optimizing the source domain shallow network, the target domain shallow network, the shared deep network, the shared classifier and the auxiliary classifier to be trained in each round, the computer device may fix the parameters of these optimized networks and classifiers. The computer device then acquires the discrimination result of the shallow discriminator after the previous round of optimization for the source domain image features extracted by the optimized source domain shallow network, and its discrimination result for the target domain image features extracted by the optimized target domain shallow network. The shallow discriminator training loss is determined according to the discrimination result of the source domain image features and the discrimination result of the target domain image features; it is positively correlated with the discrimination result of the source domain image features and negatively correlated with the discrimination result of the target domain image features. The shallow discriminator after the previous round of optimization is then optimized in the direction of reducing the shallow discriminator training loss.
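The alternation described above, in which the networks and classifiers are updated first and the discriminators are then updated with the network parameters held fixed, can be sketched as a control-flow skeleton. The step functions below are toy stand-ins that only record the order of calls; `train_round`, `toy_generator_step` and `toy_discriminator_step` are hypothetical names, and real steps would compute gradients of the losses described in the text.

```python
def train_round(model, disc, generator_step, discriminator_step):
    """One training round: update the model, then the discriminator."""
    # The model is updated against the discriminator from the previous round.
    model = generator_step(model, disc)
    # The discriminator is updated against the freshly optimized, now fixed, model.
    disc = discriminator_step(model, disc)
    return model, disc

log = []

def toy_generator_step(model, disc):
    log.append(("generator", model, disc))
    return model + 1  # stand-in for a parameter update

def toy_discriminator_step(model, disc):
    log.append(("discriminator", model, disc))
    return disc + 10  # stand-in for a parameter update

model, disc = 0, 0
for _ in range(2):
    model, disc = train_round(model, disc, toy_generator_step, toy_discriminator_step)
```

The recorded log shows that each generator step sees the discriminator state from the previous round, while each discriminator step sees the model state from the current round.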
In some embodiments, the computer device may calculate a mean square error loss according to the discrimination result of the source domain image features to obtain a first sub-training loss, and calculate a mean square error loss according to the discrimination result of the target domain image features to obtain a second sub-training loss. The first sub-training loss and the second sub-training loss are weighted and fused to obtain the shallow discriminator training loss.
In some embodiments, equation (5) is a calculation equation for the shallow discriminator training loss.
(5)
In equation (5), separate symbols characterize the shallow discriminator training loss, the first sub-training loss and the second sub-training loss. $\Phi_t$ characterizes the target domain image features, and the discriminator output for $\Phi_t$ characterizes the discrimination result of the shallow discriminator for the i-th feature vector in the target domain image features. It will be appreciated that the source domain image features and the target domain image features contain the same number of feature vectors, and a further symbol characterizes the number of feature vectors in the source domain image features or the target domain image features. $\Phi_s$ characterizes the source domain image features, and the discriminator output for $\Phi_s$ characterizes the discrimination result of the shallow discriminator for the i-th feature vector in the source domain image features.
In some embodiments, as shown in FIG. 2, a schematic diagram of a shallow discriminator is provided. When the source domain image features are used as input, the first sub-training loss is determined based on the discrimination result of the source domain image features output by the shallow discriminator. When the target domain image features are used as input, the second sub-training loss is determined based on the discrimination result of the target domain image features output by the shallow discriminator. The shallow discriminator is optimized based on the first sub-training loss and the second sub-training loss.
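A discriminator training loss built from two mean square error terms can be sketched as follows. The regression targets (0 for source domain inputs, 1 for target domain inputs) and the default equal weights are assumptions chosen to match the stated correlations; the exact form of the equation is not recoverable from the text, and the names `mse_to_target` and `discriminator_loss` are illustrative.

```python
def mse_to_target(scores, target):
    """Mean squared error between discriminator scores and a constant target."""
    return sum((s - target) ** 2 for s in scores) / len(scores)

def discriminator_loss(source_scores, target_scores, w_source=1.0, w_target=1.0):
    """Weighted fusion of the two sub-training losses."""
    first_sub = mse_to_target(source_scores, 0.0)   # source inputs pushed toward 0 (assumed)
    second_sub = mse_to_target(target_scores, 1.0)  # target inputs pushed toward 1 (assumed)
    return w_source * first_sub + w_target * second_sub

# Toy discrimination results, one score per feature vector.
source_scores = [0.5, 0.5]
target_scores = [0.5, 1.0]
loss_d = discriminator_loss(source_scores, target_scores)
```

For scores in the range 0 to 1, this loss grows with the source domain scores and falls as the target domain scores rise, consistent with the positive and negative correlations described above. The same helper could in principle be applied to the outputs of the deep discriminator, for which the text also describes mean square error sub-losses.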
Therefore, in this embodiment, after the source domain shallow network and the target domain shallow network to be trained in each round are optimized, the network parameters of the optimized source domain shallow network and target domain shallow network are fixed, and the shallow discriminator after the previous round of optimization is optimized in the direction in which the confidence of discriminating the source domain image features extracted by the optimized source domain shallow network is lower and the confidence of discriminating the target domain image features extracted by the optimized target domain shallow network is higher. The shallow discriminator can thus better distinguish the source domain image features from the target domain image features, so that the first loss, which is based on the shallow discriminator's discrimination result for the source domain image features, can accurately provide the constraint that aligns the source domain features with the target domain features.
In some embodiments, the second loss is determined from a discrimination result of the target domain segmentation result; the discrimination result of the target domain segmentation result is obtained by inputting the target domain segmentation result into the deep discriminator after the previous round of optimization;
the method further comprises the steps of:
after the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized, fixing the network parameters of the source domain shallow network, the target domain shallow network and the shared deep network after each round of optimization, and optimizing the deep discriminator after the previous round of optimization in the direction in which the source domain segmentation result obtained based on the optimized shared deep network is discriminated as true with lower confidence and the target domain segmentation result obtained based on the optimized shared deep network is discriminated as true with higher confidence.
Illustratively, after the source domain shallow network, the target domain shallow network, the shared deep network and the shared classifier to be trained in each round are optimized, the parameters of the optimized source domain shallow network, target domain shallow network, shared deep network and shared classifier are fixed. The deep discriminator after the previous round of optimization is then optimized in the direction in which the source domain segmentation result obtained based on the optimized shared deep network is discriminated as true with lower confidence and the target domain segmentation result obtained based on the optimized shared deep network is discriminated as true with higher confidence.
Illustratively, the training of the deep discriminator is spaced one step apart from the training of the classifiers and networks. After optimizing the source domain shallow network, the target domain shallow network, the shared deep network, the shared classifier and the auxiliary classifier to be trained in each round, the computer device may fix the parameters of these optimized networks and classifiers. The computer device then acquires the discrimination result of the deep discriminator after the previous round of optimization for the source domain segmentation result output by the optimized shared classifier, and its discrimination result for the target domain segmentation result output by the optimized shared classifier. The deep discriminator training loss is determined according to the discrimination result of the source domain segmentation result and the discrimination result of the target domain segmentation result; it is positively correlated with the discrimination result of the source domain segmentation result and negatively correlated with the discrimination result of the target domain segmentation result. The deep discriminator after the previous round of optimization is then optimized in the direction of reducing the deep discriminator training loss.
In some embodiments, the computer device may calculate a mean square error loss according to the discrimination result of the source domain segmentation result to obtain a third sub-training loss, and calculate a mean square error loss according to the discrimination result of the target domain segmentation result to obtain a fourth sub-training loss. The third sub-training loss and the fourth sub-training loss are weighted and fused to obtain the deep discriminator training loss.
In some embodiments, equation (6) is a calculation equation for the deep discriminator training loss.
(6)
Wherein $L_d$ characterizes the deep discriminator training loss. $p_s$ characterizes the source domain segmentation result. $D(p_s)_i$ characterizes the discrimination result of the deep discriminator for the probability distribution of the i-th pixel in the source domain sample graph over the categories. $p_t$ characterizes the target domain segmentation result. $D(p_t)_i$ characterizes the discrimination result of the deep discriminator for the probability distribution of the i-th pixel in the target domain sample graph over the categories. $N_d$ characterizes the number of feature vectors in the discrimination result output by the deep discriminator.
In some embodiments, as shown in FIG. 3, a schematic diagram of a deep discriminator is provided. When the source domain segmentation result is used as input, the third sub-training loss is determined based on the discrimination result of the source domain segmentation result output by the deep discriminator. When the target domain segmentation result is used as input, the fourth sub-training loss is determined based on the discrimination result of the target domain segmentation result output by the deep discriminator. The deep discriminator is optimized based on the third sub-training loss and the fourth sub-training loss.
It can be seen that, in this embodiment, after the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized, the network parameters of the optimized source domain shallow network, target domain shallow network and shared deep network are fixed, and the deep discriminator after the previous round of optimization is optimized in the direction in which the source domain segmentation result obtained based on the optimized shared deep network is discriminated as true with lower confidence and the target domain segmentation result obtained based on the optimized shared deep network is discriminated as true with higher confidence. The deep discriminator can thus better distinguish the source domain segmentation result from the target domain segmentation result, so that the second loss, which is based on the deep discriminator's discrimination result for the target domain segmentation result, can accurately provide the constraint that aligns the source domain semantic categories with the target domain semantic categories.
In some embodiments, optimizing the source domain shallow network, the target domain shallow network, and the shared deep network for each round of training according to the first loss, the second loss, and the third loss to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished comprises:
Performing weighted fusion on the first loss, the second loss and the third loss to obtain target loss;
and optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round in the direction of reducing the target loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
For example, the computer device may compute a weighted sum of the first loss, the second loss and the third loss using their respective weights to obtain the target loss. The source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are then optimized in the direction of reducing the target loss, so as to obtain a semantic segmentation model that is applicable to the target domain and comprises the target domain shallow network and the shared deep network.
In some embodiments, the computer device may perform weighted fusion on the first loss, the second loss, the third loss and the fourth loss to obtain the target loss. The source domain shallow network, the target domain shallow network, the shared deep network, the shared classifier and the auxiliary classifier to be trained in each round are then optimized in the direction of reducing the target loss, so as to obtain a semantic segmentation model comprising the target domain shallow network, the shared deep network and the shared classifier.
In some embodiments, the computer device may compute a weighted sum of the first loss, the second loss, the third loss and the fourth loss using their respective weights to obtain the target loss.
In some embodiments, equation (7) is a calculation equation for the target loss.
(7)
Wherein $L$ characterizes the target loss. $L_{ce}$ characterizes the third loss. $\lambda_{adv}$ characterizes the weight of the second loss, and $L_{adv}$ characterizes the second loss. The remaining symbols characterize, respectively, the weight of the first loss, the first loss, the weight of the fourth loss, and the fourth loss.
It can be seen that, in this embodiment, the first loss, the second loss and the third loss are weighted and fused to obtain the target loss, which provides multiple constraints for adapting the learned source domain knowledge to the target domain. Optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round in the direction of reducing the target loss ensures the training effect of the semantic segmentation model, so that a semantic segmentation model with better performance is obtained.
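The weighted fusion of the four losses can be sketched as a simple weighted sum. The sketch assumes that the third loss enters with an implicit weight of 1, which is inferred only from the fact that the symbol list names weights for the first, second and fourth losses but none for the third; the function name `target_loss` and all numeric values are illustrative.

```python
def target_loss(first, second, third, fourth, w_first, w_second, w_fourth):
    """Weighted sum of the four losses; the third loss is left unweighted (assumed)."""
    return third + w_second * second + w_first * first + w_fourth * fourth

# Hypothetical loss values and weights for one training round.
total = target_loss(
    first=0.4, second=0.2, third=1.0, fourth=0.5,
    w_first=0.1, w_second=0.01, w_fourth=0.5,
)
```

Setting a weight to zero removes the corresponding constraint, so the variant that fuses only the first, second and third losses corresponds to `w_fourth=0.0` in this sketch.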
In some embodiments, the semantic segmentation model applicable to the target domain comprises a target domain shallow layer network, a shared deep layer network and a shared classifier; the method further comprises the steps of:
Inputting the target domain image to be processed into a semantic segmentation model suitable for the target domain, and extracting shallow image features of the target domain image to be processed through a target domain shallow network;
extracting deep semantic features corresponding to shallow image features through a shared deep network;
and carrying out semantic segmentation on the deep semantic features through a sharing classifier to obtain an image segmentation result of the target domain image to be processed.
The semantic segmentation model applicable to the target domain is used for carrying out semantic segmentation on the target domain image to be processed. The image segmentation result is used for representing the probability of each pixel point in the target domain image to be processed under each class. It can be appreciated that the source domain shallow network and the auxiliary classifier are discarded after the completion of the multiple rounds of training, and the target domain shallow network, the shared deep network and the shared classifier are reserved to obtain a semantic segmentation model applicable to the target domain.
In some embodiments, a schematic diagram of a semantic segmentation model applicable to a target domain is provided as shown in FIG. 4. The semantic segmentation model comprises a target domain shallow network, a shared deep network and a shared classifier. The input of the semantic segmentation model is a target domain image to be processed, and the output is an image segmentation result.
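The inference path of the retained model, in which the target domain shallow network, the shared deep network and the shared classifier are applied in sequence, can be sketched as a composition of three callables. The three toy components below are hypothetical stand-ins operating on a flat list of pixel values; they are not the networks of the application, and the two-class softmax is chosen only to keep the example small.

```python
import math

def segment(image, shallow_net, deep_net, classifier):
    """Apply the three retained components in order and return per-pixel probabilities."""
    shallow_features = shallow_net(image)           # shallow image features
    semantic_features = deep_net(shallow_features)  # deep semantic features
    return classifier(semantic_features)            # probability of each pixel under each class

def toy_shallow(image):
    return [px / 255.0 for px in image]

def toy_deep(features):
    return [2.0 * f - 1.0 for f in features]

def toy_classifier(features):
    # Two-class softmax over the logits (-f, f).
    result = []
    for f in features:
        e0, e1 = math.exp(-f), math.exp(f)
        result.append([e0 / (e0 + e1), e1 / (e0 + e1)])
    return result

image = [0, 255]  # a two-pixel "image"
probs = segment(image, toy_shallow, toy_deep, toy_classifier)
labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
```

Taking the most probable class for each pixel, as in the last line, turns the probability output into a label map; the source domain shallow network and the auxiliary classifier play no part in this path.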
Therefore, in this embodiment, the feature alignment task and the semantic segmentation task are decoupled and trained to obtain the semantic segmentation model with better performance, so that the target domain image to be processed is input into the semantic segmentation model, and the target domain image is subjected to semantic segmentation through the target domain shallow network, the shared deep network and the shared classifier in the semantic segmentation model, so that an accurate image segmentation result can be obtained.
In some embodiments, as shown in FIG. 5, a training process is provided for a semantic segmentation model applicable to a target domain. The source domain sample graph is used as the input of the source domain shallow network to obtain the source domain image features. The first loss is determined based on the discrimination result of the shallow discriminator for the source domain image features. The source domain image features are used as the input of the shared deep network to obtain the source domain semantic features. Classification prediction is performed on the source domain semantic features through the shared classifier to obtain the source domain segmentation result. The third loss is determined based on the difference between the source domain segmentation result and the labels carried by the source domain sample graph.
The target domain sample graph is used as the input of the target domain shallow network to obtain the target domain image features. The target domain image features are used as the input of the shared deep network to obtain the target domain semantic features. Classification prediction is performed on the target domain semantic features through the shared classifier to obtain the target domain segmentation result. The second loss is determined based on the discrimination result of the deep discriminator for the target domain segmentation result. Classification prediction is performed on the target domain semantic features through the auxiliary classifier to obtain the auxiliary segmentation result. One-hot encoding is performed on the target domain segmentation result to obtain the target domain pseudo label result. The fourth loss is determined based on the difference between the target domain pseudo label result and the auxiliary segmentation result.
The source domain shallow network, the target domain shallow network, the shared deep network, the shared classifier and the auxiliary classifier are optimized based on the first loss, the second loss, the third loss and the fourth loss to obtain the semantic segmentation model.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of the steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a semantic segmentation device. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the semantic segmentation device provided below may refer to the limitation of the semantic segmentation method described above, and will not be repeated here.
As shown in fig. 6, an embodiment of the present application provides a semantic segmentation apparatus 600, including:
the feature extraction module 602 is configured to, in each round of training of the semantic segmentation model applicable to the target domain, extract source domain image features from a source domain sample graph through the source domain shallow network to be trained in each round, and extract target domain image features from a target domain sample graph through the target domain shallow network to be trained in each round;
the semantic segmentation module 604 is configured to perform semantic segmentation processing on the source domain image feature and the target domain image feature based on the shared deep network to be trained in each round, so as to obtain a source domain segmentation result and a target domain segmentation result;
a loss determining module 606, configured to determine a first loss according to the difference between the distribution of the source domain image features and the distribution of the target domain image features; determine a second loss according to the difference between the source domain segmentation result and the target domain segmentation result; and determine a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result;
the model optimization module 608 is configured to optimize the source domain shallow network, the target domain shallow network, and the shared deep network to be trained according to the first loss, the second loss, and the third loss, so as to obtain a semantic segmentation model applicable to the target domain after the training is completed.
In some embodiments, in terms of performing semantic segmentation processing on the source domain image feature and the target domain image feature based on each round of the shared deep network to be trained, so as to obtain a source domain segmentation result and a target domain segmentation result, the semantic segmentation module 604 is specifically configured to:
extracting source domain semantic features from the source domain image features and target domain semantic features from the target domain image features based on the shared deep network to be trained in each round;
carrying out semantic segmentation according to the source domain semantic features and the target domain semantic features through the shared classifier to be trained in each round, to obtain a source domain segmentation result and a target domain segmentation result;
in optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished, the model optimization module 608 is specifically configured to:
and optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the shared classifier to be trained in each round according to the first loss, the second loss and the third loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
In some embodiments, the semantic segmentation module 604 is further to:
carrying out semantic segmentation according to the target domain semantic features through the auxiliary classifier to be trained in each round, to obtain an auxiliary segmentation result; determining a fourth loss according to the difference between the auxiliary segmentation result and the target domain segmentation result;
in optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished, the model optimization module 608 is specifically configured to:
and optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the auxiliary classifier to be trained in each round according to the first loss, the second loss, the third loss and the fourth loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
In some embodiments, the first loss is determined from a discrimination result of the source domain image features; the discrimination result of the source domain image features is obtained by inputting the source domain image features into the shallow discriminator optimized in the previous round;
the model optimization module 608 is also configured to: after the source domain shallow network and the target domain shallow network to be trained in each round are optimized, fix the network parameters of the source domain shallow network and the target domain shallow network optimized in each round, and optimize the shallow discriminator optimized in the previous round in the direction in which the confidence of discriminating as true the source domain image features extracted by the optimized source domain shallow network is lower and the confidence of discriminating as true the target domain image features extracted by the optimized target domain shallow network is higher.
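The alternating optimization described above can be illustrated with a small sketch. The text only specifies the direction of optimization (source features scored lower as true, target features scored higher as true); the binary cross-entropy form below, and the function names `first_loss` and `shallow_discriminator_loss`, are assumptions made for illustration. Inputs are discriminator confidences in (0, 1) that a feature is "true".

```python
import math

def first_loss(d_source, eps=1e-12):
    """Feature-network side (assumed form): small when the previous-round
    discriminator judges the source domain image features to be true."""
    return -sum(math.log(p + eps) for p in d_source) / len(d_source)

def shallow_discriminator_loss(d_source, d_target, eps=1e-12):
    """Discriminator side (assumed form), with the feature networks fixed:
    small when source features score low and target features score high."""
    src = -sum(math.log(1.0 - p + eps) for p in d_source) / len(d_source)
    tgt = -sum(math.log(p + eps) for p in d_target) / len(d_target)
    return src + tgt
```

In this sketch the two sides pull in opposite directions on the source features, which is what drives the source and target feature distributions together over multiple rounds.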
In some embodiments, the second loss is determined from a discrimination result of the target domain segmentation result; the discrimination result of the target domain segmentation result is obtained by inputting the target domain segmentation result into the deep discriminator optimized in the previous round;
the model optimization module 608 is also configured to: after the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized, fix the network parameters of the source domain shallow network, the target domain shallow network and the shared deep network optimized in each round, and optimize the deep discriminator optimized in the previous round in the direction in which the confidence of discriminating as true the source domain segmentation result obtained based on the optimized shared deep network is lower and the confidence of discriminating as true the target domain segmentation result obtained based on the optimized shared deep network is higher.
In some embodiments, in optimizing the source domain shallow network, the target domain shallow network, and the shared deep network to be trained for each round according to the first loss, the second loss, and the third loss, the model optimization module 608 is specifically configured to:
performing weighted fusion on the first loss, the second loss and the third loss to obtain a target loss;
and optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round in the direction of reducing the target loss so as to obtain a semantic segmentation model applicable to the target domain after the multi-round training is finished.
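As a minimal sketch of the weighted fusion described above, the target loss can be computed as a weighted sum of the individual losses. The weighted-sum form and the helper name `fuse_losses` are assumptions; the text does not give weight values, so the ones in the usage line are purely illustrative.

```python
def fuse_losses(losses, weights):
    """Weighted sum of loss values; one weight per loss (assumed fusion form)."""
    if len(losses) != len(weights):
        raise ValueError("each loss needs exactly one weight")
    return sum(w * l for w, l in zip(weights, losses))

# Hypothetical values for the first, second and third loss and their weights.
target_loss = fuse_losses([1.0, 2.0, 3.0], [0.5, 0.25, 1.0])
```

The networks would then be updated in the direction that reduces `target_loss`; the same helper extends to four losses when the fourth loss is used.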
In some embodiments, the semantic segmentation model applicable to the target domain comprises a target domain shallow network, a shared deep network and a shared classifier; the model optimization module 608 is also configured to:
inputting the target domain image to be processed into a semantic segmentation model suitable for the target domain, and extracting shallow image features of the target domain image to be processed through a target domain shallow network;
extracting deep semantic features corresponding to shallow image features through a shared deep network;
and carrying out semantic segmentation on the deep semantic features through the shared classifier to obtain an image segmentation result of the target domain image to be processed.
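The inference path described above chains three components: the target domain shallow network, the shared deep network and the shared classifier. The following sketch shows only that composition; the class name `TargetDomainSegmenter` and the toy stand-in functions are hypothetical and are not the trained networks of the application.

```python
class TargetDomainSegmenter:
    """Chains target domain shallow network -> shared deep network -> shared classifier."""

    def __init__(self, target_shallow, shared_deep, shared_classifier):
        self.target_shallow = target_shallow
        self.shared_deep = shared_deep
        self.shared_classifier = shared_classifier

    def segment(self, image):
        shallow_features = self.target_shallow(image)        # shallow image features
        deep_features = self.shared_deep(shallow_features)   # deep semantic features
        scores = self.shared_classifier(deep_features)       # per-pixel class scores
        # Pick the highest-scoring class for every pixel.
        return [max(range(len(s)), key=lambda c: s[c]) for s in scores]

# Toy stand-ins (assumptions) operating on a flat list of pixel intensities.
toy = TargetDomainSegmenter(
    target_shallow=lambda img: [v / 255.0 for v in img],
    shared_deep=lambda feats: [(f, 1.0 - f) for f in feats],
    shared_classifier=lambda feats: [list(f) for f in feats],
)
labels = toy.segment([0, 255, 100])
```

Note that the source domain shallow network, the auxiliary classifier and the discriminators do not appear here, since the text lists only the three components above as part of the model applicable to the target domain.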
The modules in the above semantic segmentation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store a source domain sample map and a target domain sample map. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement the steps in the semantic segmentation method described above.
In some embodiments, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement the steps in the semantic segmentation method described above. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen; the input device of the computer equipment can be a touch layer covered on a display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 or 8 are merely block diagrams of the parts of the structure relevant to aspects of the application and do not limit the computer device to which aspects of the application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
In some embodiments, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer-readable storage medium is provided, the internal structure of which may be as shown in fig. 9; the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method embodiments described above.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may carry out the steps of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A semantic segmentation method, comprising:
in each round of training of a semantic segmentation model applicable to a target domain, extracting source domain image features from a source domain sample graph through a source domain shallow network to be trained in each round;
extracting target domain image features from a target domain sample graph through a target domain shallow network to be trained in each round;
semantic segmentation processing is respectively carried out on the source domain image features and the target domain image features based on the shared deep network to be trained in each round, so as to obtain a source domain segmentation result and a target domain segmentation result;
determining a first loss according to the difference between the distribution of the source domain image features and the distribution of the target domain image features;
determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result;
determining a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result;
and optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss so as to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished.
2. The method according to claim 1, wherein the semantic segmentation processing is performed on the source domain image feature and the target domain image feature based on the shared deep network to be trained in each round, so as to obtain a source domain segmentation result and a target domain segmentation result, respectively, including:
extracting source domain semantic features from the source domain image features and target domain semantic features from the target domain image features based on the shared deep network to be trained in each round;
carrying out semantic segmentation according to the source domain semantic features and the target domain semantic features through a shared classifier to be trained in each round, to obtain a source domain segmentation result and a target domain segmentation result;
the optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished comprises:
optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the shared classifier to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished.
3. The method according to claim 2, wherein the method further comprises:
carrying out semantic segmentation according to the semantic features of the target domain through each round of auxiliary classifier to be trained to obtain an auxiliary segmentation result;
determining a fourth loss according to the difference between the auxiliary segmentation result and the target domain segmentation result;
the optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished comprises:
optimizing the source domain shallow network, the target domain shallow network, the shared deep network and the auxiliary classifier to obtain the semantic segmentation model applicable to the target domain after the training is finished.
4. The method of claim 1, wherein the first loss is determined based on a discrimination result of the source domain image features; the discrimination result of the source domain image features is obtained by inputting the source domain image features into a shallow discriminator optimized in the previous round;
the method further comprises the steps of:
after the source domain shallow network and the target domain shallow network to be trained in each round are optimized, network parameters of the source domain shallow network and the target domain shallow network after each round of optimization are fixed, and the shallow discriminator after the previous round of optimization is optimized towards the direction that the confidence of distinguishing the source domain image features extracted from the source domain shallow network after each round of optimization as true is lower and the confidence of distinguishing the target domain image features extracted from the target domain shallow network after each round of optimization as true is higher.
5. The method of claim 4, wherein the second loss is determined based on a discrimination result of the target domain segmentation result; the discrimination result of the target domain segmentation result is obtained by inputting the target domain segmentation result into a deep discriminator optimized in the previous round;
the method further comprises:
after the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round are optimized, fixing the network parameters of the source domain shallow network, the target domain shallow network and the shared deep network optimized in each round, and optimizing the deep discriminator optimized in the previous round in the direction in which the confidence of discriminating as true the source domain segmentation result obtained based on the optimized shared deep network is lower and the confidence of discriminating as true the target domain segmentation result obtained based on the optimized shared deep network is higher.
6. The method according to claim 1, wherein optimizing the source domain shallow network, the target domain shallow network, and the shared deep network for each round of training according to the first loss, the second loss, and the third loss to obtain the semantic segmentation model applicable to the target domain after the multiple rounds of training are completed comprises:
performing weighted fusion on the first loss, the second loss and the third loss to obtain a target loss;
and optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round in the direction of reducing the target loss so as to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished.
7. The method according to any one of claims 1 to 6, wherein the semantic segmentation model applicable to the target domain comprises a target domain shallow network, a shared deep network and a shared classifier; the method further comprises the steps of:
inputting a target domain image to be processed into the semantic segmentation model applicable to the target domain, and extracting shallow image features of the target domain image to be processed through the target domain shallow network;
extracting deep semantic features corresponding to the shallow image features through the shared deep network;
and carrying out semantic segmentation on the deep semantic features through the shared classifier to obtain an image segmentation result of the target domain image to be processed.
8. A semantic segmentation apparatus, comprising:
the feature extraction module is used for, in each round of training of the semantic segmentation model applicable to the target domain, extracting source domain image features from a source domain sample graph through the source domain shallow network to be trained in each round, and extracting target domain image features from a target domain sample graph through the target domain shallow network to be trained in each round;
the semantic segmentation module is used for carrying out semantic segmentation processing on the source domain image features and the target domain image features based on the shared deep network to be trained in each round so as to obtain a source domain segmentation result and a target domain segmentation result;
the loss determination module is used for determining a first loss according to the difference between the distribution of the source domain image features and the distribution of the target domain image features; determining a second loss according to the difference between the source domain segmentation result and the target domain segmentation result; determining a third loss according to the difference between the label carried by the source domain sample graph and the source domain segmentation result;
and the model optimization module is used for optimizing the source domain shallow network, the target domain shallow network and the shared deep network to be trained in each round according to the first loss, the second loss and the third loss so as to obtain the semantic segmentation model applicable to the target domain after the multi-round training is finished.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310904970.3A 2023-07-24 2023-07-24 Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium Active CN116630630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310904970.3A CN116630630B (en) 2023-07-24 2023-07-24 Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310904970.3A CN116630630B (en) 2023-07-24 2023-07-24 Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116630630A true CN116630630A (en) 2023-08-22
CN116630630B CN116630630B (en) 2023-12-15

Family

ID=87602943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310904970.3A Active CN116630630B (en) 2023-07-24 2023-07-24 Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116630630B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883673A (en) * 2023-09-08 2023-10-13 腾讯科技(深圳)有限公司 Semantic segmentation model training method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111386536A (en) * 2017-10-27 2020-07-07 谷歌有限责任公司 Semantically consistent image style conversion
US20210012198A1 (en) * 2018-05-31 2021-01-14 Huawei Technologies Co., Ltd. Method for training deep neural network and apparatus
WO2021114130A1 (en) * 2019-12-11 2021-06-17 中国科学院深圳先进技术研究院 Unsupervised self-adaptive mammary gland lesion segmentation method
CN115705706A (en) * 2021-08-13 2023-02-17 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN115713037A (en) * 2022-11-24 2023-02-24 复旦大学 Bidirectional cross-modal unsupervised image segmentation domain adaptation method based on wavelet spectrum migration
CN116188478A (en) * 2023-02-03 2023-05-30 京东方科技集团股份有限公司 Image segmentation method, device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883673A (en) * 2023-09-08 2023-10-13 腾讯科技(深圳)有限公司 Semantic segmentation model training method, device, equipment and storage medium
CN116883673B (en) * 2023-09-08 2023-12-26 腾讯科技(深圳)有限公司 Semantic segmentation model training method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN116630630B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN111738357B (en) Junk picture identification method, device and equipment
CN110245714B (en) Image recognition method and device and electronic equipment
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN116630630B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN112069884A (en) Violent video classification method, system and storage medium
CN111932544A (en) Tampered image detection method and device and computer readable storage medium
CN116310656B (en) Training sample determining method and device and computer equipment
WO2023024413A1 (en) Information matching method and apparatus, computer device and readable storage medium
Kim et al. Cluster and aggregate: Face recognition with large probe set
CN116805039B (en) Feature screening method, device, computer equipment and data disturbance method
CN116012841A (en) Open set image scene matching method and device based on deep learning
CN116030466A (en) Image text information identification and processing method and device and computer equipment
CN116311546A (en) Living body detection method and system
CN114819138A (en) Graph data processing method and device, electronic equipment and storage medium
CN116630629B (en) Domain adaptation-based semantic segmentation method, device, equipment and storage medium
CN117437425B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN115761239B (en) Semantic segmentation method and related device
CN117786234B (en) Multimode resource recommendation method based on two-stage comparison learning
CN111582107B (en) Training method and recognition method of target re-recognition model, electronic equipment and device
CN118114123A (en) Method, device, computer equipment and storage medium for processing recognition model
CN117726994A (en) Vehicle re-identification method, apparatus, device, storage medium, and program product
CN117113182A (en) Method, device, computer equipment and storage medium for detecting data outside distribution
CN116597293A (en) Multi-mode scene recognition method, device, computer equipment and storage medium
Gao et al. Multi-scale Structure Perception and Global Context-aware Method for Small-scale Pedestrian Detection
CN117037014A (en) Object labeling method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant