WO2022142450A1 - Method and apparatus for image segmentation model training and image segmentation - Google Patents

Method and apparatus for image segmentation model training and image segmentation

Info

Publication number
WO2022142450A1
WO2022142450A1 PCT/CN2021/117037 CN2021117037W WO2022142450A1 WO 2022142450 A1 WO2022142450 A1 WO 2022142450A1 CN 2021117037 W CN2021117037 W CN 2021117037W WO 2022142450 A1 WO2022142450 A1 WO 2022142450A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature information
target
image segmentation
network
Prior art date
Application number
PCT/CN2021/117037
Other languages
English (en)
French (fr)
Inventor
申世伟
李家宏
李思则
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司 filed Critical 北京达佳互联信息技术有限公司
Priority to EP21913197.6A (published as EP4095801A1)
Publication of WO2022142450A1
Priority to US17/895,629 (published as US20230022387A1)

Classifications

    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 7/10 Segmentation; Edge detection
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411 Classification techniques relating to the classification model, based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/253 Fusion techniques of extracted features
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/761 Image or video pattern matching; Proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning

Definitions

  • The present disclosure relates to the technical field of artificial intelligence (AI), and in particular to an image segmentation model training method, an image segmentation method, corresponding apparatuses, and an electronic device.
  • According to a first aspect, an image segmentation model training method is provided, including: acquiring target category feature information and associated scene feature information of the target category feature information, the target category feature information representing category features of training samples and prediction samples; performing splicing processing on the target category feature information and the associated scene feature information to obtain first splicing feature information; inputting the first splicing feature information into an initial generation network for image synthesis processing to obtain a first synthetic image; inputting the first synthetic image into an initial discrimination network for authenticity discrimination to obtain a first image discrimination result; inputting the first synthetic image into a classification network of an initial image segmentation model for image segmentation to obtain a first image segmentation result; and training the classification network of the initial image segmentation model based on the first image discrimination result, the first image segmentation result and the target category feature information to obtain a target image segmentation model.
  • According to a second aspect, an image segmentation model training apparatus is provided, including: a feature information acquisition module configured to acquire target category feature information and associated scene feature information of the target category feature information, the target category feature information representing category features of training samples and prediction samples; a first splicing processing module configured to perform splicing processing on the target category feature information and the associated scene feature information to obtain first splicing feature information; a first image synthesis processing module configured to input the first splicing feature information into an initial generation network for image synthesis processing to obtain a first synthetic image; a first authenticity discrimination module configured to input the first synthetic image into an initial discrimination network for authenticity discrimination to obtain a first image discrimination result; a first image segmentation module configured to input the first synthetic image into a classification network of an initial image segmentation model for image segmentation to obtain a first image segmentation result; and a model training module configured to train the classification network of the initial image segmentation model based on the first image discrimination result, the first image segmentation result and the target category feature information to obtain a target image segmentation model.
  • According to a third aspect, an image segmentation method is provided, including: acquiring an image to be segmented; and inputting the image to be segmented into a target image segmentation model trained by the image segmentation model training method according to the first aspect, so that the target image segmentation model performs image segmentation on the image to be segmented to obtain a target segmented image.
  • According to a fourth aspect, an image segmentation apparatus is provided, including: a to-be-segmented image acquisition module configured to acquire the image to be segmented; and a third image segmentation module configured to input the image to be segmented into the target image segmentation model trained by the image segmentation model training method according to any one of the above first aspect, and to perform image segmentation on the image to be segmented to obtain a target segmented image.
  • According to a fifth aspect, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method according to any one of the first or third aspects above.
  • According to a sixth aspect, a computer-readable storage medium is provided, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method according to any one of the first or third aspects of the embodiments of the present disclosure.
  • According to a seventh aspect, a computer program product comprising instructions is provided, which, when run on a computer, causes the computer to perform the method according to any one of the first or third aspects of the embodiments of the present disclosure.
  • the recognition ability of the trained target image segmentation model for unknown categories can be improved.
  • the scene context can be better used to adjust the classification network in zero-sample image segmentation training, and the accuracy of zero-sample segmentation can be greatly improved.
  • Fig. 1 is a schematic diagram of an application environment according to an exemplary embodiment.
  • FIG. 2 is a flowchart of a method for training an image segmentation model according to an exemplary embodiment
  • FIG. 3 is a flowchart of a method for acquiring associated scene feature information according to an exemplary embodiment
  • FIG. 4 is a flowchart of a method for pre-training an image segmentation model according to an exemplary embodiment
  • FIG. 5 is a flowchart of an image segmentation method according to an exemplary embodiment
  • FIG. 6 is a block diagram of an apparatus for training an image segmentation model according to an exemplary embodiment
  • FIG. 7 is a block diagram of an image segmentation apparatus according to an exemplary embodiment
  • FIG. 8 is a block diagram of an electronic device for image segmentation model training or image segmentation according to an exemplary embodiment
  • Fig. 9 is a block diagram of an electronic device for image segmentation model training or image segmentation according to an exemplary embodiment.
  • FIG. 1 is a schematic diagram of an application environment according to an exemplary embodiment.
  • the application environment may include a server 01 and a terminal 02 .
  • the server 01 can be used to train a target image segmentation model that can perform image segmentation.
  • The server 01 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the terminal 02 may perform image segmentation processing in combination with the image segmentation model trained by the server 01 .
  • The terminal 02 may include, but is not limited to, a smartphone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, augmented reality (AR)/virtual reality (VR) devices, smart wearable devices, and other types of electronic devices.
  • The operating system running on the electronic device may include, but is not limited to, an Android system, an iOS system, Linux, Windows, and the like.
  • FIG. 1 is only an application environment provided by the present disclosure. In practical applications, other application environments may also be included, such as the training of the target image segmentation model, which may also be implemented on the terminal 02 .
  • the above-mentioned server 01 and the terminal 02 may be directly or indirectly connected through wired or wireless communication, which is not limited in this disclosure.
  • Fig. 2 is a flowchart of an image segmentation model training method according to an exemplary embodiment. As shown in Fig. 2, the image segmentation model training method can be applied to electronic devices such as servers, terminals, and edge computing nodes, and includes the following steps.
  • step S201 the target category feature information and the associated scene feature information of the target category feature information are acquired.
  • The target category feature information may represent the category features of the training samples and the prediction samples. In one embodiment, the category features of the training samples may be a large number of known category features, that is, the category features of the training samples used for training the target image segmentation model; the category features of the prediction samples are a large number of unknown category features, that is, the category features of images that do not participate in the training of the target image segmentation model. Correspondingly, the training samples may include a large number of training images used for training the target image segmentation model, and the prediction samples may include a large number of images that do not participate in the training of the target image segmentation model and are to be segmented (predicted) by the trained target image segmentation model, that is, zero samples.
  • acquiring target category feature information includes: acquiring category information of training samples and prediction samples; inputting category information into a target word vector model to obtain target category feature information.
  • the category information of the images to be segmented by the target image segmentation model in practical applications can be obtained as the category information of the prediction samples according to the actual application requirements.
  • The category information may be the category of the segmentation object included in a large number of images (i.e., training samples or prediction samples). For example, if an image includes a cat (a segmentation object), the category information of the image is "cat".
  • the target word vector model may be obtained by training a preset word vector model based on preset training text information.
  • the preset training text information may be text information related to the application field of the target image segmentation model.
  • In one embodiment, the preset training text information may be subjected to word segmentation processing, and the word segmentation information (i.e., each word) obtained after the word segmentation processing is input into the preset word vector model for training; each word can be mapped into a K-dimensional real-number vector, so that the target word vector model is obtained together with a word vector set representing the semantic relevance between words.
  • The category information (a word) is input into the target word vector model, and the target word vector model can determine the word vector of the category information based on the word vectors in the word vector set; the word vector of the category information is used as the target category feature information corresponding to the category information.
  • the preset word vector model may include, but is not limited to, word vector models such as word2vec, fasttext, and glove.
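  • As an illustration of the mapping described above, the following is a minimal, hypothetical sketch using gensim's word2vec (one of the example preset word vector models); the corpus in `training_sentences` and the category list are placeholders, not data from the disclosure.

```python
# Hypothetical sketch: training a word vector model on preset training text and
# looking up category word vectors as target category feature information.
from gensim.models import Word2Vec

# Placeholder word-segmented corpus related to the application field.
training_sentences = [
    ["a", "cat", "sleeps", "on", "the", "bed"],
    ["a", "fish", "swims", "in", "the", "pond"],
]

# Training the preset word vector model yields the target word vector model;
# vector_size is the dimension K of the real-number vectors mentioned above.
target_word_vector_model = Word2Vec(training_sentences, vector_size=300, min_count=1)

def category_feature(category_word):
    # The word vector of the category word is used as its category feature information.
    return target_word_vector_model.wv[category_word]

target_category_features = {c: category_feature(c) for c in ["cat", "fish"]}
```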
  • the recognition ability of the trained target image segmentation model for unknown categories can be improved, thereby greatly improving the segmentation accuracy.
  • FIG. 3 is a flowchart of a method for acquiring associated scene feature information according to an exemplary embodiment. In the embodiment of the present disclosure, the following steps may be included.
  • step S301 a scene image set is acquired.
  • step S303 the scene image set is input into the scene recognition model for scene recognition, and the scene information set is obtained.
  • step S305 the scene information set is input into the target word vector model to obtain the scene feature information set.
  • step S307 the similarity between the target category feature information and the scene feature information in the scene feature information set is calculated.
  • step S309 the associated scene feature information is determined from the scene feature information set based on the similarity.
  • the set of scene images may include images corresponding to a large number of scenes.
  • The scene information set may be the scene information corresponding to the large number of images in the scene image set. For example, for an image taken in a bedroom, the scene information is "bedroom"; for an image of a fish taken in a pond, the scene information may be "pond".
  • an image with scene annotations can be used as training data, and a preset deep learning model can be trained to obtain a scene recognition model capable of performing scene recognition.
  • the scene image set is input into the scene recognition model for scene recognition, and the scene information set corresponding to the images in the scene image set can be obtained.
  • the preset deep learning model may include, but is not limited to, deep learning models such as convolutional neural networks, logistic regression neural networks, and recurrent neural networks.
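  • A minimal sketch of such a scene recognition model is shown below, assuming a torchvision ResNet-18 backbone fine-tuned on scene-annotated images; `NUM_SCENES` and the training-data variables are illustrative placeholders rather than values from the disclosure.

```python
# Hypothetical sketch: a scene recognition model built on a ResNet-18 backbone,
# trained on images with scene annotations (placeholder data loader not shown).
import torch
import torch.nn as nn
from torchvision import models

NUM_SCENES = 365  # placeholder number of scene classes in the annotated data

scene_recognition_model = models.resnet18()
scene_recognition_model.fc = nn.Linear(scene_recognition_model.fc.in_features, NUM_SCENES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(scene_recognition_model.parameters(), lr=1e-4)

def scene_train_step(images, scene_labels):
    """One supervised step: images -> predicted scene logits vs. scene annotations."""
    optimizer.zero_grad()
    loss = criterion(scene_recognition_model(images), scene_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```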
  • The scene information (words) in the scene information set is input into the target word vector model, and the target word vector model can determine the word vector of each piece of scene information based on the word vectors in the word vector set; the word vector of the scene information is used as the scene feature information corresponding to that scene information.
  • The target word vector model used to obtain the scene feature information set and the target word vector model used to obtain the target category feature information are the same word vector model, that is, they are obtained by training based on the same preset training text information.
  • the target word vector model can improve the accuracy of representing the semantic correlation between scene information and category information.
  • The similarity between the target category feature information and the scene feature information may represent the semantic similarity between the words (category information and scene information) corresponding to the target category feature information and the scene feature information. In the embodiments of the present disclosure, the higher the similarity between the target category feature information and the scene feature information, the higher the semantic similarity between the corresponding words; conversely, the lower the similarity, the lower the semantic similarity between the corresponding words.
  • the similarity between the target category feature information and the scene feature information may include, but is not limited to, the cosine distance, the Euclidean distance, and the Manhattan distance between the target category feature information and the scene feature information.
  • The above-mentioned target category feature information may include category feature information (word vectors) corresponding to multiple pieces of category information. For each piece of category information, the scene feature information may be ranked by its similarity to the category feature information corresponding to that category information; the top-N scene feature information in the ranking is used as primary-selection scene feature information, and one piece of scene feature information is randomly selected from the primary-selection scene feature information as the associated scene feature information of that category feature information. Alternatively, the scene feature information whose similarity to the category feature information corresponding to the category information is greater than or equal to a preset threshold may be selected as the primary-selection scene feature information, and one piece of scene feature information is randomly selected from the primary-selection scene feature information as the associated scene feature information of that category feature information.
  • the above-mentioned preset threshold and N may be set according to actual application requirements.
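  • The similarity-based selection of associated scene feature information described above can be sketched as follows, assuming cosine similarity over numpy word vectors; the dictionaries and the value of N are placeholders.

```python
# Hypothetical sketch: choosing associated scene feature information for a
# category feature vector via cosine similarity and the top-N strategy above.
import random
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_associated_scene(category_vec, scene_features, n=5):
    """scene_features: dict mapping scene name -> scene feature word vector."""
    ranked = sorted(scene_features.items(),
                    key=lambda kv: cosine_similarity(category_vec, kv[1]),
                    reverse=True)
    primary_selection = ranked[:n]            # top-N primary-selection scene features
    return random.choice(primary_selection)   # randomly pick one as the associated scene

# Example (placeholder vectors):
# scene, vec = select_associated_scene(np.random.rand(300),
#                                      {"bedroom": np.random.rand(300),
#                                       "pond": np.random.rand(300)}, n=1)
```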
  • By obtaining the associated scene feature information of the target category feature information, it is possible to predict the scenes in which segmentation objects of a certain category appear. This ensures that, when picture pixel features are subsequently synthesized automatically based on word vectors of unknown or known categories, a constraint on the scenes in which that category appears can be imposed, so that the training of the image segmentation model is more focused on the synthesis of image pixel features in a specific scene.
  • step S203 splicing processing is performed on the target category feature information and the associated scene feature information to obtain first splicing feature information.
  • the splicing processing of the target category feature information and the associated scene feature information may include stitching the category feature information corresponding to each category information in the target category feature information and the associated scene feature information of the category feature information.
  • For example, suppose the category feature information corresponding to a certain piece of category information is [1, 2, 3, 4, 5], and the associated scene feature information of that category feature information is [6, 7, 8, 9, 0]. The first splicing feature information corresponding to the category information may then be [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] or [6, 7, 8, 9, 0, 1, 2, 3, 4, 5].
  • In order to improve the accuracy of feature extraction during the zero-sample learning process, pre-training can be performed in combination with the training samples, the training scene feature information of the training samples, and the training category feature information of the training samples.
  • the above method may further include the following steps.
  • step S401 a training sample, training scene feature information of the training sample, and training category feature information of the training sample are acquired.
  • The training scene feature information may be the word vector of the scene information corresponding to the training sample. In one embodiment, for the specific refinement steps of acquiring the training scene feature information of the training sample, reference may be made to the above-mentioned refinement steps of acquiring the scene feature information set from the scene image set, which are not repeated here.
  • the training category feature information of the training sample may be a word vector of the category information corresponding to the training sample.
  • step S403 the training samples are input into the feature extraction network of the segmentation model to be trained for feature extraction to obtain segmentation feature images.
  • the segmentation model to be trained may include DeepLab (semantic image segmentation model), but the embodiments of the present disclosure are not limited to the above, and may also include other deep learning models in practical applications.
  • the segmentation model to be trained may include a feature extraction network and a classification network.
  • a feature extraction network can be used to extract feature information of images (training samples), and the training samples are input into the feature extraction network of the segmentation model to be trained for feature extraction, and a segmented feature image can be obtained.
  • step S405 splicing processing is performed on the training category feature information and the training scene feature information to obtain second splicing feature information.
  • the splicing process is performed on the training category feature information and the training scene feature information to obtain the second splicing feature information.
  • step S407 the second stitching feature information is input into the generating network to be trained to perform image synthesis processing to obtain a second synthesized image.
  • The generation network to be trained may be the generator in a GAN (Generative Adversarial Network).
  • Since the second splicing feature information, obtained by splicing the training category feature information of the training sample with the training scene feature information, is used to synthesize the synthetic image corresponding to the training sample, a scene constraint can be imposed on the segmentation objects corresponding to the training sample. A second composite image that accurately represents both the category information and the scene information of the segmentation objects is thus obtained, which greatly improves the feature mapping capability for the training samples.
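  • The following is a purely illustrative sketch of a generator in this spirit, assuming a small PyTorch MLP that maps spliced (category + scene) word vectors to per-pixel synthetic features; the layer sizes, dimensions and names are assumptions and are not specified by the disclosure.

```python
# Purely illustrative sketch: a generator mapping spliced (category + scene)
# word vectors to synthetic per-pixel feature vectors.
import torch
import torch.nn as nn

WORD_DIM = 300          # dimension of each word vector (placeholder)
FEATURE_CHANNELS = 256  # channels of the synthesized pixel features (placeholder)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * WORD_DIM, 512),   # input: spliced category + scene vector
            nn.LeakyReLU(0.2),
            nn.Linear(512, FEATURE_CHANNELS),
        )

    def forward(self, spliced_features):
        # spliced_features: (num_pixels, 2 * WORD_DIM), one spliced vector per pixel
        return self.net(spliced_features)

category_vec = torch.randn(WORD_DIM)
scene_vec = torch.randn(WORD_DIM)
spliced = torch.cat([category_vec, scene_vec], dim=0).unsqueeze(0)  # splicing processing
synthetic_pixel_features = Generator()(spliced)
```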
  • step S409 the second composite image and the segmentation feature image are input into the classification network of the segmentation model to be trained, and image segmentation is performed respectively to obtain a second image segmentation result corresponding to the second composite image and a third image segmentation result corresponding to the segmentation feature image.
  • The second composite image may include a synthetic image corresponding to each training image in the training samples; correspondingly, the second image segmentation result corresponding to each synthetic image here may represent the predicted category feature information of that synthetic image. The segmentation feature image may include image feature information corresponding to each training image in the training samples; the third image segmentation result corresponding to each piece of image feature information here may represent the predicted category feature information of that image feature information.
  • step S411 the segmentation feature image and the second composite image are input into the discrimination network to be trained, and authenticity discrimination is performed respectively to obtain the second image discrimination result corresponding to the segmentation feature image and the third image discrimination result corresponding to the second composite image.
  • the discriminant network to be trained may be a discriminator in a GAN.
  • the second image discrimination result corresponding to the segmented feature image may represent the predicted probability that the segmented feature image is a real image; the third image discrimination result corresponding to the second composite image may represent that the second composite image is a real image predicted probability.
  • the real image may be a non-synthesized image.
  • step S413 the to-be-trained segmentation model, the to-be-trained segmentation model, and the to-be-trained segmentation model are trained based on the second composite image, the segmentation feature image, the second image segmentation result, the third image segmentation result, the training type feature information, the second image discrimination result, and the third image discrimination result.
  • the segmentation model to be trained is trained based on the second composite image, the segmentation feature image, the second image segmentation result, the third image segmentation result, the training type feature information, the second image discrimination result and the third image discrimination result.
  • the generation network to be trained and the discrimination network to be trained, and obtaining the initial image segmentation model, the initial generation network and the initial discrimination network may include: using the second composite image and the segmentation feature image to calculate the content loss; using the second image segmentation result, the third image The segmentation result and the training type feature information are used to calculate the second segmentation loss; the second discrimination loss is calculated using the second image discrimination result and the third image discrimination result; the second target is determined according to the content loss, the second discrimination loss and the second segmentation loss loss; if the second target loss does not meet the second preset condition, update the network parameters in the segmentation model to be trained, the generation network to be trained, and the discrimination network to be trained; based on the updated segmentation model to be trained, the generation network to
  • the content loss may reflect the difference between the second synthetic image generated by the generation network to be trained and the segmentation feature map.
  • the content loss may be the similarity distance between the second synthetic image corresponding to the training image in the training sample and the segmentation feature image.
  • the similarity distance between the second composite image and the segmented feature image may include, but is not limited to, cosine distance, Euclidean distance, and Manhattan distance between the second composite image and the segmented feature image.
  • the value of the content loss is proportional to the difference between the second synthetic image and the segmentation feature map. Correspondingly, the smaller the value of the content loss, the higher the performance of the initial generation network obtained by training.
  • In one embodiment, calculating the second segmentation loss may include: calculating, based on a preset loss function, a first segmentation sub-loss between the second image segmentation result and the training category feature information, and a second segmentation sub-loss between the third image segmentation result and the training category feature information; and weighting the first segmentation sub-loss and the second segmentation sub-loss to obtain the above-mentioned second segmentation loss.
  • The weights of the first segmentation sub-loss and the second segmentation sub-loss can be set according to actual application requirements.
  • The first segmentation sub-loss may represent the difference between each pixel point of the second composite image and the training category feature information; the second segmentation sub-loss may represent the difference between each pixel point of the segmentation feature image and the training category feature information.
  • In one embodiment, calculating the second discrimination loss using the second image discrimination result and the third image discrimination result may include: calculating, based on a preset loss function, a first discriminant sub-loss between the second image discrimination result and the authenticity label corresponding to the segmentation feature image, and a second discriminant sub-loss between the third image discrimination result and the authenticity label corresponding to the second composite image.
  • the first discriminant sub-loss and the second discriminant sub-loss are weighted to obtain the above-mentioned second discriminant loss.
  • the weights of the first discriminant sub-loss and the second discriminant sub-loss can be set according to actual application requirements.
  • the first discriminant sub-loss may represent the difference between the second image discrimination result and the authenticity label corresponding to the segmentation feature image; the second discriminant sub-loss may represent the third image discrimination result and the second composite image The difference between the corresponding authenticity labels.
  • The authenticity label corresponding to the segmentation feature image may be 1 (1 represents a real image); since the second composite image is a synthetic image rather than a real image, the authenticity label corresponding to the second composite image may correspondingly be 0 (0 represents a non-real image, that is, a synthetic image).
  • the preset loss function may include, but is not limited to, a cross-entropy loss function, a logistic loss function, a Hinge (hinge) loss function, an exponential loss function, etc., and the embodiment of the present disclosure is not limited to the above.
  • the loss functions used to calculate the discriminative loss and segmentation loss can be the same or different.
  • the content loss, the second segmentation loss, and the second discriminant loss may be weighted to obtain the second target loss.
  • the weights of the content loss, the second segmentation loss, and the second discrimination loss may be set according to actual application requirements.
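  • To make the combination concrete, the following is a minimal PyTorch-style sketch of the second target loss, assuming cross-entropy segmentation losses, binary cross-entropy discrimination losses and a cosine-distance content loss; the chosen loss functions and all weight values are illustrative placeholders rather than prescriptions of the disclosure.

```python
# Hypothetical sketch: assembling the second target loss from the content loss,
# the second segmentation loss and the second discrimination loss.
import torch
import torch.nn.functional as F

# Placeholder weights; in practice they are set per application requirements.
W_CONTENT, W_SEG, W_DISC = 1.0, 1.0, 1.0
W_SEG_SYN, W_SEG_REAL = 0.5, 0.5      # weights of the two segmentation sub-losses
W_DISC_REAL, W_DISC_SYN = 0.5, 0.5    # weights of the two discriminant sub-losses

def second_target_loss(synthetic_feat, real_feat,
                       seg_logits_syn, seg_logits_real, pixel_labels,
                       disc_out_real, disc_out_syn):
    # Content loss: similarity distance between synthetic and real feature maps.
    content = 1.0 - F.cosine_similarity(synthetic_feat.flatten(1),
                                        real_feat.flatten(1), dim=1).mean()

    # Second segmentation loss: weighted sub-losses against per-pixel category labels.
    seg = (W_SEG_SYN * F.cross_entropy(seg_logits_syn, pixel_labels) +
           W_SEG_REAL * F.cross_entropy(seg_logits_real, pixel_labels))

    # Second discrimination loss: real features labelled 1, synthetic ones labelled 0.
    disc = (W_DISC_REAL * F.binary_cross_entropy(disc_out_real, torch.ones_like(disc_out_real)) +
            W_DISC_SYN * F.binary_cross_entropy(disc_out_syn, torch.zeros_like(disc_out_syn)))

    return W_CONTENT * content + W_SEG * seg + W_DISC * disc
```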
  • The second target loss meeting the second preset condition may be that the second target loss is less than or equal to a specified threshold, or that the difference between the second target losses of two consecutive training iterations is less than a certain threshold.
  • the specified threshold and a certain threshold may be set in combination with actual training requirements.
  • In one embodiment, updating the second target loss based on the updated segmentation model to be trained, generation network to be trained, and discrimination network to be trained may include: randomly selecting part of the training samples, together with the training category feature information and the training scene feature information of those training samples, and repeating the steps of determining the second target loss in the above steps S403-S413 in combination with the updated segmentation model to be trained, generation network to be trained, and discrimination network to be trained.
  • the restriction of scene information corresponding to the segmentation objects of each category is increased, so that the training of the image segmentation model is more focused on the synthesis of image pixel features in a specific scene, which greatly improves the feature mapping of training samples.
  • The second target loss is determined by combining the content loss, the second segmentation loss and the second discrimination loss, which can improve the similarity between the synthetic images generated by the trained initial generation network and real samples, thereby improving the segmentation accuracy of the trained initial image segmentation model.
  • step S205 the first stitching feature information is input into the initial generation network to perform image synthesis processing to obtain a first synthesized image.
  • the initial generation network may be obtained after pre-training the generator in the GAN based on the training category feature information of the training samples and the training scene feature information of the training samples.
  • the first stitching feature information is input into the initial generation network to perform image synthesis processing to obtain the first synthesized image.
  • Since the obtained first splicing feature information is used to synthesize the image corresponding to the category information, a constraint on the scenes in which the segmentation object of that category appears can be imposed, so that a first synthetic image that accurately represents both the category information and the scene information of the segmentation object can be obtained, which greatly improves the feature mapping capability for unknown categories.
  • step S207 the first synthetic image is input into the initial discrimination network for authenticity discrimination, and a first image discrimination result is obtained.
  • the initial discriminant network may be obtained after pre-training the discriminator in the GAN based on the training samples, the training category feature information of the training samples, and the training scene feature information of the training samples.
  • The first synthetic image may include a synthetic image corresponding to each training image in the training samples or each image in the prediction samples; correspondingly, the first image discrimination result of each synthetic image here may represent the predicted probability that the synthetic image is a real training image or a real prediction sample.
  • step S209 the first composite image is input into the classification network of the initial image segmentation model to perform image segmentation, and a first image segmentation result is obtained.
  • the initial image segmentation model is obtained by pre-training the segmentation model to be trained based on the training samples, the training scene feature information of the training samples, and the training category feature information of the training samples.
  • the first synthetic image is input into the classification network of the initial image segmentation model to perform image segmentation, and the first image segmentation result can be obtained.
  • the first image segmentation result corresponding to the first composite image may represent the predicted category feature information of the first composite image.
  • step S211 the classification network of the initial image segmentation model is trained based on the first image discrimination result, the first image segmentation result and the target category feature information, to obtain the target image segmentation model.
  • In one embodiment, training the classification network of the initial image segmentation model based on the first image discrimination result, the first image segmentation result and the target category feature information to obtain the target image segmentation model may include: calculating a first discrimination loss using the first image discrimination result and the authenticity label of the first synthetic image; calculating a first segmentation loss using the first image segmentation result and the target category feature information; determining a first target loss according to the first discrimination loss and the first segmentation loss; if the first target loss does not meet the first preset condition, updating the network parameters in the classification network of the initial image segmentation model, the initial generation network and the initial discrimination network; and updating the first target loss based on the updated classification network of the initial image segmentation model, initial generation network and initial discrimination network until the first target loss satisfies the first preset condition, at which point the current initial image segmentation model is used as the target image segmentation model.
  • In one embodiment, calculating the first discrimination loss using the first image discrimination result and the authenticity label of the first composite image may include calculating, based on a preset loss function, a discrimination loss between the first image discrimination result and the authenticity label of the first composite image, and taking this discrimination loss as the first discrimination loss.
  • the first discrimination loss may represent the difference between the first image discrimination result and the authenticity label corresponding to the first synthetic image.
  • the authenticity label corresponding to the first synthetic image may be 0 (0 represents a non-real image, that is, a synthetic image).
  • In one embodiment, calculating the first segmentation loss using the first image segmentation result and the target category feature information may include calculating, based on a preset loss function, a segmentation loss between the first image segmentation result and the target category feature information, and taking this segmentation loss as the above first segmentation loss.
  • The first segmentation loss can represent the difference between each pixel of the first composite image and the target category feature information.
  • the above-mentioned preset loss function may include, but is not limited to, a cross-entropy loss function, a logistic loss function, a Hinge (hinge) loss function, an exponential loss function, and the like, and the embodiment of the present disclosure is not limited to the above-mentioned loss function.
  • the loss functions used to calculate the discriminative loss and segmentation loss can be the same or different.
  • the first segmentation loss and the first discrimination loss may be weighted to obtain the first target loss.
  • the weights of the first segmentation loss and the first discrimination loss may be set according to actual application requirements.
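  • Analogously to the pre-training stage, the first target loss can be sketched as a weighted sum, assuming binary cross-entropy for the first discrimination loss (with authenticity label 0 for the first synthetic image) and cross-entropy for the first segmentation loss against per-pixel category labels; the weights and names below are placeholders.

```python
# Hypothetical sketch: the first target loss used to train the classification
# network of the initial image segmentation model with synthetic images.
import torch
import torch.nn.functional as F

W_SEG1, W_DISC1 = 1.0, 1.0  # placeholder weights for the two losses

def first_target_loss(seg_logits_syn, category_pixel_labels, disc_out_syn):
    # First segmentation loss: first image segmentation result vs. target category labels.
    seg_loss = F.cross_entropy(seg_logits_syn, category_pixel_labels)
    # First discrimination loss: the authenticity label of the first synthetic image is 0.
    disc_loss = F.binary_cross_entropy(disc_out_syn, torch.zeros_like(disc_out_syn))
    return W_SEG1 * seg_loss + W_DISC1 * disc_loss
```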
  • The first target loss meeting the first preset condition may be that the first target loss is less than or equal to a specified threshold, or that the difference between the first target losses of two consecutive training iterations is less than a certain threshold.
  • the specified threshold and a certain threshold may be set in combination with actual training requirements.
  • In one embodiment, in each training iteration, part of the target category feature information and the associated scene feature information of that target category feature information is randomly selected from the target category feature information to participate in the training; the unknown category features are sampled with a relatively large probability, and the known category features are sampled with a relatively small probability, as sketched below.
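  • A minimal sketch of such probability-weighted sampling is given below; the split into `unknown_categories` and `known_categories` and the 0.8/0.2 probabilities are illustrative assumptions only.

```python
# Hypothetical sketch: sampling category feature information for one training
# iteration, favouring unknown (unseen) categories over known (seen) ones.
import random

def sample_categories(unknown_categories, known_categories, batch_size, p_unknown=0.8):
    sampled = []
    for _ in range(batch_size):
        if random.random() < p_unknown and unknown_categories:
            sampled.append(random.choice(unknown_categories))  # larger probability
        else:
            sampled.append(random.choice(known_categories))    # smaller probability
    return sampled

# batch = sample_categories(["giraffe", "sofa"], ["cat", "dog", "car"], batch_size=16)
```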
  • For the specific refinement of updating the first target loss based on the updated classification network of the initial image segmentation model, initial generation network and initial discrimination network, reference may be made to the above-mentioned refinement steps of updating the second target loss based on the updated segmentation model to be trained, generation network to be trained and discrimination network to be trained, which are not repeated here.
  • The first target loss is determined by combining the first segmentation loss, determined from the first image segmentation result and the target category feature information, with the first discrimination loss, determined from the first image discrimination result and the authenticity label of the first composite image. In this way, the classification network of the initial image segmentation model can be better trained, and the accuracy of zero-sample segmentation can be greatly improved. The recognition ability of the trained target image segmentation model for unknown categories is thereby improved, and by obtaining the associated scene feature information of the target category feature information, the prediction of the scenes in which segmentation objects of a certain category appear can be realized, thereby ensuring constrained automatic synthesis of image pixel features based on word vectors of unknown or known categories.
  • FIG. 5 is a flowchart of an image segmentation method according to an exemplary embodiment.
  • The method can be applied to electronic devices such as servers, terminals, and edge computing nodes, and includes the following steps.
  • step S501 an image to be segmented is acquired.
  • step S503 input the image to be segmented into the target image segmentation model trained by the above-mentioned image segmentation model training method, and perform image segmentation on the image to be segmented to obtain the target segmented image.
  • The image to be segmented is the image on which segmentation is to be performed; in the embodiments of the present disclosure, the image to be segmented may include a target segmentation object.
  • the target segmented image may be an image of the region where the target segmented object is located in the image to be segmented.
  • In this way, the classifier in the target image segmentation model can be better adjusted and the feature mapping ability of the model can be improved; performing image segmentation based on such a target image segmentation model can greatly improve the segmentation accuracy and reduce the error rate.
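  • A minimal inference sketch corresponding to steps S501-S503 is given below; the checkpoint path, the preprocessing and the use of `torch.load` are assumptions for illustration and are not prescribed by the disclosure.

```python
# Hypothetical sketch: applying a trained target image segmentation model to an
# image to be segmented, producing a per-pixel category map (target segmented image).
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),   # placeholder input size
    transforms.ToTensor(),
])

# Placeholder checkpoint; the disclosure does not prescribe a storage format.
target_image_segmentation_model = torch.load("target_image_segmentation_model.pt")
target_image_segmentation_model.eval()

def segment(image_path):
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)  # step S501
    with torch.no_grad():
        logits = target_image_segmentation_model(image)                     # step S503
    return logits.argmax(dim=1).squeeze(0)   # per-pixel predicted category ids

# target_segmented_image = segment("example.jpg")
```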
  • Fig. 6 is a block diagram of an apparatus for training an image segmentation model according to an exemplary embodiment.
  • The apparatus includes: a feature information acquisition module 610 configured to acquire target category feature information and associated scene feature information of the target category feature information, the target category feature information representing the category features of the training samples and the prediction samples; a first splicing processing module 620 configured to perform splicing processing on the target category feature information and the associated scene feature information to obtain first splicing feature information; a first image synthesis processing module 630 configured to input the first splicing feature information into the initial generation network for image synthesis processing to obtain a first synthetic image; a first authenticity discrimination module 640 configured to input the first synthetic image into the initial discrimination network for authenticity discrimination to obtain a first image discrimination result; a first image segmentation module 650 configured to input the first synthetic image into the classification network of the initial image segmentation model for image segmentation to obtain a first image segmentation result; and a model training module 660 configured to train the classification network of the initial image segmentation model based on the first image discrimination result, the first image segmentation result and the target category feature information to obtain the target image segmentation model.
  • In one embodiment, the feature information acquisition module 610 includes: a scene image set acquisition unit configured to acquire a scene image set; a scene recognition unit configured to input the scene image set into a scene recognition model for scene recognition to obtain a scene information set; a scene feature information set acquisition unit configured to input the scene information set into the target word vector model to obtain a scene feature information set; a similarity calculation unit configured to calculate the similarity between the target category feature information and the scene feature information in the scene feature information set; and an associated scene feature information determination unit configured to determine the associated scene feature information from the scene feature information set based on the similarity.
  • In one embodiment, the feature information acquisition module 610 includes: a category information acquisition unit configured to acquire category information of the training samples and prediction samples; and a target category feature information acquisition unit configured to input the category information into the target word vector model to obtain the target category feature information.
  • In one embodiment, the model training module 660 includes: a first discrimination loss calculation unit configured to calculate a first discrimination loss using the first image discrimination result and the authenticity label of the first synthetic image; a first segmentation loss calculation unit configured to calculate a first segmentation loss using the first image segmentation result and the target category feature information; a first target loss determination unit configured to determine a first target loss based on the first discrimination loss and the first segmentation loss; a first network parameter update unit configured to update the network parameters in the classification network of the initial image segmentation model, the initial generation network and the initial discrimination network when the first target loss does not meet the first preset condition; and a target image segmentation model determination unit configured to update the first target loss based on the updated classification network of the initial image segmentation model, initial generation network and initial discrimination network until the first target loss satisfies the first preset condition, and to take the current initial image segmentation model as the target image segmentation model.
  • In one embodiment, the above-mentioned apparatus further includes: a data acquisition module configured to acquire training samples, training scene feature information of the training samples, and training category feature information of the training samples; a feature extraction module configured to input the training samples into the feature extraction network of the segmentation model to be trained for feature extraction to obtain a segmentation feature image; a second splicing processing module configured to perform splicing processing on the training category feature information and the training scene feature information to obtain second splicing feature information; a second image synthesis processing module configured to input the second splicing feature information into the generation network to be trained for image synthesis processing to obtain a second composite image; a second image segmentation module configured to input the second composite image and the segmentation feature image into the classification network of the segmentation model to be trained and perform image segmentation respectively to obtain a second image segmentation result corresponding to the second composite image and a third image segmentation result corresponding to the segmentation feature image; a second authenticity discrimination module configured to input the segmentation feature image and the second composite image into the discrimination network to be trained and perform authenticity discrimination respectively to obtain a second image discrimination result corresponding to the segmentation feature image and a third image discrimination result corresponding to the second composite image; and an initial model training module.
  • In one embodiment, the initial model training module includes: a content loss calculation unit configured to calculate a content loss using the second composite image and the segmentation feature image; a second segmentation loss calculation unit configured to calculate a second segmentation loss using the second image segmentation result, the third image segmentation result and the training category feature information; a second discrimination loss calculation unit configured to calculate a second discrimination loss using the second image discrimination result and the third image discrimination result; a second target loss determination unit configured to determine a second target loss according to the content loss, the second discrimination loss and the second segmentation loss; a second network parameter update unit configured to update the network parameters in the segmentation model to be trained, the generation network to be trained and the discrimination network to be trained when the second target loss does not meet the second preset condition; and an initial model determination unit configured to update the second target loss based on the updated segmentation model to be trained, generation network to be trained and discrimination network to be trained until the second target loss meets the second preset condition, and to take the current segmentation model to be trained as the initial image segmentation model, the current generation network to be trained as the initial generation network, and the current discrimination network to be trained as the initial discrimination network.
  • Fig. 7 is a block diagram of an image segmentation apparatus according to an exemplary embodiment.
  • The apparatus includes: a to-be-segmented image acquisition module 710 configured to acquire an image to be segmented; and a third image segmentation module 720 configured to input the image to be segmented into the target image segmentation model trained by the above image segmentation model training method, and to perform image segmentation on the image to be segmented to obtain the target segmented image.
  • FIG. 8 is a block diagram of an electronic device for image segmentation model training or image segmentation according to an exemplary embodiment.
  • the electronic device may be a terminal, and its internal structure diagram may be as shown in FIG. 8 .
  • the electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Among them, the processor of the electronic device is used to provide computing and control capabilities.
  • the memory of the electronic device includes a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the electronic device is used to communicate with an external terminal through a network connection.
  • The computer program, when executed by a processor, implements an image segmentation model training method or an image segmentation method.
  • The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic device may be a touch layer covering the display screen, a button, a trackball or a touchpad provided on the shell of the electronic device, or an external keyboard, trackpad, or mouse.
  • FIG. 9 is a block diagram of an electronic device for image segmentation model training or image segmentation according to an exemplary embodiment.
  • the electronic device may be a server, and its internal structure diagram may be as shown in FIG. 9 .
  • the electronic device includes a processor, memory, and a network interface connected by a system bus. Among them, the processor of the electronic device is used to provide computing and control capabilities.
  • the memory of the electronic device includes a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the electronic device is used to communicate with an external terminal through a network connection.
  • the computer program when executed by a processor, implements an image segmentation model training or image segmentation method.
  • FIG. 8 and FIG. 9 are only block diagrams of partial structures related to the solution of the present disclosure, and do not constitute a limitation on the electronic equipment to which the solution of the present disclosure is applied.
  • An electronic device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.
  • an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image segmentation model training or image segmentation method in the embodiments of the present disclosure.
  • a storage medium is also provided; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device can execute the image segmentation model training or image segmentation method in the embodiments of the present disclosure.
  • a computer program product comprising instructions that, when executed on a computer, cause the computer to perform the image segmentation model training or the image segmentation method in the embodiments of the present disclosure.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for image segmentation model training and image segmentation, and an electronic device. The image segmentation model training method includes: acquiring target category feature information, which characterizes category features of training samples and prediction samples, and scene feature information associated with the target category feature information; concatenating the target category feature information and the associated scene feature information; inputting the first concatenated feature information obtained by the concatenation into an initial generation network for image synthesis; inputting the first synthetic image obtained by the synthesis into an initial discrimination network for authenticity discrimination; inputting the first synthetic image into a classification network of an initial image segmentation model for image segmentation to obtain a first image segmentation result; and training the classification network of the initial image segmentation model based on the first image discrimination result, the first image segmentation result and the target category feature information to obtain a target image segmentation model.

Description

用于图像分割模型训练和图像分割的方法及装置
相关申请的交叉引用
本申请基于申请号为202011574785.5、申请日为2020年12月28日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本公开涉及人工智能技术领域,尤其涉及一种图像分割模型训练、图像分割方法、装置及电子设备。
背景技术
人工智能(Artificial Intelligence,AI)技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。其中,利用人工智能技术进行图像分割,在视频监控、公共安全等多个领域发挥着重要的作用。
相关技术中,由于构建训练样本的成本高,难度大,基于未知类别的词向量自动合成图片像素特征的零样本分割技术方案在业界大受欢迎。
发明内容
根据本公开实施例的第一方面,提供一种图像分割模型训练方法,包括:获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
根据本公开实施例的第二方面,提供一种图像分割模型训练装置,包括:特征信息获取模块,被配置为执行获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;第一拼接处理模块,被配置为执行对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;第一图像合成处理模块,被配置为执行将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;第一真实性判别模块,被配置为执行将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;第一图像分割模块,被配置为执行将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;模型训练模块,被配置为执行基于所述第一图像判别结果、所 述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
根据本公开实施例的第三方面,提供一种图像分割方法,包括:获取待分割图像;将所述待分割图像输入上述第一方面中任一项所述的图像分割模型训练方法训练得到的目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像。
根据本公开实施例的第四方面,提供一种图像分割装置,包括:待分割图像获取模块,被配置为执行获取待分割图像;第三图像分割模块,被配置为执行将所述待分割图像输入上述第一方面中任一项所述的图像分割模型训练方法训练得到的目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像。
根据本公开实施例的第五方面,提供一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现如上述第一方面或第三方面中任一项所述的方法。
根据本公开实施例的第六方面,提供一种计算机可读存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行本公开实施例的第一方面或第三方面中任一所述的方法。
根据本公开实施例的第七方面,提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行本公开实施例的第一方面或第三方面中任一所述的方法。
通过获取训练样本和预测样本对应类别特征来作为初始图像分割模型的训练数据,可以提高训练好的目标图像分割模型对未知类别的识别能力,且通过获取目标类别特征信息的关联场景特征信息,可以实现对某一类别分割对象所出现场景的预测,进而保证基于未知类别或已知类别的词向量自动合成图片像素特征时,可以增加类别所出现的场景的限制,使得图像分割模型的训练更专注于特定场景下图像像素特征的合成,从而可以更好的利用场景上下文来调整零样本图像分割训练中的分类网络,大大提升零样本分割的精度。
附图说明
图1是根据一示例性实施例示出的一种应用环境的示意图。
图2是根据一示例性实施例示出的一种图像分割模型训练方法的流程图;
图3是根据一示例性实施例示出的一种获取关联场景特征信息方法的流程图;
图4是根据一示例性实施例示出的一种图像分割模型预训练方法的流程图;
图5是根据一示例性实施例示出的一种图像分割方法的流程图;
图6是根据一示例性实施例示出的一种图像分割模型训练装置框图;
图7是根据一示例性实施例示出的一种图像分割装置框图;
图8是根据一示例性实施例示出的一种用于图像分割模型训练或用于图像分割的电子设备的框图;
图9是根据一示例性实施例示出的一种用于图像分割模型训练或用于图像分割的电子设备的框图。
具体实施方式
为了使本领域普通人员更好地理解本公开的技术方案,下面将结合附图,对本公开实施例中的技术方案进行清楚、完整地描述。
需要说明的是,本公开的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本公开的实施例能够以除了在这里图示或描述的那些以外的顺序实施。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
请参阅图1,图1是根据一示例性实施例示出的一种应用环境的示意图,如图1所示,该应用环境可以包括服务器01和终端02。
在一个实施例中,服务器01可以用于训练可以进行图像分割的目标图像分割模型。在本公开的实施例中,服务器01可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器。
在一个实施例中,终端02可以结合服务器01训练出的图像分割模型进行图像分割处理。在本公开的实施例中,终端02可以包括但不限于智能手机、台式计算机、平板电脑、笔记本电脑、智能音箱、数字助理、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、智能可穿戴设备等类型的电子设备。在本公开的实施例中,电子设备上运行的操作系统可以包括但不限于安卓系统、IOS系统、linux、windows等。
此外,需要说明的是,图1所示的仅仅是本公开提供的一种应用环境,在实际应用中,还可以包括其他应用环境,例如目标图像分割模型的训练,也可以在终端02上实现。
在本公开的实施例中,上述服务器01以及终端02可以通过有线或无线通信方式进行直接或间接地连接,本公开在此不做限制。
图2是根据一示例性实施例示出的一种图像分割模型训练方法的流程图,如图2所示,图像分割模型训练方法可以应用于服务器、终端、边缘计算节点等电子设备中,包括以下步骤。
在步骤S201中,获取目标类别特征信息和目标类别特征信息的关联场景特征信息。
在本公开的实施例中,目标类别特征信息可以表征训练样本和预测样本的类别特征;在一个实施例中,训练样本的类别特征可以为大量已知类别特征,即用于训练目标图像分割模型的训练样本的类别特征;预测样本的类别特征为大量未知类别特征,即未参与目标图像分割模型训练的图像的类别特征;相应的,训练样本可以包括大量用于训练目标图像分割模型的训练图像;预测样本可以包括大量未参与目标图像分割模型训练,且属于训练好的目标图像分割模型可分割(需要预测)的图像,即零样本。
在一个实施例中,获取目标类别特征信息包括:获取训练样本和预测样本的类别信息;将类别信息输入目标词向量模型,得到目标类别特征信息。
在本公开的实施例中,在训练时,虽然没有获取预测样本,但可以结合实际应用需求,获取实际应用中目标图像分割模型需要分割的图像的类别信息作为预测样本的类别信息。
在一个实施例中,类别信息可以为大量图像(即训练样本或预测样本)中包含的分割对象的类别,例如一张图像中包括猫(分割对象),相应的,该图像的类别信息为猫。
在一个实施例中,目标词向量模型可以为基于预设训练文本信息对预设词向量模型进行训练得到的。在一个实施例中,预设训练文本信息可以为与目标图像分割模型的应用领域相关的文本信息。
在本公开的实施例中,在进行目标词向量模型训练过程中,可以将预设训练文本信息进行分词处理,将分词处理后的分词信息(即一个个词)输入目标词向量模型进行训练,在训练过程中可以将每个词映射成K维实数向量,得到目标词向量模型的同时可以得到表征词之间的语义关联度的词向量集合。在本公开的实施例中,后续,将类别信息(词)输入该目标词向量模型,该目标词向量模型可以基于词向量集合中的词向量确定类别信息的词向量,并将该类别信息的词向量作为类别信息对应的目标类别特征信息。
在一个实施例中,预设词向量模型可以包括但不限于word2vec、fasttext、glove等词向量模型。
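As a minimal illustrative sketch of this step (assuming Python with gensim's Word2Vec as the preset word-vector model; the corpus, the dimension K and the category words are hypothetical):

```python
# Minimal sketch: train a word-vector model on tokenized preset training text
# and look up category words to obtain the target category feature information.
from gensim.models import Word2Vec

corpus = [["cat", "sits", "on", "sofa"], ["camel", "walks", "in", "desert"]]  # tokenized domain text (illustrative)
K = 100  # dimension of the real-valued word vectors

w2v = Word2Vec(sentences=corpus, vector_size=K, window=5, min_count=1, epochs=10)

# Category information: known training categories plus unseen prediction categories.
category_words = ["cat", "camel"]
target_category_features = {w: w2v.wv[w] for w in category_words}  # each value is a K-dim vector
```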
上述实施例中,通过获取训练样本和预测样本对应类别特征来作为初始图像分割模型的训练数据,可以提高训练好的目标图像分割模型对未知类别的识别能力,进而大大提升分割精度。
在一个实施例中,如图3所示,图3是根据一示例性实施例示出的一种获取关联场景特征信息方法的流程图,在本公开的实施例中,可以包括如下步骤。
在步骤S301中,获取场景图像集。
在步骤S303中,将场景图像集输入场景识别模型进行场景识别,得到场景信息集。
在步骤S305中,将场景信息集输入目标词向量模型,得到场景特征信息集。
在步骤S307中,计算目标类别特征信息与场景特征信息集中场景特征信息间的相似度。
在步骤S309中,基于相似度从场景特征信息集中确定关联场景特征信息。
在一个实施例中,场景图像集可以包括大量场景对应的图像。场景信息集可以为场景图像集中大量图像对应的场景信息,例如在卧室拍摄的图像,场景信息为卧室;拍摄池塘里的鱼的图像,场景信息可以为池塘。
在一个实施例中,可以以具有场景标注的图像为训练数据,对预设深度学习模型进行训练,得到可以进行场景识别的场景识别模型。相应的,将场景图像集输入该场景识别模型进行场景识别,可以得到场景图像集中图像对应的场景信息集。
在本公开的实施例中,预设深度学习模型可以包括但不限于卷积神经网络、逻辑回归神经网络、递归神经网络等深度学习模型。
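A minimal sketch of such a scene recognition model (assuming Python with PyTorch; the architecture and layer sizes are illustrative, not the disclosed model). Training it with cross-entropy on scene-labelled images and then running it over the scene image set would yield the scene information set:

```python
# Minimal sketch: a toy convolutional scene classifier trained on scene-labelled images.
import torch.nn as nn

class SceneRecognizer(nn.Module):
    def __init__(self, num_scenes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_scenes)

    def forward(self, x):                    # x: (B, 3, H, W) scene images
        f = self.features(x).flatten(1)      # (B, 64) pooled features
        return self.classifier(f)            # (B, num_scenes) scene logits
```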
在一个实施例中，将场景信息集中场景信息（词）输入目标词向量模型，该目标词向量模型可以基于词向量集合中的词向量确定场景信息的词向量，并将该场景信息的词向量作为场景信息对应的场景特征信息。
在一个实施例中，上述用于获取场景特征信息集的目标词向量模型与用于获取目标类别特征信息的目标词向量模型为相同的词向量模型，即为基于相同的预设训练文本信息训练得到的目标词向量模型，进而可以提高表征场景信息和类别信息之间的语义关联度的准确性。
在一个实施例中,目标类别特征信息与场景特征信息间的相似度可以表征目标类别特征信息与场景特征信息对应的词(类别信息和场景信息)之间语义的相似程度;在本公开的实施例中,目标类别特征信息与场景特征信息间的相似度越高,目标类别特征信息与场景特征信息对应的词之间语义的相似程度越高;反之,目标类别特征信息与场景特征信息间的相似度越低,目标类别特征信息与场景特征信息对应的词之间语义的相似程度越低。
在一个实施例中,目标类别特征信息与场景特征信息间的相似度可以包括但不限于目标类别特征信息与场景特征信息间的余弦距离、欧式距离、曼哈顿距离。
在一个实施例中,上述目标类别特征信息可以包括多个类别信息对应的类别特征信息(词向量),相应的,针对每一类别信息,可以选取与该类别信息对应的类别特征信息间相似度排序前N的场景特征信息作为初选场景特征信息,并从初选场景特征信息中随机选取一个场景特征信息作为该类别特征信息的关联场景特征信息。
在本公开的实施例中,针对每一类别信息,也可以选取与该类别信息对应的类别特征信息间相似度大于等于预设阈值的场景特征信息作为初选场景特征信息,并从初选场景特征信息中随机选取一个场景特征信息作为该类别特征信息的关联场景特征信息。
在本公开的实施例中,上述预设阈值和N可以结合实际应用需求进行设置。
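A minimal sketch of the similarity computation and selection of the associated scene feature information (assuming Python with NumPy and cosine similarity; the function name and defaults are illustrative):

```python
# Minimal sketch: pick the associated scene feature for one category feature by
# cosine similarity, pre-selecting the top-N scenes and sampling one at random.
import numpy as np

def associated_scene_feature(category_vec, scene_vecs, n=5, rng=np.random.default_rng()):
    """category_vec: (K,) word vector; scene_vecs: (M, K) word vectors of candidate scenes."""
    sims = scene_vecs @ category_vec / (
        np.linalg.norm(scene_vecs, axis=1) * np.linalg.norm(category_vec) + 1e-12
    )
    top_n = np.argsort(sims)[::-1][:n]   # indices of the N most similar scenes
    chosen = rng.choice(top_n)           # random pick among the pre-selected scenes
    return scene_vecs[chosen]
```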
上述实施例中,通过获取目标类别特征信息的关联场景特征信息,可以实现对某一类别分割对象所出现场景的预测,进而保证后续基于未知类别或已知类别的词向量自动合成图片像素特征时,可以增加该类别所出现的场景的限制,使得图像分割模型的训练更专注于特定场景下图像像素特征的合成。
在步骤S203中,对目标类别特征信息和关联场景特征信息进行拼接处理,得到第一拼接特征信息。
在一个实施例中,目标类别特征信息和关联场景特征信息进行拼接处理可以包括将目标类别特征信息中每一类别信息对应的类别特征信息与该类别特征信息的关联场景特征信息进行拼接处理。例如某一类别信息对应的类别特征信息为[1,2,3,4,5];该类别特征信息的关联场景特征信息为[6,7,8,9,0],在本公开的实施例中,该类别信息对应的第一拼接特征信息可以为[1,2,3,4,5,6,7,8,9,0],也可以为[6,7,8,9,0,1,2,3,4,5]。
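A minimal sketch of the concatenation step, using the example vectors above (Python with NumPy):

```python
# Minimal sketch of the concatenation that yields the first concatenated feature information.
import numpy as np

category_feature = np.array([1, 2, 3, 4, 5])
scene_feature = np.array([6, 7, 8, 9, 0])

first_concat_feature = np.concatenate([category_feature, scene_feature])
# -> [1 2 3 4 5 6 7 8 9 0]; the reverse order [6 7 8 9 0 1 2 3 4 5] is equally valid.
```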
在一个实施例中,为了提升零样本学习过程中,特征提取的精准性,可以结合训练样本,训练样本的训练场景特征信息和训练样本的训练类别特征信息进行预训练,相应的,如图4所示,上述方法还可以包括以下步骤。
在步骤S401中,获取训练样本、训练样本的训练场景特征信息和训练样本的训练类别特征信息。
在本公开的实施例中,训练场景特征信息可以为训练样本对应场景信息的词向量;在一个实施例 中,获取训练样本的训练场景特征信息的具体的细化步骤可以参见上述获取场景图像集的场景特征信息集的具体细化步骤,在此不再赘述。
在本公开的实施例中,训练样本的训练类别特征信息可以为训练样本对应类别信息的词向量。在一个实施例中,获取训练样本的训练类别特征信息的具体的细化步骤可以参见上述获取目标类别特征信息的相关细化步骤,在此不再赘述。
在步骤S403中,将训练样本输入待训练分割模型的特征提取网络进行特征提取,得到分割特征图像。
在一个实施例中,待训练分割模型可以包括DeepLab(语义图像分割模型),但本公开的实施例并不以上述为限,在实际应用中,还可以包括其他深度学习模型。
在一个实施例中,待训练分割模型可以包括特征提取网络和分类网络。在本公开的实施例中,特征提取网络可以用于提取图像(训练样本)的特征信息,将训练样本输入待训练分割模型的特征提取网络进行特征提取,可以得到分割特征图像。
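A minimal sketch of a segmentation model split into a feature extraction network and a classification network (assuming Python with PyTorch; a toy convolutional encoder stands in for the DeepLab backbone, and all sizes are illustrative):

```python
# Minimal sketch: feature extraction network produces the "segmentation feature image";
# the classification network predicts a category per pixel.
import torch.nn as nn

class SegModel(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 64):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(feat_dim, num_classes, 1)

    def forward(self, x):                     # x: (B, 3, H, W) training images
        feats = self.feature_extractor(x)     # (B, feat_dim, H, W) segmentation feature image
        return self.classifier(feats), feats  # per-pixel logits + feature image
```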
在步骤S405中,对训练类别特征信息与训练场景特征信息进行拼接处理,得到第二拼接特征信息。
在一个实施例中,对训练类别特征信息与训练场景特征信息进行拼接处理,得到第二拼接特征信息的具体细化步骤可以参见上述目标类别特征信息和关联场景特征信息进行拼接处理的相关细化步骤,在此不再赘述。
在步骤S407中,将第二拼接特征信息输入待训练生成网络进行图像合成处理,得到第二合成图像。
在一个实施例中,待训练生成网络可以为GAN(Generative Adversarial Networks,生成式对抗网络)中生成器。将第二拼接特征信息输入待训练生成网络进行图像合成处理,可以得到第二合成图像。
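A minimal sketch of such a generation network (assuming Python with PyTorch; the fully connected architecture and the output size are illustrative only):

```python
# Minimal sketch: a generator that maps the concatenated (category + scene) feature
# vector to a synthetic pixel-feature map.
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, concat_dim: int, feat_dim: int = 64, size: int = 32):
        super().__init__()
        self.feat_dim, self.size = feat_dim, size
        self.net = nn.Sequential(
            nn.Linear(concat_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim * size * size),
        )

    def forward(self, z):                                   # z: (B, concat_dim) concatenated features
        out = self.net(z)
        return out.view(-1, self.feat_dim, self.size, self.size)  # synthetic feature image
```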
在实际应用中,骆驼常常出现在沙漠场景里、鱼常出现在海洋、池塘等场景里,大部分物体(分割对象)所出现的场景是有限的。上述实施例中利用训练样本的训练类别特征信息和训练关联场景特征信息进行拼接处理后,得到的第二拼接特征信息来合成该训练样本对应的合成图像,可以增加对该训练样本对应的分割对象所出现的场景的限制,得到可以准确表征分割对象类别信息和场景信息的第二合成图像,大大提升了对训练样本的特征映射能力。
在步骤S409中,将第二合成图像和分割特征图像输入待训练分割模型的分类网络,分别进行图像分割,得到第二合成图像对应的第二图像分割结果和分割特征图像对应的第三图像分割结果。
在本公开的实施例中,第二合成图像可以包括训练样本中每一训练图像对应的合成图像,相应的,这里每一合成图像对应的第二图像分割结果可以表征该合成图像的预测类别特征信息;在本公开的实施例中,分割特征图像可以包括训练样本中每一训练图像对应的图像特征信息;相应的,这里每一图像特征信息对应的第三图像分割结果可以表征该图像特征信息的预测类别特征信息。
在步骤S411中,将分割特征图像和第二合成图像输入待训练判别网络,分别进行真实性判别,得到分割特征图像对应的第二图像判别结果和第二合成图像对应的第三图像判别结果。
在一个实施例中,待训练判别网络可以为GAN中判别器。在本公开的实施例中,分割特征图像对应的第二图像判别结果可以表征分割特征图像为真实图像的预测概率;第二合成图像对应的第三图像判别结果可以表征第二合成图像为真实图像的预测概率。在本公开的实施例中,真实图像可以为非合成的图像。
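A minimal sketch of such a discrimination network (assuming Python with PyTorch; the architecture is illustrative only):

```python
# Minimal sketch: a discriminator that scores how likely a (synthetic or real)
# feature image is to be real; the output is a probability in [0, 1].
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, x):        # x: (B, feat_dim, H, W) feature image
        return self.net(x)       # (B, 1) predicted probability of being a real image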
在步骤S413中,基于第二合成图像、分割特征图像、第二图像分割结果、第三图像分割结果、训练类型特征信息、第二图像判别结果和第三图像判别结果训练待训练分割模型、待训练生成网络和待训练判别网络,得到初始图像分割模型、初始生成网络和初始判别网络。
在一个实施例中,上述基于第二合成图像、分割特征图像、第二图像分割结果、第三图像分割结果、训练类型特征信息、第二图像判别结果和第三图像判别结果训练待训练分割模型、待训练生成网络和待训练判别网络,得到初始图像分割模型、初始生成网络和初始判别网络可以包括:利用第二合成图像和分割特征图像计算内容损失;利用第二图像分割结果、第三图像分割结果和训练类型特征信息,计算第二分割损失;利用第二图像判别结果和第三图像判别结果计算第二判别损失;根据内容损失、第二判别损失和第二分割损失,确定第二目标损失;在第二目标损失不满足第二预设条件的情况下,更新待训练分割模型、待训练生成网络和待训练判别网络中的网络参数;基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新第二目标损失,至第二目标损失满足第二预设条件,将当前的待训练分割模型作为初始图像分割模型,将当前的待训练生成网络作为初始生成网络,将当前的待训练判别网络作为初始判别网络。
在一个实施例中,内容损失可以反映待训练生成网络生成的第二合成图像与分割特征图间的差异。在一个实施例中,内容损失可以为训练样本中训练图像对应的第二合成图像和分割特征图像间的相似距离。在一个实施例中,第二合成图像和分割特征图像间的相似距离可以包括但不限于第二合成图像和分割特征图像间的余弦距离、欧式距离、曼哈顿距离。在一个实施例中,内容损失的数值的大小与第二合成图像与分割特征图间的差异大小成正比,相应的,内容损失的数值的越小,训练得到的初始生成网络的性能越高。
在一个实施例中,利用第二图像分割结果、第三图像分割结果和训练类型特征信息,计算第二分割损失可以包括基于预设损失函数计算第二图像分割结果与训练类型特征信息间的第一分割子损失,以及计算第三图像分割结果与训练类型特征信息间的第二分割子损失,将第一分割子损失和第二分割子损失进行加权,得到上述第二分割损失。第一分割子损失和第二分割子损失的权重可以结合实际应用需求进行设置。
在本公开的实施例中,第一分割子损失可以表征第二合成图像每个像素点与训练类型特征信息每个像素点间的差异;第二分割子损失可以表征分割特征图像每个像素点与训练类型特征信息每个像素点间的差异。
在一个实施例中,利用第二图像判别结果和第三图像判别结果计算第二判别损失可以包括基于预设损失函数计算第二图像判别结果与分割特征图像对应的真实性标签间的第一判别子损失,以及计算第三图像判别结果与第二合成图像对应的真实性标签间的第二判别子损失。将第一判别子损失和第二 判别子损失进行加权,得到上述第二判别损失。第一判别子损失和第二判别子损失的权重可以结合实际应用需求进行设置。
在本公开的实施例中,第一判别子损失可以表征第二图像判别结果与分割特征图像对应的真实性标签间间差异;第二判别子损失可以表征第三图像判别结果与第二合成图像对应的真实性标签间差异。
在一个实施例中,由于分割特征图像是真实图像,相应的,分割特征图像对应的真实性标签可以为1(1表征真实图像);由于第二合成图像是合成图,不是真实图像;相应的,第二合成图像对应的真实性标签可以为0(0表征非真实图像,即合成图像)。
在本公开的实施例中,预设损失函数可以包括但不限于交叉熵损失函数、逻辑损失函数、Hinge(铰链)损失函数、指数损失函数等,本公开的实施例并不以上述为限。且用于计算判别损失和分割损失的损失函数可以相同,也可以不同。
在一个实施例中,在得到内容损失、第二分割损失和第二判别损失之后,可以对内容损失、第二分割损失和第二判别损失进行加权计算,得到第二目标损失。在本公开的实施例中,内容损失、第二分割损失和第二判别损失的权重可以结合实际应用需求进行设置。
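A minimal sketch of composing the second target loss (assuming Python with PyTorch; mean squared error for the content loss, cross-entropy for the sub-losses and equal weights are illustrative assumptions):

```python
# Minimal sketch: second target loss = weighted sum of content loss,
# second segmentation loss and second discrimination loss.
import torch
import torch.nn.functional as F

def second_target_loss(fake_feat, real_feat,                 # second synthetic image / segmentation feature image
                       seg_logits_fake, seg_logits_real,     # second / third image segmentation results (logits)
                       pixel_labels,                         # per-pixel training category labels
                       d_real, d_fake,                       # discriminator outputs for real / synthetic images
                       w_content=1.0, w_seg=1.0, w_disc=1.0):
    content = F.mse_loss(fake_feat, real_feat)
    seg = (0.5 * F.cross_entropy(seg_logits_fake, pixel_labels)
           + 0.5 * F.cross_entropy(seg_logits_real, pixel_labels))
    disc = (0.5 * F.binary_cross_entropy(d_real, torch.ones_like(d_real))     # real image label 1
            + 0.5 * F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))) # synthetic image label 0
    return w_content * content + w_seg * seg + w_disc * disc
```

In each pre-training iteration a random mini-batch of training samples would be drawn, this loss recomputed and the parameters of the three networks updated, until the second preset condition is met.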
在一个实施例中,第二目标损失满足第二预设条件可以为第二目标损失小于等于指定阈值,或前后两次训练过程中对应的第二目标损失与上一次训练学习后对应的第二目标损失间的差值小于一定阈值。在本公开的实施例中,指定阈值和一定阈值可以为结合实际训练需求进行设置。
在实际应用中，在模型训练的多次迭代过程中，每次会随机地从训练样本中选取部分训练样本来参与本次的训练。相应的，基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新第二目标损失可以包括随机地从训练样本中选取部分训练样本、这部分训练样本的训练类别特征信息和训练场景特征信息，并结合更新后的待训练分割模型、待训练生成网络和待训练判别网络重复上述步骤S403-S413中确定第二目标损失的步骤。
上述实施例中,在预训练过程中,增加各个类别对应分割对象出现场景信息的限制,使得图像分割模型的训练更专注于特定场景下图像像素特征的合成,大大提升了对训练样本的特征映射能力,且结合内容损失、第二分割损失和第二判别损失来确定第二目标损失,可以提高训练好的初始生成网络所生成的合成图像与真实样本间的相似性,进而提升训练出的初始图像分割模型的分割精度。
在步骤S205中,将第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像。
在一个实施例中,初始生成网络可以为基于训练样本的训练类别特征信息和训练样本的训练场景特征信息对GAN中生成器进行预训练后得到的。在本公开的实施例中,将第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像。
上述实施例中利用类别信息对应的类别特征信息和关联场景特征信息进行拼接处理后,得到的第一拼接特征信息来合成该类别信息对应的图像,可以增加该类别信息对应分割对象所出现的场景的限制,得到可以准确表征分割对象类别信息和场景信息的第一合成图像,大大提升了对未知类别的特征映射的能力。
在步骤S207中,将第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果。
在一个实施例中,初始判别网络可以为基于训练样本、训练样本的训练类别特征信息和训练样本的训练场景特征信息对GAN中判别器进行预训练后得到的。
在本公开的实施例中,第一合成图像可以包括训练样本中每一训练图像或预测样本中每一图像对应的合成图像,相应的,这里每一合成图像的第一图像判别结果可以表征该合成图像是否为真实的训练图像或是否为真实的预测样本中图像的预测概率。
在步骤S209中,将第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果。
在一个实施例中,初始图像分割模型为基于训练样本、训练样本的训练场景特征信息和训练样本的训练类别特征信息对待训练分割模型进行预训练得到的。
在本公开的实施例中,将第一合成图像输入初始图像分割模型的分类网络进行图像分割,可以得到第一图像分割结果。在本公开的实施例中,第一合成图像对应的第一图像分割结果可以表征第一合成图像的预测类别特征信息。
在步骤S211中,基于第一图像判别结果、第一图像分割结果和目标类型特征信息训练初始图像分割模型的分类网络,得到目标图像分割模型。
在一个实施例中,基于第一图像判别结果、第一图像分割结果和目标类型特征信息训练初始图像分割模型的分类网络,得到目标图像分割模型可以包括:利用第一图像判别结果和第一合成图像的真实性标签计算第一判别损失;利用第一图像分割结果和目标类型特征信息计算第一分割损失;根据第一判别损失和第一分割损失,确定第一目标损失;在第一目标损失不满足第一预设条件的情况下,更新初始图像分割模型的分类网络、初始生成网络和初始判别网络中的网络参数;基于更新后初始图像分割模型的分类网络、初始生成网络和初始判别网络更新第一目标损失,至第一目标损失满足第一预设条件,将当前的初始图像分割模型作为目标图像分割模型。
在一个实施例中,利用第一图像判别结果和第一合成图像的真实性标签计算第一判别损失可以包括基于预设损失函数计算第一图像判别结果与第一合成图像的真实性标签间的判别损失,将该判别损失作为第一判别损失。在本公开的实施例中,第一判别损失可以表征第一图像判别结果与第一合成图像对应的真实性标签间差异。
在一个实施例中,由于第一合成图像是合成图,不是真实图像;相应的,第一合成图像对应的真实性标签可以为0(0表征非真实图像,即合成图像)。
在一个实施例中,利用第一图像分割结果和目标类型特征信息计算第一分割损失可以包括基于预设损失函数计算第一图像分割结果和目标类型特征信息间的分割损失,将该分割损失作为上述第一分割损失。第一分割损失可以表征一合成图像每个像素点与目标类型特征信息每个像素点间的差异。
在本公开的实施例中,上述预设损失函数可以包括但不限于交叉熵损失函数、逻辑损失函数、Hinge(铰链)损失函数、指数损失函数等,本公开的实施例并不以上述为限。且用于计算判别损失和分割损失的损失函数可以相同,也可以不同。
在一个实施例中，在得到第一分割损失和第一判别损失之后，可以对第一分割损失和第一判别损失进行加权计算，得到第一目标损失。在本公开的实施例中，第一分割损失和第一判别损失的权重可以结合实际应用需求进行设置。
在一个实施例中，第一目标损失满足第一预设条件可以为第一目标损失小于等于指定阈值，或前后两次训练过程中对应的第一目标损失与上一次训练学习后对应的第一目标损失间的差值小于一定阈值。在本公开的实施例中，指定阈值和一定阈值可以为结合实际训练需求进行设置。
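A minimal sketch of the first target loss and the first preset condition (assuming Python with PyTorch; the loss functions, weights and thresholds are illustrative assumptions):

```python
# Minimal sketch: first target loss used to fine-tune the classification network
# of the initial image segmentation model, plus an illustrative stop condition.
import torch
import torch.nn.functional as F

def first_target_loss(d_fake, seg_logits_fake, pixel_labels, w_disc=1.0, w_seg=1.0):
    # First discrimination loss: the first synthetic image carries authenticity label 0.
    disc = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    # First segmentation loss: per-pixel difference to the target category feature information.
    seg = F.cross_entropy(seg_logits_fake, pixel_labels)
    return w_disc * disc + w_seg * seg

def first_condition_met(loss_now, loss_prev, abs_threshold=1e-3, delta_threshold=1e-4):
    # First preset condition: loss small enough, or change between iterations below a threshold.
    return loss_now <= abs_threshold or abs(loss_now - loss_prev) < delta_threshold
```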
在实际应用中,在模型训练的多次迭代过程中,每次会随机的从目标类别特征信息中选取部分目标类别特征信息和这目标类别特征信息的关联场景特征信息来参与本次的训练。在本公开的实施例中,以较大概率随机出未知类别特征,较小的概率随机出已知类别特征。相应的,基于更新后初始图像分割模型的分类网络、初始生成网络和初始判别网络更新第一目标损失的具体细化可以参见上述基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新第二目标损失的相关细化步骤,在此不再赘述。
上述实施例中,结合第一图像分割结果和目标类型特征信息确定的第一分割损失,以及第一图像判别结果和第一合成图像的真实性标签确定的第二判别损失,来确定第一目标损失,可以在有效保证初始生成网络所生成的第一合成图像与真实样本(训练样本或预测样本)的相似性的基础上,更好的训练初始图像分割模型的分类网络,大大提升零样本分割的精度。
由本公开以上实施例提供的技术方案可见,在本公开的实施例中,通过获取训练样本和预测样本对应类别特征来作为初始图像分割模型的训练数据,可以提高训练好的目标图像分割模型对未知类别的识别能力,且通过获取目标类别特征信息的关联场景特征信息,可以实现对某一类别分割对象所出现场景的预测,进而保证基于未知类别或已知类别的词向量自动合成图片像素特征时,可以增加类别所出现的场景的限制,使得图像分割模型的训练更专注于特定场景下图像像素特征的合成,从而可以更好的利用场景上下文来调整零样本图像分割训练中的分类网络,大大提升零样本分割的精度。
基于上述图像分割模型训练方法的实施例,以下介绍本公开一种图像分割方法的实施例中,图5是根据一示例性实施例示出的一种图像分割方法的流程图,参照图5,该方法可以应用于服务器、终端、边缘计算节点等电子设备中,包括以下步骤。
在步骤S501中,获取待分割图像。
在步骤S503中,将待分割图像输入上述图像分割模型训练方法训练得到的目标图像分割模型,对待分割图像进行图像分割,得到目标分割图像。
在本公开的实施例中,待分割图像可以为需要进行分割的图像,在本公开的实施例中,待分割图像可以包含目标分割对象。相应的,目标分割图像可以为待分割图像中目标分割对象所在区域的图像。
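A minimal inference sketch for applying the trained target image segmentation model to an image to be segmented (assuming Python with PyTorch and a model that returns per-pixel class logits; preprocessing is omitted):

```python
# Minimal sketch: segment an input image with the trained target image segmentation model.
import torch

def segment(target_model, image_tensor):
    """image_tensor: (1, 3, H, W) float tensor of the image to be segmented."""
    target_model.eval()
    with torch.no_grad():
        logits = target_model(image_tensor)   # assumed per-pixel class logits (1, C, H, W)
        mask = logits.argmax(dim=1)           # (1, H, W) predicted category index per pixel
    return mask                               # region of the target object = target segmented image
```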
上述实施例中,在目标图像分割模型训练过程中,通过引入场景上下文可以更好的调整目标图像分割模型中的分类器,提升模型特征映射的能力,进而在基于该目标图像分割模型进行图像分割时,可以大大提升分割精度,降低出错率。
图6是根据一示例性实施例示出的一种图像分割模型训练装置框图。参照图6,该装置包括:特征信息获取模块610,被配置为执行获取目标类别特征信息和目标类别特征信息的关联场景特征信息, 目标类别特征信息表征训练样本和预测样本的类别特征;第一拼接处理模块620,被配置为执行对目标类别特征信息和关联场景特征信息进行拼接处理,得到第一拼接特征信息;第一图像合成处理模块630,被配置为执行将第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;第一真实性判别模块640,被配置为执行将第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;第一图像分割模块650,被配置为执行将第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;模型训练模块660,被配置为执行基于第一图像判别结果、第一图像分割结果和目标类型特征信息训练初始图像分割模型的分类网络,得到目标图像分割模型。
在本公开的实施例中,特征信息获取模块610包括:场景图像集获取单元,被配置为执行获取场景图像集;场景识别单元,被配置为执行将场景图像集输入场景识别模型进行场景识别,得到场景信息集;场景特征信息集获取单元,被配置为执行将场景信息集输入目标词向量模型,得到场景特征信息集;相似度计算单元,被配置为执行计算目标类别特征信息与场景特征信息集中场景特征信息间的相似度;关联场景特征信息确定单元,被配置为执行基于相似度从场景特征信息集中确定关联场景特征信息。
在本公开的实施例中,特征信息获取模块610包括:类别信息获取单元,被配置为执行获取训练样本和预测样本的类别信息;目标类别特征信息获取单元,被配置为执行将类别信息输入目标词向量模型,得到目标类别特征信息。
在本公开的实施例中,模型训练模块660包括:第一判别损失计算单元,被配置为执行利用第一图像判别结果和第一合成图像的真实性标签计算第一判别损失;第一分割损失计算单元,被配置为执行利用第一图像分割结果和目标类型特征信息计算第一分割损失;第一目标损失确定单元,被配置为执行根据第一判别损失和第一分割损失,确定第一目标损失;第一网络参数更新单元,被配置为执行在第一目标损失不满足第一预设条件的情况下,更新初始图像分割模型的分类网络、初始生成网络和初始判别网络中的网络参数;目标图像分割模型确定单元,被配置为执行基于更新后初始图像分割模型的分类网络、初始生成网络和初始判别网络更新第一目标损失,至第一目标损失满足第一预设条件,将当前的初始图像分割模型作为目标图像分割模型。
在本公开的实施例中,上述装置还包括:数据获取模块,被配置为执行获取训练样本、训练样本的训练场景特征信息和训练样本的训练类别特征信息;特征提取模块,被配置为执行将训练样本输入待训练分割模型的特征提取网络进行特征提取,得到分割特征图像;第二拼接处理模块,被配置为执行训练类别特征信息与训练场景特征信息进行拼接处理,得到第二拼接特征信息;第二图像合成处理模块,被配置为执行将第二拼接特征信息输入待训练生成网络进行图像合成处理,得到第二合成图像;第二图像分割模块,被配置为执行将第二合成图像和分割特征图像输入待训练分割模型的分类网络,分别进行图像分割,得到第二合成图像对应的第二图像分割结果和分割特征图像对应的第三图像分割结果;第二真实性判别模块,被配置为执行将分割特征图像和第二合成图像输入待训练判别网络,分别进行真实性判别,得到分割特征图像对应的第二图像判别结果和第二合成图像对应的第三图像判别 结果;初始模型训练模块,被配置为执行基于第二合成图像、分割特征图像、第二图像分割结果、第三图像分割结果、训练类型特征信息、第二图像判别结果和第三图像判别结果训练待训练分割模型、待训练生成网络和待训练判别网络,得到初始图像分割模型、初始生成网络和初始判别网络。
在本公开的实施例中,初始模型训练模块包括:内容损失计算单元,被配置为执行利用第二合成图像和分割特征图像计算内容损失;第二分割损失计算单元,被配置为执行利用第二图像分割结果、第三图像分割结果和训练类型特征信息,计算第二分割损失;第二判别损失计算单元,被配置为执行利用第二图像判别结果和第三图像判别结果计算第二判别损失;第二目标损失确定单元,被配置为执行根据内容损失、第二判别损失和第二分割损失,确定第二目标损失;第二网络参数更单元,被配置为执行在第二目标损失不满足第二预设条件的情况下,更新待训练分割模型、待训练生成网络和待训练判别网络中的网络参数;初始模型确定单元,被配置为执行基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新第二目标损失,至第二目标损失满足第二预设条件,将当前的待训练分割模型作为初始图像分割模型,将当前的待训练生成网络作为初始生成网络,将当前的待训练判别网络作为初始判别网络。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图7是根据一示例性实施例示出的一种图像分割装置框图。参照图7,该装置包括:待分割图像获取模块710,被配置为执行获取待分割图像;第三图像分割模块720,被配置为执行将待分割图像输入上述图像分割模型训练方法训练得到的目标图像分割模型,对待分割图像进行图像分割,得到目标分割图像。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图8是根据一示例性实施例示出的一种用于图像分割模型训练或用于图像分割的电子设备的框图,该电子设备可以是终端,其内部结构图可以如图8所示。该电子设备包括通过系统总线连接的处理器、存储器、网络接口、显示屏和输入装置。其中,该电子设备的处理器用于提供计算和控制能力。该电子设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该电子设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种图像瑕疵填充网络确定或图像瑕疵处理的方法。该电子设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该电子设备的输入装置可以是显示屏上覆盖的触摸层,也可以是电子设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
图9是根据一示例性实施例示出的一种用于图像分割模型训练或用于图像分割的电子设备的框图,该电子设备可以是服务器,其内部结构图可以如图9所示。该电子设备包括通过系统总线连接的处理器、存储器和网络接口。其中,该电子设备的处理器用于提供计算和控制能力。该电子设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存 储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该电子设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种图像分割模型训练或图像分割的方法。
本领域技术人员可以理解,图8和图9中示出的结构,仅仅是与本公开方案相关的部分结构的框图,并不构成对本公开方案所应用于其上的电子设备的限定,具体的电子设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在示例性实施例中,还提供了一种电子设备,包括:处理器;用于存储该处理器可执行指令的存储器;其中,该处理器被配置为执行该指令,以实现如本公开实施例中的图像分割模型训练或图像分割方法。
在示例性实施例中,还提供了一种存储介质,当该存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行本公开实施例中的图像分割模型训练或图像分割方法。
在示例性实施例中,还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行本公开实施例中的图像分割模型训练或图像分割方法。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,该计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本公开所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。

Claims (18)

  1. 一种图像分割模型训练方法,包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  2. 根据权利要求1所述的图像分割模型训练方法,其中,所述关联场景特征信息的获取步骤包括:
    获取场景图像集,将所述场景图像集输入场景识别模型进行场景识别,得到场景信息集;
    将所述场景信息集输入目标词向量模型,得到场景特征信息集;
    计算所述目标类别特征信息与所述场景特征信息集中场景特征信息间的相似度;
    基于所述相似度从所述场景特征信息集中确定所述关联场景特征信息。
  3. 根据权利要求1所述的图像分割模型训练方法,其中,所述获取目标类别特征信息包括:
    获取所述训练样本和所述预测样本的类别信息;
    将所述类别信息输入目标词向量模型,得到所述目标类别特征信息。
  4. 根据权利要求1所述的图像分割模型训练方法,其中,所述基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型包括:
    利用所述第一图像判别结果和所述第一合成图像的真实性标签计算第一判别损失;
    利用所述第一图像分割结果和所述目标类型特征信息计算第一分割损失;
    根据所述第一判别损失和所述第一分割损失,确定第一目标损失;
    在所述第一目标损失不满足第一预设条件的情况下,更新所述初始图像分割模型的分类网络、所述初始生成网络和所述初始判别网络中的网络参数;
    基于更新后初始图像分割模型的分类网络、初始生成网络和初始判别网络更新所述第一目标损失,至所述第一目标损失满足所述第一预设条件,将当前的初始图像分割模型作为所述目标图像分割模型。
  5. 根据权利要求1至4任一所述的图像分割模型训练方法,其中,所述方法还包括:
    获取所述训练样本、所述训练样本的训练场景特征信息和所述训练样本的训练类别特征信息;
    将所述训练样本输入待训练分割模型的特征提取网络进行特征提取,得到分割特征图像;
    对所述训练类别特征信息与所述训练场景特征信息进行拼接处理,得到第二拼接特征信息;
    将所述第二拼接特征信息输入待训练生成网络进行图像合成处理,得到第二合成图像;
    将所述第二合成图像和所述分割特征图像输入所述待训练分割模型的分类网络,分别进行图像分割,得到所述第二合成图像对应的第二图像分割结果和所述分割特征图像对应的第三图像分割结果;
    将所述分割特征图像和所述第二合成图像输入待训练判别网络,分别进行真实性判别,得到所述分割特征图像对应的第二图像判别结果和所述第二合成图像对应的第三图像判别结果;
    基于所述第二合成图像、所述分割特征图像、所述第二图像分割结果、所述第三图像分割结果、所述训练类型特征信息、所述第二图像判别结果和所述第三图像判别结果训练所述待训练分割模型、所述待训练生成网络和所述待训练判别网络,得到所述初始图像分割模型、所述初始生成网络和所述初始判别网络。
  6. 根据权利要求5所述的图像分割模型训练方法,其中,所述基于所述第二合成图像、所述分割特征图像、所述第二图像分割结果、所述第三图像分割结果、所述训练类型特征信息、所述第二图像判别结果和所述第三图像判别结果训练所述待训练分割模型、所述待训练生成网络和所述待训练判别网络,得到所述初始图像分割模型、所述初始生成网络和所述初始判别网络包括:
    利用所述第二合成图像和所述分割特征图像计算内容损失;
    利用所述第二图像分割结果、所述第三图像分割结果和所述训练类型特征信息,计算第二分割损失;
    利用所述第二图像判别结果和所述第三图像判别结果计算第二判别损失;
    根据所述内容损失、所述第二判别损失和所述第二分割损失,确定第二目标损失;
    在所述第二目标损失不满足第二预设条件的情况下,更新所述待训练分割模型、所述待训练生成网络和所述待训练判别网络中的网络参数;
    基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新所述第二目标损失,至所述第二目标损失满足所述第二预设条件,将当前的待训练分割模型作为初始图像分割模型,将当前的待训练生成网络作为所述初始生成网络,将当前的待训练判别网络作为所述初始判别网络。
  7. 一种图像分割方法,包括:
    获取待分割图像;
    将所述待分割图像输入目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像,其中
    所述目标图像分割模型根据一种图像分割模型训练方法训练得到,并且所述图像分割模型训练方法包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表 征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  8. 一种图像分割模型训练装置,包括:
    特征信息获取模块,被配置为执行获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    第一拼接处理模块,被配置为执行对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    第一图像合成处理模块,被配置为执行将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    第一真实性判别模块,被配置为执行将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    第一图像分割模块,被配置为执行将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    模型训练模块,被配置为执行基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  9. 根据权利要求8所述的图像分割模型训练装置,其中,所述特征信息获取模块包括:
    场景图像集获取单元,被配置为执行获取场景图像集;
    场景识别单元,被配置为执行将所述场景图像集输入场景识别模型进行场景识别,得到场景信息集;
    场景特征信息集获取单元,被配置为执行将所述场景信息集输入目标词向量模型,得到场景特征信息集;
    相似度计算单元,被配置为执行计算所述目标类别特征信息与所述场景特征信息集中场景特征信息间的相似度;
    关联场景特征信息确定单元,被配置为执行基于所述相似度从所述场景特征信息集中确定所述关联场景特征信息。
  10. 根据权利要求8所述的图像分割模型训练装置,其中,所述特征信息获取模块包括:
    类别信息获取单元,被配置为执行获取所述训练样本和所述预测样本的类别信息;
    目标类别特征信息获取单元,被配置为执行将所述类别信息输入目标词向量模型,得到所述目标类别特征信息。
  11. 根据权利要求8所述的图像分割模型训练装置,其中,所述模型训练模块包括:
    第一判别损失计算单元,被配置为执行利用所述第一图像判别结果和所述第一合成图像的真实性标签计算第一判别损失;
    第一分割损失计算单元,被配置为执行利用所述第一图像分割结果和所述目标类型特征信息计算第一分割损失;
    第一目标损失确定单元,被配置为执行根据所述第一判别损失和所述第一分割损失,确定第一目标损失;
    第一网络参数更新单元,被配置为执行在所述第一目标损失不满足第一预设条件的情况下,更新所述初始图像分割模型的分类网络、所述初始生成网络和所述初始判别网络中的网络参数;
    目标图像分割模型确定单元,被配置为执行基于更新后初始图像分割模型的分类网络、初始生成网络和初始判别网络更新所述第一目标损失,至所述第一目标损失满足所述第一预设条件,将当前的初始图像分割模型作为所述目标图像分割模型。
  12. 根据权利要求8至11任一所述的图像分割模型训练装置,其中,所述装置还包括:
    数据获取模块,被配置为执行获取所述训练样本、所述训练样本的训练场景特征信息和所述训练样本的训练类别特征信息;
    特征提取模块,被配置为执行将所述训练样本输入待训练分割模型的特征提取网络进行特征提取,得到分割特征图像;
    第二拼接处理模块,被配置为执行所述训练类别特征信息与所述训练场景特征信息进行拼接处理,得到第二拼接特征信息;
    第二图像合成处理模块,被配置为执行将所述第二拼接特征信息输入待训练生成网络进行图像合成处理,得到第二合成图像;
    第二图像分割模块,被配置为执行将所述第二合成图像和所述分割特征图像输入所述待训练分割模型的分类网络,分别进行图像分割,得到所述第二合成图像对应的第二图像分割结果和所述分割特征图像对应的第三图像分割结果;
    第二真实性判别模块,被配置为执行将所述分割特征图像和所述第二合成图像输入待训练判别网络,分别进行真实性判别,得到所述分割特征图像对应的第二图像判别结果和所述第二合成图像对应的第三图像判别结果;
    初始模型训练模块,被配置为执行基于所述第二合成图像、所述分割特征图像、所述第二图像分割结果、所述第三图像分割结果、所述训练类型特征信息、所述第二图像判别结果和所述第三图像判别结果训练所述待训练分割模型、所述待训练生成网络和所述待训练判别网络,得到所述初始图像分 割模型、所述初始生成网络和所述初始判别网络。
  13. 根据权利要求12所述的图像分割模型训练装置,其中,所述初始模型训练模块包括:
    内容损失计算单元,被配置为执行利用所述第二合成图像和所述分割特征图像计算内容损失;
    第二分割损失计算单元,被配置为执行利用所述第二图像分割结果、所述第三图像分割结果和所述训练类型特征信息,计算第二分割损失;
    第二判别损失计算单元,被配置为执行利用所述第二图像判别结果和所述第三图像判别结果计算第二判别损失;
    第二目标损失确定单元,被配置为执行根据所述内容损失、所述第二判别损失和所述第二分割损失,确定第二目标损失;
    第二网络参数更单元,被配置为执行在所述第二目标损失不满足第二预设条件的情况下,更新所述待训练分割模型、所述待训练生成网络和所述待训练判别网络中的网络参数;
    初始模型确定单元,被配置为执行基于更新后待训练分割模型、待训练生成网络和待训练判别网络更新所述第二目标损失,至所述第二目标损失满足所述第二预设条件,将当前的待训练分割模型作为初始图像分割模型,将当前的待训练生成网络作为所述初始生成网络,将当前的待训练判别网络作为所述初始判别网络。
  14. 一种图像分割装置,包括:
    待分割图像获取模块,被配置为执行获取待分割图像;
    第三图像分割模块,被配置为执行将所述待分割图像输入目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像,
    其中,所述目标图像分割模型根据一种图像分割模型训练方法训练得到,并且所述图像分割模型训练方法包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  15. 一种电子设备,包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令,以实现一种图像分割模型训练方法,所述图像分割模型训练方法,包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  16. 一种电子设备,包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令,以实现一种图像分割方法,所述图像分割方法包括:
    获取待分割图像;
    将所述待分割图像输入目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像,
    其中,所述目标图像分割模型根据一种图像分割模型训练方法训练得到,并且所述图像分割模型训练方法包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  17. 一种计算机可读存储介质,其中,当所述存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行一种图像分割模型训练方法,所述图像分割模型训练方法包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
  18. 一种计算机可读存储介质,其中,当所述存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行一种图像分割方法,所述图像分割方法包括:
    获取待分割图像;
    将所述待分割图像输入目标图像分割模型,对所述待分割图像进行图像分割,得到目标分割图像,
    其中,所述目标图像分割模型根据一种图像分割模型训练方法训练得到,并且所述图像分割模型训练方法包括:
    获取目标类别特征信息和所述目标类别特征信息的关联场景特征信息,所述目标类别特征信息表征训练样本和预测样本的类别特征;
    对所述目标类别特征信息和所述关联场景特征信息进行拼接处理,得到第一拼接特征信息;
    将所述第一拼接特征信息输入初始生成网络进行图像合成处理,得到第一合成图像;
    将所述第一合成图像输入初始判别网络进行真实性判别,得到第一图像判别结果;
    将所述第一合成图像输入初始图像分割模型的分类网络进行图像分割,得到第一图像分割结果;
    基于所述第一图像判别结果、所述第一图像分割结果和所述目标类型特征信息训练所述初始图像分割模型的分类网络,得到目标图像分割模型。
PCT/CN2021/117037 2020-12-28 2021-09-07 用于图像分割模型训练和图像分割的方法及装置 WO2022142450A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21913197.6A EP4095801A1 (en) 2020-12-28 2021-09-07 Methods and apparatuses for image segmentation model training and for image segmentation
US17/895,629 US20230022387A1 (en) 2020-12-28 2022-08-25 Method and apparatus for image segmentation model training and for image segmentation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011574785.5A CN112330685B (zh) 2020-12-28 2020-12-28 图像分割模型训练、图像分割方法、装置及电子设备
CN202011574785.5 2020-12-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/895,629 Continuation US20230022387A1 (en) 2020-12-28 2022-08-25 Method and apparatus for image segmentation model training and for image segmentation

Publications (1)

Publication Number Publication Date
WO2022142450A1 true WO2022142450A1 (zh) 2022-07-07

Family

ID=74301891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117037 WO2022142450A1 (zh) 2020-12-28 2021-09-07 用于图像分割模型训练和图像分割的方法及装置

Country Status (4)

Country Link
US (1) US20230022387A1 (zh)
EP (1) EP4095801A1 (zh)
CN (1) CN112330685B (zh)
WO (1) WO2022142450A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761239A (zh) * 2023-01-09 2023-03-07 深圳思谋信息科技有限公司 一种语义分割方法及相关装置
CN115761222A (zh) * 2022-09-27 2023-03-07 阿里巴巴(中国)有限公司 图像分割方法、遥感图像分割方法以及装置

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330685B (zh) * 2020-12-28 2021-04-06 北京达佳互联信息技术有限公司 图像分割模型训练、图像分割方法、装置及电子设备
CN113362286B (zh) * 2021-05-24 2022-02-01 江苏星月测绘科技股份有限公司 一种基于深度学习的自然资源要素变化检测方法
CN113222055B (zh) * 2021-05-28 2023-01-10 新疆爱华盈通信息技术有限公司 一种图像分类方法、装置、电子设备及存储介质
CN113470048B (zh) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 场景分割方法、装置、设备及计算机可读存储介质
CN113642612B (zh) * 2021-07-19 2022-11-18 北京百度网讯科技有限公司 样本图像生成方法、装置、电子设备及存储介质
CN114119438A (zh) * 2021-11-11 2022-03-01 清华大学 图像拼贴模型的训练方法和设备及图像拼贴方法和设备
CN115223015B (zh) * 2022-09-16 2023-01-03 小米汽车科技有限公司 模型训练方法、图像处理方法、装置和车辆
CN115331012B (zh) * 2022-10-14 2023-03-24 山东建筑大学 基于零样本学习的联合生成式图像实例分割方法及系统
CN116167922B (zh) * 2023-04-24 2023-07-18 广州趣丸网络科技有限公司 一种抠图方法、装置、存储介质及计算机设备
CN117557221A (zh) * 2023-11-17 2024-02-13 德联易控科技(北京)有限公司 一种车辆损伤报告的生成方法、装置、设备和可读介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074292A1 (en) * 2007-09-18 2009-03-19 Microsoft Corporation Optimization of Multi-Label Problems in Computer Vision
CN111429460A (zh) * 2020-06-12 2020-07-17 腾讯科技(深圳)有限公司 图像分割方法、图像分割模型训练方法、装置和存储介质
CN111652121A (zh) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 一种表情迁移模型的训练方法、表情迁移的方法及装置
CN112017189A (zh) * 2020-10-26 2020-12-01 腾讯科技(深圳)有限公司 图像分割方法、装置、计算机设备和存储介质
CN112330685A (zh) * 2020-12-28 2021-02-05 北京达佳互联信息技术有限公司 图像分割模型训练、图像分割方法、装置及电子设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10755149B2 (en) * 2017-05-05 2020-08-25 Hrl Laboratories, Llc Zero shot machine vision system via joint sparse representations
CN111444889B (zh) * 2020-04-30 2023-07-25 南京大学 基于多级条件影响的卷积神经网络的细粒度动作检测方法
CN111612010A (zh) * 2020-05-21 2020-09-01 京东方科技集团股份有限公司 图像处理方法、装置、设备以及计算机可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074292A1 (en) * 2007-09-18 2009-03-19 Microsoft Corporation Optimization of Multi-Label Problems in Computer Vision
CN111652121A (zh) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 一种表情迁移模型的训练方法、表情迁移的方法及装置
CN111429460A (zh) * 2020-06-12 2020-07-17 腾讯科技(深圳)有限公司 图像分割方法、图像分割模型训练方法、装置和存储介质
CN112017189A (zh) * 2020-10-26 2020-12-01 腾讯科技(深圳)有限公司 图像分割方法、装置、计算机设备和存储介质
CN112330685A (zh) * 2020-12-28 2021-02-05 北京达佳互联信息技术有限公司 图像分割模型训练、图像分割方法、装置及电子设备

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761222A (zh) * 2022-09-27 2023-03-07 阿里巴巴(中国)有限公司 图像分割方法、遥感图像分割方法以及装置
CN115761222B (zh) * 2022-09-27 2023-11-03 阿里巴巴(中国)有限公司 图像分割方法、遥感图像分割方法以及装置
CN115761239A (zh) * 2023-01-09 2023-03-07 深圳思谋信息科技有限公司 一种语义分割方法及相关装置

Also Published As

Publication number Publication date
CN112330685A (zh) 2021-02-05
EP4095801A1 (en) 2022-11-30
US20230022387A1 (en) 2023-01-26
CN112330685B (zh) 2021-04-06

Similar Documents

Publication Publication Date Title
WO2022142450A1 (zh) 用于图像分割模型训练和图像分割的方法及装置
US11373390B2 (en) Generating scene graphs from digital images using external knowledge and image reconstruction
CN112270686B (zh) 图像分割模型训练、图像分割方法、装置及电子设备
CN111062871B (zh) 一种图像处理方法、装置、计算机设备及可读存储介质
US10789456B2 (en) Facial expression recognition utilizing unsupervised learning
US20200117906A1 (en) Space-time memory network for locating target object in video content
CN109145766B (zh) 模型训练方法、装置、识别方法、电子设备及存储介质
WO2019232772A1 (en) Systems and methods for content identification
CN110765860A (zh) 摔倒判定方法、装置、计算机设备及存储介质
CN113095346A (zh) 数据标注的方法以及数据标注的装置
WO2021114612A1 (zh) 目标重识别方法、装置、计算机设备和存储介质
US20170116521A1 (en) Tag processing method and device
WO2023051140A1 (zh) 用于图像特征表示生成的方法、设备、装置和介质
CN112818995B (zh) 图像分类方法、装置、电子设备及存储介质
CN110807472B (zh) 图像识别方法、装置、电子设备及存储介质
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
WO2022057309A1 (zh) 肺部特征识别方法、装置、计算机设备及存储介质
CN113192175A (zh) 模型训练方法、装置、计算机设备和可读存储介质
CN115935049A (zh) 基于人工智能的推荐处理方法、装置及电子设备
CN112801107A (zh) 一种图像分割方法和电子设备
CN115565186B (zh) 文字识别模型的训练方法、装置、电子设备和存储介质
WO2022117014A1 (en) System, method and apparatus for training a machine learning model
CN114841851A (zh) 图像生成方法、装置、电子设备及存储介质
Tomei et al. Image-to-image translation to unfold the reality of artworks: an empirical analysis
CN113569081A (zh) 图像识别方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913197

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021913197

Country of ref document: EP

Effective date: 20220826

NENP Non-entry into the national phase

Ref country code: DE