CN114638839B - Small sample video target segmentation method based on dynamic prototype learning - Google Patents

Small sample video target segmentation method based on dynamic prototype learning Download PDF

Info

Publication number
CN114638839B
CN114638839B CN202210536170.6A CN202210536170A CN114638839B CN 114638839 B CN114638839 B CN 114638839B CN 202210536170 A CN202210536170 A CN 202210536170A CN 114638839 B CN114638839 B CN 114638839B
Authority
CN
China
Prior art keywords
video frame
prototype
matrix
support
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210536170.6A
Other languages
Chinese (zh)
Other versions
CN114638839A (en
Inventor
张天柱
张哲�
张勇东
罗乃淞
吴枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210536170.6A priority Critical patent/CN114638839B/en
Publication of CN114638839A publication Critical patent/CN114638839A/en
Application granted granted Critical
Publication of CN114638839B publication Critical patent/CN114638839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample video target segmentation method based on dynamic prototype learning, which comprises the following steps: acquiring a video target to be segmented; and processing the video target to be segmented by using a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result. According to the small sample video target segmentation method, the optimal transmission method is used for self-adaptive learning of the dynamic prototype, noise attention is effectively reduced, and meanwhile, a guiding mode is adopted for matching the multi-level feature map, so that the calculated amount is greatly reduced; the method can fully extract the target information in a small number of support set samples, and remarkably improves the segmentation performance on the video of the challenge set. The invention also discloses electronic equipment, a storage medium and a computer program product for executing the small sample video object segmentation method based on the dynamic prototype learning.

Description

Small sample video target segmentation method based on dynamic prototype learning
Technical Field
The invention relates to the field of computer vision, in particular to a training method of a small sample video target segmentation model and a video target segmentation method.
Background
Video target segmentation is a technology for predicting a foreground target mask in each frame of a video, and has wide application in the aspects of augmented reality, automatic driving, video editing and the like.
The prior art segmentation of video objects is typically based on semi-supervised and unsupervised. The semi-supervision method needs to give target information of a first frame of each video, then carries out dense association on targets in subsequent frames of the video, and the process seriously depends on a large amount of densely segmented and labeled data, so that time and labor are consumed; the unsupervised method has low performance due to the lack of the labeling data, and cannot meet the requirements of practical application. In addition, the above two methods cannot be well generalized to new target classes, and the segmentation capability on classes not seen in the training phase is sharply reduced, which limits the extensibility and practicability of video target recognition.
Disclosure of Invention
In view of the above, it is a primary object of the present invention to provide a small sample video object segmentation method based on dynamic prototype learning, an electronic device, a storage medium and a computer program product, which are intended to at least partially solve at least one of the above-mentioned technical problems.
According to a first aspect of the present invention, there is provided a small sample video object segmentation method based on dynamic prototype learning, including:
acquiring a video target to be segmented;
processing a video target to be segmented by using a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result, wherein the small sample video target segmentation model based on dynamic prototype learning is obtained by training according to the following method:
processing the video frame images of the challenge set and the video frame images of the support set by utilizing a part of neural network layers of a feature extraction module of a small sample video object segmentation model to obtain low-level features of the challenge video frame and low-level features of the support video frame;
processing the video frame images of the challenge set by using all neural network layers of a feature extraction module of the small sample video object segmentation model to obtain the features of the challenge video frame;
carrying out mask operation on the low-level features of the support video frame to obtain foreground features of the support video frame;
processing foreground characteristics of the support video frame and characteristics of the challenge video frame by utilizing a mining module of a small sample video object segmentation model to obtain a corresponding relation matrix;
processing the low-level features of the support video frames, the low-level features of the inquiry video frames and the corresponding relation matrix by using a guide module of the small sample video target segmentation model to obtain a low-level corresponding relation matrix;
processing the corresponding relation matrix and the low-level corresponding relation matrix by using a segmentation module of the small sample video target segmentation model to obtain a video target segmentation result, and optimizing the small sample video target segmentation model by using a loss function of the small sample video target segmentation model;
and (4) performing feature extraction operation, masking operation, mining operation, guiding operation, segmentation operation and optimization operation in an iterative manner until the value of the loss function meets a preset condition to obtain a trained small sample video target segmentation model.
According to the embodiment of the present invention, the processing of the foreground feature of the support video frame and the feature of the challenge video frame by the mining module of the small sample video object segmentation model to obtain the corresponding relationship matrix includes:
processing foreground characteristics of the support video frame by utilizing a prototype generator of the mining module to obtain dynamic prototype characteristics;
calculating the dynamic prototype characteristics and the foreground characteristics of the support video frame to obtain a support corresponding relation matrix;
calculating the dynamic prototype characteristics and the characteristics of the inquiry video frame to obtain an inquiry corresponding relation matrix;
and operating the support corresponding relation matrix and the inquiry corresponding relation matrix to obtain a corresponding relation matrix.
According to an embodiment of the present invention, the processing of the foreground feature of the support video frame by the prototype generator of the mining module to obtain the dynamic prototype feature includes:
carrying out global average pooling on foreground features of the support video frame to obtain video target prototype features;
calculating foreground characteristics of the support video frame and prototype characteristics of the video target by using a prototype generator to obtain an attention matrix;
processing the attention matrix by using an optimal transmission algorithm to obtain an optimal distribution matrix;
and calculating the foreground characteristics of the support video frame and the optimal distribution matrix, and calculating the calculation result and the video target prototype characteristics to obtain the dynamic prototype characteristics.
According to an embodiment of the present invention, the above-mentioned attention matrix is determined by equation (1):
Figure 884731DEST_PATH_IMAGE001
(1),
wherein the content of the first and second substances,
Figure 67451DEST_PATH_IMAGE002
is the first
Figure 112768DEST_PATH_IMAGE003
The foreground feature vector of each support video frame,
Figure 39135DEST_PATH_IMAGE003
index representing foreground feature vector of support video frame, for length of
Figure 853507DEST_PATH_IMAGE004
The foreground feature vector of the supporting video frame,
Figure 941549DEST_PATH_IMAGE003
is in a range of values
Figure 474162DEST_PATH_IMAGE005
Figure 938641DEST_PATH_IMAGE006
Is the first
Figure 607520DEST_PATH_IMAGE007
The characteristics of each prototype are characterized in that,
Figure 132042DEST_PATH_IMAGE007
index representing prototype features for
Figure 151951DEST_PATH_IMAGE008
The characteristics of each prototype are characterized in that,
Figure 888963DEST_PATH_IMAGE007
is in the value range of
Figure 677927DEST_PATH_IMAGE009
Figure 638930DEST_PATH_IMAGE010
Is a matrix of support focus forces,
Figure 146134DEST_PATH_IMAGE011
is a support attention force matrix
Figure 686837DEST_PATH_IMAGE012
To (1) a
Figure 595887DEST_PATH_IMAGE007
Go to the first
Figure 196633DEST_PATH_IMAGE003
Column value for indicating the second
Figure 191134DEST_PATH_IMAGE007
Individual prototype features and
Figure 269948DEST_PATH_IMAGE003
similarity of foreground feature vectors of the individual support video frames;
wherein the dynamic prototype feature is determined by equation (2):
Figure 33505DEST_PATH_IMAGE013
(2),
wherein the content of the first and second substances,
Figure 70731DEST_PATH_IMAGE014
is a sequence of foreground feature vectors that support video frames,
Figure 286949DEST_PATH_IMAGE015
is the first
Figure 435033DEST_PATH_IMAGE016
Is characterized by a prototype
Figure 53096DEST_PATH_IMAGE017
The obtained dynamic prototype characteristics are updated according to the dynamic prototype characteristics,
Figure 261224DEST_PATH_IMAGE018
representing the optimized support set attention force matrix,
Figure 964738DEST_PATH_IMAGE019
is the first of the optimized support attention force matrix
Figure 650934DEST_PATH_IMAGE016
A row vector.
According to an embodiment of the present invention, the processing, by the guidance module of the small sample video object segmentation model, the low-level feature of the support video frame, the low-level feature of the inquiry video frame, and the correspondence matrix to obtain the low-level correspondence matrix includes:
selecting a preset row number and a preset column number of the corresponding relation matrix to obtain an intermediate corresponding relation matrix;
calculating the low-level feature of the support video frame and the intermediate corresponding relation matrix to obtain a reconstructed feature matrix;
and performing operation on the reconstructed feature matrix and the low-level features of the inquiry video frame to obtain a low-level corresponding relation matrix.
According to an embodiment of the present invention, the above guidance module is determined by formula (3) and formula (4):
Figure 389083DEST_PATH_IMAGE020
(3),
Figure 768112DEST_PATH_IMAGE021
(4),
wherein the content of the first and second substances,
Figure 958921DEST_PATH_IMAGE022
is a temperature factor, for controlling the degree of smoothing of the output probability distribution,
Figure 448809DEST_PATH_IMAGE023
the length of the modulus of the vector is represented,
Figure 775885DEST_PATH_IMAGE024
is the first
Figure 325815DEST_PATH_IMAGE025
An individual challenge video frame feature vector is generated,
Figure 738342DEST_PATH_IMAGE025
index representing the feature vector of the challenge video frame, for height and width respectively
Figure 31920DEST_PATH_IMAGE026
And
Figure 479082DEST_PATH_IMAGE027
the image of the video frame of the challenge,
Figure 934334DEST_PATH_IMAGE025
is in the value range of
Figure 99736DEST_PATH_IMAGE028
Figure 931426DEST_PATH_IMAGE029
Is an allocation matrix of dynamic prototype features and challenge video frame features,
Figure 498673DEST_PATH_IMAGE030
first of the allocation matrix representing dynamic prototype features and challenging video frame features
Figure 124827DEST_PATH_IMAGE007
Go to the first
Figure 777525DEST_PATH_IMAGE025
The value of the column is such that,
Figure 412905DEST_PATH_IMAGE031
an assignment matrix representing the optimized dynamic prototype features and the foreground features of the support video frame,
Figure 303501DEST_PATH_IMAGE032
a correspondence matrix representing the features of the challenge video frame and the foreground features of the support video frame,softmaxrepresenting a normalized exponential function.
According to an embodiment of the present invention, the loss function of the small sample video object segmentation model includes a cross-over ratio loss function and a cross-entropy loss function;
wherein the cross entropy loss function is determined by equation (5):
Figure 366135DEST_PATH_IMAGE033
(5),
wherein, the first and the second end of the pipe are connected with each other,
Figure 240550DEST_PATH_IMAGE034
and
Figure 679622DEST_PATH_IMAGE035
respectively representing the height and width of the incoming challenge video frame image or support video frame image,
Figure 955882DEST_PATH_IMAGE036
which represents the product of the height and the width,
Figure 189417DEST_PATH_IMAGE037
is the result of the real segmentation,
Figure 551129DEST_PATH_IMAGE038
representing the second in the real segmentation result
Figure 528312DEST_PATH_IMAGE039
Go to the first
Figure 659079DEST_PATH_IMAGE040
The value of the column is such that,
Figure 797936DEST_PATH_IMAGE041
is the result of the segmentation predicted by the model,
Figure 381364DEST_PATH_IMAGE043
representation of the segmentation results of model predictions
Figure 896659DEST_PATH_IMAGE039
Go to the first
Figure 413091DEST_PATH_IMAGE040
The value of the column;
wherein the cross-over ratio loss function is determined by equation (6):
Figure 722850DEST_PATH_IMAGE044
(6),
wherein the content of the first and second substances,
Figure 59153DEST_PATH_IMAGE045
a norm of the matrix is represented.
According to a second aspect of the present invention, there is provided an electronic apparatus comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above dynamic prototype learning-based small sample video object segmentation method.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-mentioned small-sample video object segmentation method based on dynamic prototype learning.
The small sample video target segmentation method based on dynamic prototype learning provided by the invention has the advantages that the optimal transmission method is used for adaptively learning the dynamic prototype, the noise attention is effectively reduced, meanwhile, the multi-level feature maps are matched in a guiding mode, the calculated amount is greatly reduced, meanwhile, the video segmentation method provided by the invention fully extracts the target information in a small number of support set samples, and the segmentation performance on the video of an inquiry set is obviously improved.
Drawings
FIG. 1 is a flow chart of a small sample video object segmentation method based on dynamic prototype learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of a small sample video object segmentation model based on dynamic prototype learning according to an embodiment of the present invention;
FIG. 3 is a flow chart of obtaining a correspondence matrix according to an embodiment of the present invention;
FIG. 4 is a flow diagram for obtaining dynamic prototype features according to an embodiment of the present invention;
FIG. 5 is a flow diagram of obtaining a low-level correspondence matrix according to an embodiment of the invention;
FIG. 6 is a small sample video object segmentation model framework diagram based on dynamic prototype learning according to an embodiment of the present invention;
fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a small-sample video object segmentation method of base dynamic prototype learning, in accordance with an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention provides a small sample video target segmentation method based on dynamic prototype learning, which aims to reduce the dependence on data, improve the expansibility and the practicability and achieve better video target segmentation performance by using a small amount of data with labels.
In the current method, the method of dense matching by using multi-level features achieves the leading performance. However, dense matching of pixel-by-pixel features introduces a large amount of correspondence noise, and further processing on multiple scales increases the computational effort. The method provided by the invention can learn the target prototype in a self-adaptive manner, realize robust multi-level dense matching in a mode of an intermediate bridge, and effectively alleviate the problems of noise and large calculation amount.
The video segmentation method provided by the invention can be applied to an application system related to video object segmentation; the target in the input video is segmented according to the information provided by a small number of support set images, and the method can be widely applied to scenes such as augmented reality, automatic driving, video editing and the like. In a specific embodiment, the method can be embedded into a mobile device in a software form, and provides a real-time segmentation result of a recorded video; or the method can be installed in a background server to provide the processing result of the video in a large batch.
Fig. 1 is a flowchart of a small sample video object segmentation method based on dynamic prototype learning according to an embodiment of the present invention.
As shown in FIG. 1, the method includes operations S110 to S120.
In operation S110, a video object to be segmented is acquired.
In operation S120, a video object to be segmented is processed by using a small-sample video object segmentation model based on dynamic prototype learning, and a video object segmentation result is obtained.
Fig. 2 is a flowchart of a training method of a small sample video object segmentation model based on dynamic prototype learning according to an embodiment of the present invention.
As shown in FIG. 2, the method includes operations S210 to S270.
In operation S210, the challenge set video frame image and the support set video frame image are processed by using a part of the neural network layer of the feature extraction module of the small sample video object segmentation model, so as to obtain a low-level feature of the challenge video frame and a low-level feature of the support video frame.
The low-level features are processed by a part of neural network layers of the feature extraction module, so that the resolution is high, more detail information is included, but the low-level features are lower in semantic and more in noise. High-level features (or features) as opposed to low-level features, which traverse more layers of the neural network than low-level features, have stronger semantic information, but have a lower resolution and a poorer perception of detail.
In operation S220, all the neural network layers of the feature extraction module of the small sample video object segmentation model are used to process the video frame images of the challenge set, so as to obtain the features of the challenge video frame.
For input support set video frame images and inquiry set video frame images belonging to the same category, the feature extraction module is utilized to perform multi-level feature extraction based on a ResNet-50 network, and then a 1x1 convolutional layer is mapped to a common measurement space.
In operation S230, a mask operation is performed on the low-level features of the support video frame to obtain foreground features of the support video frame.
In operation S240, the mining module of the small sample video object segmentation model is used to process the foreground feature of the support video frame and the feature of the challenge video frame, so as to obtain a corresponding relationship matrix.
In operation S250, the low-level feature of the support video frame, the low-level feature of the challenge video frame, and the corresponding relationship matrix are processed by using the guidance module of the small-sample video object segmentation model to obtain a low-level corresponding relationship matrix.
In operation S260, the correspondence matrix and the low-level correspondence matrix are processed by using a segmentation module of the small-sample video object segmentation model to obtain a video object segmentation result, and the small-sample video object segmentation model is optimized by using a loss function of the small-sample video object segmentation model.
In operation S270, the feature extraction operation, the masking operation, the mining operation, the guiding operation, the segmentation operation, and the optimization operation are performed iteratively until the value of the loss function satisfies the preset condition, so as to obtain a trained small sample video target segmentation model.
According to the training method provided by the invention, a small sample video target segmentation model with reliability, generalization and high efficiency can be obtained by utilizing a dynamic prototype mining module and a multi-level dynamic guide module based on an optimal transmission algorithm; the trained small sample video target segmentation model is used for segmenting the video target, an optimal transmission method can be used for adaptively learning a dynamic prototype, noise attention is effectively reduced, meanwhile, a multi-level characteristic diagram is matched in a guiding mode, the calculated amount is greatly reduced, meanwhile, the video segmentation method provided by the invention fully extracts the target information in a small number of support set samples, and the segmentation performance on the challenge set video is obviously improved.
Fig. 3 is a flowchart of obtaining a correspondence matrix according to an embodiment of the present invention.
As shown in fig. 3, the mining module of the small sample video object segmentation model is used for processing foreground features of the support video frame and features of the inquiry video frame, and obtaining a corresponding relation matrix includes operations S310 to S340.
In operation S310, the foreground features of the support video frame are processed by using a prototype generator of the mining module to obtain dynamic prototype features.
In operation S320, the dynamic prototype feature and the foreground feature of the support video frame are calculated to obtain a support correspondence matrix.
In operation S330, the dynamic prototype feature and the feature of the challenge video frame are operated to obtain a challenge correspondence matrix.
In operation S340, the support correspondence matrix and the challenge correspondence matrix are operated to obtain a correspondence matrix.
In the process of acquiring the corresponding relation matrix, the feature points which support the incidence relation between the foreground features of the video frame and the features of the video frame to be inquired can be fully excavated by utilizing the excavation module of the dynamic prototype based on the optimal transmission algorithm, so that more firm data support is provided for the training of the subsequent model.
FIG. 4 is a flow diagram for obtaining dynamic prototype features according to an embodiment of the present invention.
As shown in fig. 4, processing the foreground feature of the support video frame by using the prototype generator of the mining module to obtain the dynamic prototype feature includes operations S410 to S440.
In operation S410, global average pooling is performed on the foreground features of the support video frame to obtain video target prototype features.
In operation S420, the prototype generator is used to perform operations on the foreground features of the support video frame and the prototype features of the video target, so as to obtain an attention matrix.
In operation S430, the attention matrix is processed using an optimal transfer algorithm to obtain an optimal allocation matrix.
In operation S440, the foreground feature of the supported video frame and the optimal allocation matrix are operated, and the operation result and the video target prototype feature are operated to obtain a dynamic prototype feature.
The processing process for acquiring the dynamic prototype features can effectively reduce the noise attention in the original video frame image, thereby improving the segmentation performance of the trained model.
According to an embodiment of the present invention, the above-mentioned attention matrix is determined by equation (1):
Figure 378139DEST_PATH_IMAGE001
(1),
wherein the content of the first and second substances,
Figure 749078DEST_PATH_IMAGE046
is the firstiThe foreground feature vectors of each of the support video frames,
Figure 229738DEST_PATH_IMAGE003
index representing foreground feature vector of support video frame, for length of
Figure 787758DEST_PATH_IMAGE004
The foreground feature vector of the supported video frame,
Figure 644855DEST_PATH_IMAGE003
is in the value range of
Figure 604721DEST_PATH_IMAGE005
Figure 521862DEST_PATH_IMAGE006
Is the first
Figure 567178DEST_PATH_IMAGE007
The characteristics of each prototype are characterized in that,
Figure 493546DEST_PATH_IMAGE007
index representing prototype features for
Figure 307918DEST_PATH_IMAGE008
The characteristics of each prototype are characterized in that,
Figure 661539DEST_PATH_IMAGE007
is in the value range of
Figure 928572DEST_PATH_IMAGE009
Figure 658631DEST_PATH_IMAGE010
Is a matrix of support concentration attention forces,
Figure 327510DEST_PATH_IMAGE011
is the first to support the attention-focusing force matrix
Figure 586453DEST_PATH_IMAGE007
Go to the first
Figure 606361DEST_PATH_IMAGE003
Column value for indicating the second
Figure 608952DEST_PATH_IMAGE007
Individual prototype characteristics and
Figure 132338DEST_PATH_IMAGE003
similarity of foreground feature vectors of the individual support video frames;
wherein the dynamic prototype feature is determined by equation (2):
Figure 93340DEST_PATH_IMAGE013
(2),
wherein the content of the first and second substances,
Figure 334966DEST_PATH_IMAGE014
is a sequence of foreground feature vectors supporting video frames,
Figure 875669DEST_PATH_IMAGE015
is the first
Figure 784719DEST_PATH_IMAGE016
Character of individual prototype
Figure 385464DEST_PATH_IMAGE017
The obtained dynamic prototype characteristics are updated according to the dynamic prototype characteristics,
Figure 114386DEST_PATH_IMAGE018
the optimized support set attention matrix is shown, namely the support set attention matrix optimized by using the optimal transmission algorithm,
Figure 724359DEST_PATH_IMAGE019
is the first of the optimized support set attention force matrix
Figure 487915DEST_PATH_IMAGE016
Row vector, representing
Figure 993983DEST_PATH_IMAGE016
The line prototype feature vector pairs support the similarity of the foreground feature vectors of the video frames.
FIG. 5 is a flow diagram of obtaining a low-level correspondence matrix according to an embodiment of the invention.
As shown in fig. 5, the obtaining of the low-level correspondence matrix by using the guidance module of the small sample video object segmentation model to process the low-level features of the support video frame, the low-level features of the inquiry video frame and the correspondence matrix includes operations S510 to S530.
In operation S510, a preset row number and a preset column number of the corresponding relationship matrix are selected to obtain an intermediate corresponding relationship matrix.
In operation S520, the low-level feature of the support video frame and the intermediate correspondence matrix are operated to obtain a reconstructed feature matrix.
In operation S530, the reconstructed feature matrix and the low-level features of the challenge video frame are operated to obtain a low-level corresponding relationship matrix.
According to an embodiment of the present invention, the above guidance module is determined by formula (3) and formula (4):
Figure 475780DEST_PATH_IMAGE020
(3),
Figure 889444DEST_PATH_IMAGE021
(4),
wherein the content of the first and second substances,
Figure 241928DEST_PATH_IMAGE022
as a temperature factor, useIn controlling the degree of smoothing of the output probability distribution,
Figure 450055DEST_PATH_IMAGE023
the length of the modulus of the vector is represented,
Figure 153569DEST_PATH_IMAGE024
is the first
Figure 839765DEST_PATH_IMAGE025
An individual challenge video frame feature vector is generated,
Figure 312335DEST_PATH_IMAGE025
index representing the feature vector of the challenge video frame, for height and width respectively
Figure 691364DEST_PATH_IMAGE026
And
Figure 882174DEST_PATH_IMAGE027
the image of the video frame of the challenge,
Figure 106482DEST_PATH_IMAGE025
is in the value range of
Figure 699137DEST_PATH_IMAGE028
Figure 514646DEST_PATH_IMAGE029
Is an allocation matrix of dynamic prototype features and challenge video frame features,
Figure 192752DEST_PATH_IMAGE047
a first of the allocation matrices representing the dynamic prototype features and the features of the challenge video frame
Figure 955172DEST_PATH_IMAGE007
Go to the first
Figure 402334DEST_PATH_IMAGE025
The value of the column is such that,
Figure 123165DEST_PATH_IMAGE031
an assignment matrix representing the optimized dynamic prototype features and the foreground features of the support video frame,
Figure 22988DEST_PATH_IMAGE032
a correspondence matrix representing the features of the challenge video frame and the foreground features of the support video frame,softmaxrepresenting a normalized exponential function.
Allocation matrices for dynamic prototype features and supporting video frame features
Figure 120257DEST_PATH_IMAGE048
Then it is determined by:
Figure 156346DEST_PATH_IMAGE049
Figure 48079DEST_PATH_IMAGE050
allocation matrix representing dynamic prototype features and supporting video frame features
Figure 700777DEST_PATH_IMAGE051
To (1) a
Figure 336158DEST_PATH_IMAGE016
Go to the first
Figure 492332DEST_PATH_IMAGE039
The value of the column is such that,
Figure 554966DEST_PATH_IMAGE052
is the firstiThe foreground feature vectors of each of the support video frames,
Figure 429381DEST_PATH_IMAGE015
is the first
Figure 602874DEST_PATH_IMAGE016
Is characterized by a prototype
Figure 879134DEST_PATH_IMAGE017
Is updated toTo dynamic prototype features.
According to an embodiment of the present invention, the loss function of the small sample video object segmentation model includes a cross-over ratio loss function and a cross-entropy loss function;
wherein the cross entropy loss function is determined by equation (5):
Figure 378249DEST_PATH_IMAGE033
(5),
wherein the content of the first and second substances,
Figure 474381DEST_PATH_IMAGE034
and
Figure 185985DEST_PATH_IMAGE035
respectively representing the height and width of the incoming challenge video frame image or the support video frame image,
Figure 582331DEST_PATH_IMAGE036
represents the product of said height and said width,
Figure 721188DEST_PATH_IMAGE037
is the result of the true segmentation of the image,
Figure 570196DEST_PATH_IMAGE038
representing the first in the real segmentation result
Figure 351070DEST_PATH_IMAGE039
Go to the first
Figure 336343DEST_PATH_IMAGE040
The value of the column is such that,
Figure 911681DEST_PATH_IMAGE041
is the result of the segmentation predicted by the model,
Figure 247985DEST_PATH_IMAGE053
representing the result of the segmentation predicted by the model
Figure 301391DEST_PATH_IMAGE039
Go to the first
Figure 672330DEST_PATH_IMAGE040
The value of the column;
wherein the cross-over ratio loss function is determined by equation (6):
Figure 418569DEST_PATH_IMAGE044
(6),
where a norm of the matrix is represented.
Since the segmentation task is similar to the pixel-by-pixel classification task, intensive cross-entropy losses are used as constraints, while, in order to improve the final segmentation result
Figure 711010DEST_PATH_IMAGE041
And label mask
Figure 833687DEST_PATH_IMAGE037
The coincidence degree index of the invention is additionally added with an intersection ratio loss, and finally the loss function of the invention is formed by combining the intersection ratio loss function and the cross entropy loss function according to a certain weight coefficient; the loss function of the small sample video object segmentation model of the invention is shown in formula (7):
Figure 793553DEST_PATH_IMAGE054
(7),
wherein the content of the first and second substances,
Figure 445114DEST_PATH_IMAGE055
and
Figure 21589DEST_PATH_IMAGE056
representing the weight coefficients.
By using the loss function as the constraint of the training method, the training effect of the small sample video target model can be improved, and a small sample video target segmentation model which has robustness and effectively reduces noise and is based on dynamic prototype learning is obtained.
Fig. 6 is a frame diagram of a small sample video object segmentation model based on dynamic prototype learning according to an embodiment of the present invention.
The training process of the model provided by the embodiment of the present invention is further described in detail with reference to fig. 6.
As shown in FIG. 6, the model training framework provided by the present invention comprises a dynamic prototype mining module based on an optimal transmission algorithm and a multi-level dynamic boot module. In a dynamic prototype mining module based on an optimal transmission algorithm, for input supporting set and inquiry set images belonging to the same category, multi-level features are extracted through a ResNet-50 network, and then a 1x1 convolutional layer is mapped to a common measurement space. Will support the feature
Figure 694096DEST_PATH_IMAGE057
Flattening and extracting the inclusion using the corresponding mask
Figure 508468DEST_PATH_IMAGE004
A sequence of foreground feature vectors supporting the video frames
Figure 596510DEST_PATH_IMAGE058
Sending it to a prototype generator to obtain
Figure 394702DEST_PATH_IMAGE008
An object prototype, as shown in equations (8) and (9):
Figure 593602DEST_PATH_IMAGE059
(8),
Figure 262480DEST_PATH_IMAGE060
(9),
GAP(Global Average Polean, GAP, global average pooling) is used to average a sequence of input supporting video frame foreground feature vectors,
Figure 787003DEST_PATH_IMAGE061
representing a target global feature vector, prototype generator
Figure 541332DEST_PATH_IMAGE062
Composed of a full connection layer and an activation function, which can generate prototype features according to the target features input by the current support set
Figure 543923DEST_PATH_IMAGE063
Figure 598467DEST_PATH_IMAGE064
Is shown in whichkA generator for generating a prototype of the object,
Figure 28311DEST_PATH_IMAGE065
is composed of
Figure 269937DEST_PATH_IMAGE064
Can be based on the attention moment matrix
Figure 76219DEST_PATH_IMAGE066
Foreground pixel features are assigned to these prototypes as shown in equation (1):
Figure 985269DEST_PATH_IMAGE001
(1),
in order to allocate a group of semantically consistent pixel features to the same prototype, an optimal allocation matrix is obtained based on an optimal transmission theory for adjusting the mapping relationship between the pixel features and the prototype, and this process mainly solves the optimization problem shown in formulas (10) and (11):
Figure 586014DEST_PATH_IMAGE067
(10),
Figure 314936DEST_PATH_IMAGE068
(11),
wherein the content of the first and second substances,
Figure 924909DEST_PATH_IMAGE069
the vector is a vector of all 1 s,
Figure 422886DEST_PATH_IMAGE070
representing the transition matrix to be solved for,
Figure 194533DEST_PATH_IMAGE071
represents the optimal solution of the transition matrix to be solved,
Figure 941909DEST_PATH_IMAGE071
the weighting operation is performed on the attention moment array, and is a weighting matrix, Tr represents the trace operation of the matrix,
Figure 824415DEST_PATH_IMAGE072
which represents a constant coefficient of the constant,
Figure 442478DEST_PATH_IMAGE073
the entropy function of the information is represented,
Figure 385026DEST_PATH_IMAGE074
for transferring matrices
Figure 822961DEST_PATH_IMAGE070
The space of possible solutions of (a) is,
Figure 774736DEST_PATH_IMAGE075
the dimension of expression is
Figure 512885DEST_PATH_IMAGE076
Can ultimately be based on
Figure 626335DEST_PATH_IMAGE071
The updating results in a robust dynamic prototype, as shown in equation (12) and equation (2):
Figure 817144DEST_PATH_IMAGE077
(12),
Figure 307032DEST_PATH_IMAGE078
(2),
wherein the content of the first and second substances,
Figure 899687DEST_PATH_IMAGE079
representing the operation of element-by-element multiplication between matrices.
The above process can optimize the prototype vector through multiple iterations, and simultaneously purify the distribution matrix of the support set
Figure 449617DEST_PATH_IMAGE080
In the multi-level dynamic guiding module, for a video frame of a challenge set to be segmented, a pseudo label can be allocated to each pixel feature by using a self-adaptively generated dynamic prototype, and meanwhile, a huge calculation amount generated in a dense matching process is reduced by using a calculation mode of an intermediate bridge, as shown in formulas (3) and (4):
Figure 862144DEST_PATH_IMAGE020
(3),
Figure 155722DEST_PATH_IMAGE081
(4),
wherein, the first and the second end of the pipe are connected with each other,
Figure 602884DEST_PATH_IMAGE022
is the temperature factor. High-level features at low resolution may use pairsCorrespondence matrix
Figure 323715DEST_PATH_IMAGE082
And (4) reconstructing the characteristics of the support video frame, and inputting the reconstructed support video frame into a decoder to predict a segmentation result. For the low-level features with high resolution, a guiding method which can suppress noise by using a dynamic prototype and has less calculation amount is used for feature reconstruction. The concrete method is according to
Figure 223538DEST_PATH_IMAGE082
Selecting position indexes of similarity from the support set characteristics to obtain corresponding characteristic vectors
Figure 55228DEST_PATH_IMAGE083
Obtaining dense matching results for low-level features in an indirect guided manner
Figure 356896DEST_PATH_IMAGE084
As shown in equation (13):
Figure 248629DEST_PATH_IMAGE085
(13),
wherein the content of the first and second substances,
Figure 635748DEST_PATH_IMAGE086
is to challenge the low-level features of the video frame
Figure 271128DEST_PATH_IMAGE087
To (1)jThe number of feature vectors is determined by the number of feature vectors,
Figure 692882DEST_PATH_IMAGE088
is selected to correspond to
Figure 489937DEST_PATH_IMAGE089
The product of the two forms the low-level feature vector of the support video frame
Figure 364352DEST_PATH_IMAGE090
Representing low-level feature dense matching results
Figure 537845DEST_PATH_IMAGE091
To (1)jA value.
The small sample video target segmentation model obtained through the training process can realize that a small amount of marked images are input as supports to segment targets of the same category in video frames.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a small-sample video object segmentation method of base dynamic prototype learning, in accordance with an embodiment of the present invention.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present invention includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 702 and/or the RAM 703. It is noted that the programs may also be stored in one or more memories other than the ROM 702 and RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
Electronic device 700 may also include input/output (I/O) interface 705, which input/output (I/O) interface 705 also connects to bus 704, according to an embodiment of the invention. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that the computer program read out therefrom is mounted in the storage section 708 as necessary.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A small sample video object segmentation method based on dynamic prototype learning comprises the following steps:
acquiring a video target to be segmented;
processing the video target to be segmented by using a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result, wherein the small sample video target segmentation model based on dynamic prototype learning is obtained by training according to the following method:
processing the video frame images of the challenge set and the video frame images of the support set by utilizing a part of neural network layers of a feature extraction module of the small sample video object segmentation model to obtain low-level features of the challenge video frame and low-level features of the support video frame;
processing the video frame images of the challenge set by using all the neural network layers of the feature extraction module of the small sample video object segmentation model to obtain the features of the challenge video frame;
carrying out mask operation on the low-level features of the support video frame to obtain foreground features of the support video frame;
processing the foreground characteristics of the support video frame and the characteristics of the challenge video frame by utilizing a mining module of the small sample video target segmentation model to obtain a corresponding relation matrix;
processing the low-level features of the support video frame, the low-level features of the challenge video frame and the corresponding relation matrix by utilizing a guide module of the small sample video target segmentation model to obtain a low-level corresponding relation matrix;
processing the corresponding relation matrix and the low-level corresponding relation matrix by utilizing a segmentation module of the small sample video target segmentation model to obtain a video target segmentation result, and optimizing the small sample video target segmentation model by utilizing a loss function of the small sample video target segmentation model;
iteratively performing feature extraction operation, masking operation, mining operation, guiding operation, segmentation operation and optimization operation until the value of the loss function meets a preset condition to obtain a trained small sample video target segmentation model;
wherein, the processing the foreground characteristics of the support video frame and the characteristics of the challenge video frame by the mining module of the small sample video object segmentation model to obtain a corresponding relation matrix comprises:
processing the foreground characteristics of the support video frame by utilizing a prototype generator of the mining module to obtain dynamic prototype characteristics;
calculating the dynamic prototype characteristics and the foreground characteristics of the support video frame to obtain a support corresponding relation matrix;
calculating the dynamic prototype characteristics and the characteristics of the challenge video frame to obtain a challenge corresponding relation matrix;
and operating the support corresponding relation matrix and the inquiry corresponding relation matrix to obtain a corresponding relation matrix.
2. The method of claim 1, wherein said processing the support video frame foreground features with a prototype generator of the mining module to obtain dynamic prototype features comprises:
performing global average pooling on the foreground features of the support video frames to obtain video target prototype features;
calculating the foreground characteristic of the support video frame and the prototype characteristic of the video target by using the prototype generator to obtain an attention matrix;
processing the attention matrix by using an optimal transmission algorithm to obtain an optimal distribution matrix;
and calculating the foreground characteristics of the support video frame and the optimal distribution matrix, and calculating the calculation result and the video target prototype characteristics to obtain dynamic prototype characteristics.
3. The method of claim 2, wherein the attention matrix is determined by equation (1):
Figure DEST_PATH_IMAGE002
(1),
wherein, the first and the second end of the pipe are connected with each other,
Figure DEST_PATH_IMAGE004
is the first
Figure DEST_PATH_IMAGE006
The foreground feature vectors of each of the support video frames,
Figure 731579DEST_PATH_IMAGE006
an index representing the foreground feature vector of the support video frame, for a length of
Figure DEST_PATH_IMAGE008
The foreground feature vector of the support video frame,
Figure 711036DEST_PATH_IMAGE006
is in the value range of
Figure DEST_PATH_IMAGE010
Figure DEST_PATH_IMAGE012
Is the first
Figure DEST_PATH_IMAGE014
The characteristics of each prototype are characterized in that,
Figure 615407DEST_PATH_IMAGE014
an index representing the prototype feature for
Figure DEST_PATH_IMAGE016
An instituteThe prototype feature is described as being a feature of the prototype,
Figure 138180DEST_PATH_IMAGE014
is in the value range of
Figure DEST_PATH_IMAGE018
Figure DEST_PATH_IMAGE020
Is a matrix of support focus forces,
Figure DEST_PATH_IMAGE022
is the support attention force matrix
Figure DEST_PATH_IMAGE024
To (1)
Figure 139503DEST_PATH_IMAGE014
Go to the first
Figure 961965DEST_PATH_IMAGE006
Column value for indicating the second
Figure 291316DEST_PATH_IMAGE014
The prototype features and
Figure 490216DEST_PATH_IMAGE006
similarity of foreground feature vectors of the support video frames;
wherein the dynamic prototype feature is determined by equation (2):
Figure DEST_PATH_IMAGE026
(2),
wherein, the first and the second end of the pipe are connected with each other,
Figure DEST_PATH_IMAGE028
is the support video frame foreground featureThe sequence of the feature vectors is then,
Figure DEST_PATH_IMAGE030
is the first
Figure 348975DEST_PATH_IMAGE014
Character of individual prototype
Figure 342339DEST_PATH_IMAGE012
The obtained dynamic prototype characteristics are updated according to the dynamic prototype characteristics,
Figure DEST_PATH_IMAGE032
representing the optimized support set attention force matrix,
Figure DEST_PATH_IMAGE034
is the first of the optimized support attention force matrix
Figure 221302DEST_PATH_IMAGE014
A row vector.
4. The method of claim 1, wherein the processing the support video frame low-level features, the challenge video frame low-level features, and the correspondence matrix with a bootstrap module of the small sample video object segmentation model to obtain a low-level correspondence matrix comprises:
selecting a preset row number and a preset column number of the corresponding relation matrix to obtain an intermediate corresponding relation matrix;
calculating the low-level characteristics of the support video frame and the intermediate corresponding relation matrix to obtain a reconstructed characteristic matrix;
and operating the reconstructed feature matrix and the low-level features of the inquiry video frame to obtain the low-level corresponding relation matrix.
5. The method of claim 1, wherein the guidance module is determined by formula (3) and formula (4):
Figure DEST_PATH_IMAGE036
(3),
Figure DEST_PATH_IMAGE038
(4),
wherein, the first and the second end of the pipe are connected with each other,
Figure DEST_PATH_IMAGE040
is a temperature factor, for controlling the degree of smoothing of the output probability distribution,
Figure DEST_PATH_IMAGE042
the length of the modulus of the vector is represented,
Figure DEST_PATH_IMAGE044
is the first
Figure DEST_PATH_IMAGE046
An individual challenge video frame feature vector is generated,
Figure 207581DEST_PATH_IMAGE046
an index representing the feature vector of the challenge video frame, for height and width respectively
Figure DEST_PATH_IMAGE048
And
Figure DEST_PATH_IMAGE050
of the challenge video frame image of (a),
Figure 527704DEST_PATH_IMAGE046
is in the value range of
Figure DEST_PATH_IMAGE052
Figure DEST_PATH_IMAGE054
Is an allocation matrix of dynamic prototype features and challenge video frame features,
Figure DEST_PATH_IMAGE056
a first of the allocation matrices representing the dynamic prototype features and the features of the challenge video frame
Figure 413008DEST_PATH_IMAGE014
Go to the first
Figure 716950DEST_PATH_IMAGE046
The value of the column is such that,
Figure DEST_PATH_IMAGE058
an assignment matrix representing the optimized dynamic prototype features and the support video frame foreground features,
Figure DEST_PATH_IMAGE060
and the softmax represents a normalized exponential function.
6. The method of claim 1, wherein the loss function of the small sample video object segmentation model comprises a cross-over ratio loss function and a cross-entropy loss function;
wherein the cross entropy loss function is determined by equation (5):
Figure DEST_PATH_IMAGE062
(5),
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE064
and
Figure DEST_PATH_IMAGE066
respectively representing the height and width of the incoming challenge video frame image or the support video frame image,
Figure DEST_PATH_IMAGE068
representing the product of said height and said width,
Figure DEST_PATH_IMAGE070
is the result of the real segmentation,
Figure DEST_PATH_IMAGE072
representing the first in the real segmentation result
Figure DEST_PATH_IMAGE074
Go to the first
Figure DEST_PATH_IMAGE076
The value of the column is such that,
Figure DEST_PATH_IMAGE078
is the result of the segmentation predicted by the model,
Figure DEST_PATH_IMAGE080
representing the result of the segmentation predicted by the model
Figure 91607DEST_PATH_IMAGE074
Go to the first
Figure 125291DEST_PATH_IMAGE076
The value of the column;
wherein the cross-over ratio loss function is determined by equation (6):
Figure DEST_PATH_IMAGE082
(6),
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE084
representing a norm of the matrix.
7. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
8. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 6.
CN202210536170.6A 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning Active CN114638839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536170.6A CN114638839B (en) 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210536170.6A CN114638839B (en) 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning

Publications (2)

Publication Number Publication Date
CN114638839A CN114638839A (en) 2022-06-17
CN114638839B true CN114638839B (en) 2022-09-30

Family

ID=81953301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536170.6A Active CN114638839B (en) 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning

Country Status (1)

Country Link
CN (1) CN114638839B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942463A (en) * 2019-10-30 2020-03-31 杭州电子科技大学 Video target segmentation method based on generation countermeasure network
CN111210446A (en) * 2020-01-08 2020-05-29 中国科学技术大学 Video target segmentation method, device and equipment
CN113177549A (en) * 2021-05-11 2021-07-27 中国科学技术大学 Few-sample target detection method and system based on dynamic prototype feature fusion
CN113240039A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Small sample target detection method and system based on spatial position characteristic reweighting
CN113706487A (en) * 2021-08-17 2021-11-26 西安电子科技大学 Multi-organ segmentation method based on self-supervision characteristic small sample learning
CN113763385A (en) * 2021-05-28 2021-12-07 华南理工大学 Video object segmentation method, device, equipment and medium
CN113920127A (en) * 2021-10-27 2022-01-11 华南理工大学 Single sample image segmentation method and system with independent training data set
EP3961502A1 (en) * 2020-08-31 2022-03-02 Sap Se Weakly supervised one-shot image segmentation
CN114266977A (en) * 2021-12-27 2022-04-01 青岛澎湃海洋探索技术有限公司 Multi-AUV underwater target identification method based on super-resolution selectable network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11556666B2 (en) * 2018-10-16 2023-01-17 Immuta, Inc. Data access policy management
CN111583284B (en) * 2020-04-22 2021-06-22 中国科学院大学 Small sample image semantic segmentation method based on hybrid model
CN114240965A (en) * 2021-12-13 2022-03-25 江南大学 Small sample learning tumor segmentation method driven by graph attention model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942463A (en) * 2019-10-30 2020-03-31 杭州电子科技大学 Video target segmentation method based on generation countermeasure network
CN111210446A (en) * 2020-01-08 2020-05-29 中国科学技术大学 Video target segmentation method, device and equipment
EP3961502A1 (en) * 2020-08-31 2022-03-02 Sap Se Weakly supervised one-shot image segmentation
CN113177549A (en) * 2021-05-11 2021-07-27 中国科学技术大学 Few-sample target detection method and system based on dynamic prototype feature fusion
CN113763385A (en) * 2021-05-28 2021-12-07 华南理工大学 Video object segmentation method, device, equipment and medium
CN113240039A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Small sample target detection method and system based on spatial position characteristic reweighting
CN113706487A (en) * 2021-08-17 2021-11-26 西安电子科技大学 Multi-organ segmentation method based on self-supervision characteristic small sample learning
CN113920127A (en) * 2021-10-27 2022-01-11 华南理工大学 Single sample image segmentation method and system with independent training data set
CN114266977A (en) * 2021-12-27 2022-04-01 青岛澎湃海洋探索技术有限公司 Multi-AUV underwater target identification method based on super-resolution selectable network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation;Jie Liu 等;《ICCV 2021 open access》;20220303;全文 *
Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition;Jiamin Wu 等;《ICCV 2021 open access》;20220303;全文 *
Uncertainty-Aware Semi-Supervised Few Shot Segmentation;Soopil Kim 等;《https://arxiv.org/abs/2110.08954》;20211018;全文 *
金字塔原型对齐的轻量级小样本语义分割网络;贾熹滨 等;《北京工业大学学报》;20210528;第47卷(第5期);全文 *

Also Published As

Publication number Publication date
CN114638839A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
Li et al. Box-supervised instance segmentation with level set evolution
US11030750B2 (en) Multi-level convolutional LSTM model for the segmentation of MR images
CN112668579A (en) Weak supervision semantic segmentation method based on self-adaptive affinity and class distribution
CN112016512A (en) Remote sensing image small target detection method based on feedback type multi-scale training
CN114677515B (en) Weak supervision semantic segmentation method based on similarity between classes
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN113780326A (en) Image processing method and device, storage medium and electronic equipment
CN116982089A (en) Method and system for image semantic enhancement
CN112668608A (en) Image identification method and device, electronic equipment and storage medium
CN114170558A (en) Method, system, device, medium and article for video processing
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN111476226B (en) Text positioning method and device and model training method
CN112907750A (en) Indoor scene layout estimation method and system based on convolutional neural network
CN114638839B (en) Small sample video target segmentation method based on dynamic prototype learning
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN115082778B (en) Multi-branch learning-based homestead identification method and system
CN114741697B (en) Malicious code classification method and device, electronic equipment and medium
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
CN112287144B (en) Picture retrieval method, equipment and storage medium
CN115049546A (en) Sample data processing method and device, electronic equipment and storage medium
CN115222750A (en) Remote sensing image segmentation method and system based on multi-scale fusion attention
CN114937187A (en) Image optimization method, device, equipment and storage medium
CN111062477B (en) Data processing method, device and storage medium
CN113610016A (en) Training method, system, equipment and storage medium of video frame feature extraction model
CN112861940A (en) Binocular disparity estimation method, model training method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant