CN114638839A - Small sample video target segmentation method based on dynamic prototype learning - Google Patents

Small sample video target segmentation method based on dynamic prototype learning

Info

Publication number
CN114638839A
Authority
CN
China
Prior art keywords
video frame
matrix
prototype
features
support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210536170.6A
Other languages
Chinese (zh)
Other versions
CN114638839B (en)
Inventor
张天柱
张哲
张勇东
罗乃淞
吴枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210536170.6A priority Critical patent/CN114638839B/en
Publication of CN114638839A publication Critical patent/CN114638839A/en
Application granted granted Critical
Publication of CN114638839B publication Critical patent/CN114638839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small sample video target segmentation method based on dynamic prototype learning, which comprises the following steps: acquiring a video target to be segmented; and processing the video target to be segmented with a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result. In this method, dynamic prototypes are learned adaptively with an optimal transport method, which effectively suppresses attention noise, while multi-level feature maps are matched in a guided manner, which greatly reduces the amount of computation. The method fully extracts target information from a small number of support-set samples and significantly improves segmentation performance on query-set videos. The invention also discloses an electronic device, a storage medium and a computer program product for executing the small sample video target segmentation method based on dynamic prototype learning.

Description

Small sample video target segmentation method based on dynamic prototype learning
Technical Field
The invention relates to the field of computer vision, in particular to a training method of a small sample video target segmentation model and a video target segmentation method.
Background
Video target segmentation is a technique for predicting the foreground target mask in each frame of a video, and is widely applied in augmented reality, autonomous driving, video editing and the like.
Existing video target segmentation methods are typically semi-supervised or unsupervised. Semi-supervised methods require the target information of the first frame of each video to be given and then densely associate the target across subsequent frames; this process relies heavily on a large amount of densely segmented and annotated data, which is time-consuming and labor-intensive. Unsupervised methods, lacking annotated data, achieve low performance and cannot meet the requirements of practical applications. In addition, neither class of methods generalizes well to new target categories: their segmentation ability on categories unseen during training drops sharply, which limits the extensibility and practicality of video target segmentation.
Disclosure of Invention
In view of the above, it is a primary object of the present invention to provide a small sample video object segmentation method based on dynamic prototype learning, an electronic device, a storage medium and a computer program product, which are intended to at least partially solve at least one of the above-mentioned technical problems.
According to a first aspect of the present invention, there is provided a small sample video object segmentation method based on dynamic prototype learning, including:
acquiring a video target to be segmented;
processing the video target to be segmented by using a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result, wherein the small sample video target segmentation model based on dynamic prototype learning is trained as follows:
processing the video frame images of the query set and the video frame images of the support set with part of the neural network layers of the feature extraction module of the small sample video target segmentation model to obtain low-level features of the query video frame and low-level features of the support video frame;
processing the video frame images of the query set with all neural network layers of the feature extraction module of the small sample video target segmentation model to obtain the features of the query video frame;
performing a mask operation on the low-level features of the support video frame to obtain foreground features of the support video frame;
processing the foreground features of the support video frame and the features of the query video frame with the mining module of the small sample video target segmentation model to obtain a correspondence matrix;
processing the low-level features of the support video frame, the low-level features of the query video frame and the correspondence matrix with the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix;
processing the correspondence matrix and the low-level correspondence matrix with the segmentation module of the small sample video target segmentation model to obtain a video target segmentation result, and optimizing the small sample video target segmentation model with the loss function of the small sample video target segmentation model;
and iterating the feature extraction, masking, mining, guidance, segmentation and optimization operations until the value of the loss function meets a preset condition, so as to obtain the trained small sample video target segmentation model.
According to an embodiment of the present invention, processing the foreground features of the support video frame and the features of the query video frame with the mining module of the small sample video target segmentation model to obtain a correspondence matrix includes:
processing the foreground features of the support video frame with the prototype generator of the mining module to obtain dynamic prototype features;
operating on the dynamic prototype features and the foreground features of the support video frame to obtain a support correspondence matrix;
operating on the dynamic prototype features and the features of the query video frame to obtain a query correspondence matrix;
and operating on the support correspondence matrix and the query correspondence matrix to obtain the correspondence matrix.
According to an embodiment of the present invention, processing the foreground features of the support video frame with the prototype generator of the mining module to obtain dynamic prototype features includes:
performing global average pooling on the foreground features of the support video frame to obtain a video target prototype feature;
operating on the foreground features of the support video frame and the video target prototype feature with the prototype generator to obtain an attention matrix;
processing the attention matrix with an optimal transport algorithm to obtain an optimal assignment matrix;
and operating on the foreground features of the support video frame and the optimal assignment matrix, and operating on the result and the video target prototype feature, to obtain the dynamic prototype features.
According to an embodiment of the present invention, the attention matrix is determined by equation (1):

$$A_{k,i} = \frac{p_k^{\top} f_i^{s}}{\|p_k\|\,\|f_i^{s}\|} \qquad (1),$$

wherein $f_i^{s}$ is the $i$-th foreground feature vector of the support video frame, $i$ is the index of the support foreground feature vectors and, for a sequence of $N$ support foreground feature vectors, takes values in $[1, N]$; $p_k$ is the $k$-th prototype feature, $k$ is the index of the prototype features and, for $K$ prototype features, takes values in $[1, K]$; $A$ is the support attention matrix, and $A_{k,i}$ is the value in row $k$ and column $i$ of $A$, indicating the similarity between the $k$-th prototype feature and the $i$-th foreground feature vector of the support video frame;

wherein the dynamic prototype features are determined by equation (2):

$$\hat{p}_k = p_k + \hat{A}_k F^{s} \qquad (2),$$

wherein $F^{s}$ is the sequence of foreground feature vectors of the support video frame, $\hat{p}_k$ is the dynamic prototype feature obtained by updating the $k$-th prototype feature $p_k$, $\hat{A}$ denotes the optimized support attention matrix, and $\hat{A}_k$ is the $k$-th row vector of the optimized support attention matrix.
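As an illustration of equations (1) and (2), the following PyTorch-style sketch computes the support attention matrix and the updated dynamic prototypes. It is a minimal sketch under assumed tensor shapes; the cosine-similarity form of the attention and the residual update are reconstructions from the symbol definitions above, not the patent's reference implementation.

import torch
import torch.nn.functional as F

def support_attention(prototypes, fg_feats):
    # prototypes: (K, C) prototype features p_k; fg_feats: (N, C) support
    # foreground feature vectors f_i. Returns the (K, N) support attention
    # matrix A of eq. (1), A[k, i] = cosine similarity of p_k and f_i.
    p = F.normalize(prototypes, dim=-1)
    f = F.normalize(fg_feats, dim=-1)
    return p @ f.t()

def update_prototypes(prototypes, fg_feats, attn_opt):
    # attn_opt: (K, N) support attention matrix after optimal-transport
    # refinement (A-hat). Each row re-aggregates the support foreground
    # features to update its prototype, as in eq. (2) (assumed form).
    return prototypes + attn_opt @ fg_feats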
According to an embodiment of the present invention, processing the low-level features of the support video frame, the low-level features of the query video frame and the correspondence matrix with the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix includes:
selecting a preset number of rows and columns of the correspondence matrix to obtain an intermediate correspondence matrix;
operating on the low-level features of the support video frame and the intermediate correspondence matrix to obtain a reconstructed feature matrix;
and operating on the reconstructed feature matrix and the low-level features of the query video frame to obtain the low-level correspondence matrix.
According to an embodiment of the present invention, the guidance module is determined by equation (3) and equation (4):

$$\bar{A}^{q}_{k,j} = \operatorname{softmax}_{k}\!\left(\frac{\hat{p}_k^{\top} f_j^{q}}{\tau\,\|\hat{p}_k\|\,\|f_j^{q}\|}\right) \qquad (3),$$

$$C = (\bar{A}^{q})^{\top} \bar{A}^{s} \qquad (4),$$

wherein $\tau$ is a temperature factor used to control the smoothness of the output probability distribution, $\|\cdot\|$ denotes the modulus length of a vector, $f_j^{q}$ is the $j$-th query video frame feature vector, $j$ is the index of the query video frame feature vectors and, for a query video frame image of height $h$ and width $w$, takes values in $[1, hw]$; $\bar{A}^{q}$ is the assignment matrix of the dynamic prototype features and the query video frame features, $\bar{A}^{q}_{k,j}$ denotes the value in row $k$ and column $j$ of this assignment matrix, $\bar{A}^{s}$ denotes the assignment matrix of the optimized dynamic prototype features and the support video frame foreground features, $C$ denotes the correspondence matrix between the query video frame features and the support video frame foreground features, and softmax denotes the normalized exponential function.
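The role of equations (3) and (4) as an intermediate bridge can be illustrated with a short sketch: the $hw \times N$ correspondence matrix is composed from two prototype-sized assignment matrices instead of a direct dense feature comparison. Tensor shapes and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def prototype_assignment(dyn_prototypes, feats, tau=0.1):
    # dyn_prototypes: (K, C); feats: (M, C) flattened frame features.
    # Cosine similarity scaled by the temperature tau, with softmax over
    # the prototype axis, as in eq. (3); tau=0.1 is an assumed value.
    p = F.normalize(dyn_prototypes, dim=-1)
    f = F.normalize(feats, dim=-1)
    return torch.softmax((p @ f.t()) / tau, dim=0)   # (K, M)

def bridged_correspondence(assign_query, assign_support):
    # Composing the (K, hw) query assignment with the (K, N) support
    # assignment through the prototypes yields the (hw, N) query-support
    # correspondence matrix C of eq. (4).
    return assign_query.t() @ assign_support

The same prototype_assignment function produces the support-side matrix from the support foreground features, so the bridged product avoids an hw-by-N dense comparison between pixel features.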
According to an embodiment of the present invention, the loss function of the small sample video target segmentation model includes an intersection-over-union (IoU) loss function and a cross-entropy loss function;
wherein the cross-entropy loss function is determined by equation (5):

$$L_{ce} = -\frac{1}{hw} \sum_{x=1}^{h} \sum_{y=1}^{w} \big[ M_{x,y} \log \hat{M}_{x,y} + (1 - M_{x,y}) \log (1 - \hat{M}_{x,y}) \big] \qquad (5),$$

wherein $h$ and $w$ denote the height and width, respectively, of the input query video frame image or support video frame image, $hw$ denotes the product of the height and the width, $M$ is the ground-truth segmentation result, $M_{x,y}$ denotes the value in row $x$ and column $y$ of the ground-truth segmentation result, $\hat{M}$ is the segmentation result predicted by the model, and $\hat{M}_{x,y}$ denotes the value in row $x$ and column $y$ of the segmentation result predicted by the model;

wherein the IoU loss function is determined by equation (6):

$$L_{iou} = 1 - \frac{\|M \odot \hat{M}\|_{1}}{\|M + \hat{M} - M \odot \hat{M}\|_{1}} \qquad (6),$$

wherein $\|\cdot\|_{1}$ denotes a norm of the matrix and $\odot$ denotes element-wise multiplication.
According to a second aspect of the present invention, there is provided an electronic apparatus comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above dynamic prototype learning-based small sample video object segmentation method.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-mentioned small-sample video object segmentation method based on dynamic prototype learning.
The small sample video target segmentation method based on dynamic prototype learning provided by the invention adaptively learns dynamic prototypes with an optimal transport method, which effectively suppresses attention noise; at the same time, multi-level feature maps are matched in a guided manner, which greatly reduces the amount of computation. The method fully extracts target information from a small number of support-set samples and significantly improves segmentation performance on query-set videos.
Drawings
FIG. 1 is a flow chart of a small sample video object segmentation method based on dynamic prototype learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of a small sample video object segmentation model based on dynamic prototype learning according to an embodiment of the present invention;
FIG. 3 is a flow chart of obtaining a correspondence matrix according to an embodiment of the present invention;
FIG. 4 is a flow diagram for obtaining dynamic prototype features according to an embodiment of the present invention;
FIG. 5 is a flow diagram of obtaining a low-level correspondence matrix according to an embodiment of the invention;
FIG. 6 is a small sample video object segmentation model framework diagram based on dynamic prototype learning according to an embodiment of the present invention;
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement the small sample video target segmentation method based on dynamic prototype learning, in accordance with an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention provides a small sample video target segmentation method based on dynamic prototype learning, which aims to reduce the dependence on data, improve extensibility and practicality, and achieve better video target segmentation performance using only a small amount of labeled data.
Among current methods, dense matching over multi-level features achieves the leading performance. However, dense pixel-by-pixel feature matching introduces a large amount of correspondence noise, and performing it at multiple scales further increases the computational load. The method provided by the invention adaptively learns target prototypes and realizes robust multi-level dense matching through an intermediate bridge, effectively alleviating the problems of noise and heavy computation.
The video segmentation method provided by the invention can be applied to application systems related to video object segmentation; it segments targets in an input video according to the information provided by a small number of support-set images, and can be widely applied in scenes such as augmented reality, autonomous driving and video editing. In a specific embodiment, the method can be embedded into a mobile device as software to provide real-time segmentation results for recorded video, or installed on a background server to process large batches of videos.
Fig. 1 is a flowchart of a small sample video object segmentation method based on dynamic prototype learning according to an embodiment of the present invention.
As shown in FIG. 1, the method includes operations S110 to S120.
In operation S110, a video object to be segmented is acquired.
In operation S120, a video object to be segmented is processed using a small sample video object segmentation model based on dynamic prototype learning, and a video object segmentation result is obtained.
Fig. 2 is a flowchart of a training method of a small sample video object segmentation model based on dynamic prototype learning according to an embodiment of the present invention.
As shown in FIG. 2, the method includes operations S210 to S270.
In operation S210, part of the neural network layers of the feature extraction module of the small sample video target segmentation model are used to process the video frame images of the query set and the video frame images of the support set, so as to obtain low-level features of the query video frame and low-level features of the support video frame.
The low-level features pass through only part of the neural network layers of the feature extraction module, so they have high resolution and contain more detail information, but are semantically weaker and noisier. High-level features (referred to simply as features), in contrast, traverse more neural network layers than the low-level features and carry stronger semantic information, but have lower resolution and a poorer perception of detail.
In operation S220, all neural network layers of the feature extraction module of the small sample video target segmentation model are used to process the video frame images of the query set, so as to obtain the features of the query video frame.
For input support-set video frame images and query-set video frame images belonging to the same category, the feature extraction module performs multi-level feature extraction based on a ResNet-50 network, and a 1x1 convolutional layer then maps the features into a common metric space.
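A sketch of such a multi-level extractor is given below, built on torchvision's ResNet-50. Which residual stage yields the "low-level" features and the dimension of the common metric space are assumptions made for illustration, not choices fixed by the patent.

import torch.nn as nn
from torchvision.models import resnet50

class FeatureExtractor(nn.Module):
    # Multi-level extractor: an early ResNet stage supplies high-resolution
    # low-level features, the final stage supplies semantic high-level
    # features; 1x1 convolutions project both into a shared metric space.
    def __init__(self, dim=256):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4
        self.proj_low = nn.Conv2d(512, dim, kernel_size=1)    # after layer2
        self.proj_high = nn.Conv2d(2048, dim, kernel_size=1)  # after layer4

    def forward(self, x):
        x = self.layer2(self.layer1(self.stem(x)))
        low = self.proj_low(x)                                # low-level features
        high = self.proj_high(self.layer4(self.layer3(x)))    # high-level features
        return low, high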
In operation S230, a mask operation is performed on the low-level features of the support video frame to obtain foreground features of the support video frame.
In operation S240, the mining module of the small sample video target segmentation model is used to process the foreground features of the support video frame and the features of the query video frame, so as to obtain a correspondence matrix.
In operation S250, the low-level features of the support video frame, the low-level features of the query video frame, and the correspondence matrix are processed by the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix.
In operation S260, the correspondence matrix and the low-level correspondence matrix are processed by the segmentation module of the small sample video target segmentation model to obtain a video target segmentation result, and the small sample video target segmentation model is optimized with its loss function.
In operation S270, the feature extraction, masking, mining, guidance, segmentation, and optimization operations are performed iteratively until the value of the loss function satisfies a preset condition, so as to obtain a trained small sample video target segmentation model; a skeleton of one such training step is sketched below.
According to the training method provided by the invention, a reliable, generalizable and efficient small sample video target segmentation model can be obtained using a dynamic prototype mining module based on an optimal transport algorithm and a multi-level dynamic guidance module. The trained model adaptively learns dynamic prototypes with the optimal transport method, which effectively suppresses attention noise; multi-level feature maps are matched in a guided manner, which greatly reduces the amount of computation; and target information in a small number of support-set samples is fully extracted, significantly improving segmentation performance on query-set videos.
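The following Python skeleton strings operations S210-S270 together as one training step. It is a sketch only: every method name on `model` (extract, mask_pool, mine, guide, segment, loss) is a hypothetical placeholder for the corresponding module described above, not an API defined by the patent.

def train_step(model, support_imgs, support_mask, query_imgs, query_mask):
    s_low, _ = model.extract(support_imgs)          # S210: partial backbone
    q_low, q_high = model.extract(query_imgs)       # S210/S220: full backbone
    s_fg = model.mask_pool(s_low, support_mask)     # S230: foreground features
    corr = model.mine(s_fg, q_high)                 # S240: mining module
    corr_low = model.guide(s_low, q_low, corr)      # S250: guidance module
    pred = model.segment(corr, corr_low)            # S260: segmentation module
    return model.loss(pred, query_mask)             # minimized until the preset condition is met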
Fig. 3 is a flowchart of obtaining a correspondence matrix according to an embodiment of the present invention.
As shown in fig. 3, processing the foreground features of the support video frame and the features of the query video frame with the mining module of the small sample video target segmentation model to obtain a correspondence matrix includes operations S310 to S340.
In operation S310, the foreground features of the support video frame are processed by the prototype generator of the mining module to obtain dynamic prototype features.
In operation S320, the dynamic prototype features and the foreground features of the support video frame are operated on to obtain a support correspondence matrix.
In operation S330, the dynamic prototype features and the features of the query video frame are operated on to obtain a query correspondence matrix.
In operation S340, the support correspondence matrix and the query correspondence matrix are operated on to obtain the correspondence matrix.
In the process of acquiring the correspondence matrix, the mining module with dynamic prototypes based on the optimal transport algorithm can fully mine the feature points that carry the association between the foreground features of the support video frame and the features of the query video frame, providing more solid data support for subsequent model training.
FIG. 4 is a flow diagram for obtaining dynamic prototype features according to an embodiment of the present invention.
As shown in fig. 4, the processing of the foreground feature of the support video frame by using the prototype generator of the mining module to obtain the dynamic prototype feature includes operations S410 to S440.
In operation S410, global average pooling is performed on the foreground features of the support video frame to obtain a video target prototype feature.
In operation S420, the prototype generator is used to operate on the foreground features of the support video frame and the video target prototype feature, so as to obtain an attention matrix.
In operation S430, the attention matrix is processed with an optimal transport algorithm to obtain an optimal assignment matrix.
In operation S440, the foreground features of the support video frame and the optimal assignment matrix are operated on, and the result is operated on with the video target prototype feature, to obtain the dynamic prototype features.
This process of acquiring dynamic prototype features effectively suppresses attention noise from the original video frame images, thereby improving the segmentation performance of the trained model. A sketch of the prototype generator follows.
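A minimal sketch of the prototype generator described in operations S410-S420 is given below; the number of prototypes, the hidden dimension and the ReLU activation are assumptions, since the patent only specifies a fully connected layer plus an activation function per generator.

import torch
import torch.nn as nn

class PrototypeGenerator(nn.Module):
    # K independent heads, each a fully connected layer plus activation,
    # mapping the globally pooled target feature to one prototype.
    def __init__(self, dim=256, num_prototypes=5):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(num_prototypes)]
        )

    def forward(self, fg_feats):
        # fg_feats: (N, C) support foreground feature vectors.
        g = fg_feats.mean(dim=0)                              # global average pooling
        return torch.stack([head(g) for head in self.heads])  # (K, C) prototypes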
According to an embodiment of the present invention, the attention matrix is determined by equation (1):

$$A_{k,i} = \frac{p_k^{\top} f_i^{s}}{\|p_k\|\,\|f_i^{s}\|} \qquad (1),$$

wherein $f_i^{s}$ is the $i$-th foreground feature vector of the support video frame, $i$ is the index of the support foreground feature vectors and, for a sequence of $N$ support foreground feature vectors, takes values in $[1, N]$; $p_k$ is the $k$-th prototype feature, $k$ is the index of the prototype features and, for $K$ prototype features, takes values in $[1, K]$; $A$ is the support attention matrix, and $A_{k,i}$ is the value in row $k$ and column $i$ of the support attention matrix, indicating the similarity between the $k$-th prototype feature and the $i$-th foreground feature vector of the support video frame;

wherein the dynamic prototype features are determined by equation (2):

$$\hat{p}_k = p_k + \hat{A}_k F^{s} \qquad (2),$$

wherein $F^{s}$ is the sequence of foreground feature vectors of the support video frame, $\hat{p}_k$ is the dynamic prototype feature obtained by updating the $k$-th prototype feature $p_k$, $\hat{A}$ denotes the optimized support attention matrix, i.e. the support attention matrix optimized with the optimal transport algorithm, and $\hat{A}_k$ is the $k$-th row vector of the optimized support attention matrix, representing the similarity of the $k$-th prototype feature vector to the foreground feature vectors of the support video frame.
Fig. 5 is a flow chart of obtaining a low-level correspondence matrix according to an embodiment of the invention.
As shown in fig. 5, processing the low-level features of the support video frame, the low-level features of the query video frame and the correspondence matrix with the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix includes operations S510 to S530.
In operation S510, a preset number of rows and columns of the correspondence matrix is selected to obtain an intermediate correspondence matrix.
In operation S520, the low-level features of the support video frame and the intermediate correspondence matrix are operated on to obtain a reconstructed feature matrix.
In operation S530, the reconstructed feature matrix and the low-level features of the query video frame are operated on to obtain a low-level correspondence matrix.
According to an embodiment of the present invention, the guidance module is determined by equation (3) and equation (4):

$$\bar{A}^{q}_{k,j} = \operatorname{softmax}_{k}\!\left(\frac{\hat{p}_k^{\top} f_j^{q}}{\tau\,\|\hat{p}_k\|\,\|f_j^{q}\|}\right) \qquad (3),$$

$$C = (\bar{A}^{q})^{\top} \bar{A}^{s} \qquad (4),$$

wherein $\tau$ is a temperature factor used to control the smoothness of the output probability distribution, $\|\cdot\|$ denotes the modulus length of a vector, $f_j^{q}$ is the $j$-th query video frame feature vector, $j$ is the index of the query video frame feature vectors and, for a query video frame image of height $h$ and width $w$, takes values in $[1, hw]$; $\bar{A}^{q}$ is the assignment matrix of the dynamic prototype features and the query video frame features, $\bar{A}^{q}_{k,j}$ denotes the value in row $k$ and column $j$ of this assignment matrix, $\bar{A}^{s}$ denotes the assignment matrix of the optimized dynamic prototype features and the support video frame foreground features, $C$ denotes the correspondence matrix between the query video frame features and the support video frame foreground features, and softmax denotes the normalized exponential function.

The assignment matrix $\bar{A}^{s}$ of the dynamic prototype features and the support video frame features is determined analogously:

$$\bar{A}^{s}_{k,i} = \operatorname{softmax}_{k}\!\left(\frac{\hat{p}_k^{\top} f_i^{s}}{\tau\,\|\hat{p}_k\|\,\|f_i^{s}\|}\right),$$

wherein $\bar{A}^{s}_{k,i}$ denotes the value in row $k$ and column $i$ of the assignment matrix $\bar{A}^{s}$, $f_i^{s}$ is the $i$-th foreground feature vector of the support video frame, and $\hat{p}_k$ is the dynamic prototype feature obtained by updating the $k$-th prototype feature $p_k$.
According to an embodiment of the present invention, the loss function of the small sample video target segmentation model includes an intersection-over-union (IoU) loss function and a cross-entropy loss function;

wherein the cross-entropy loss function is determined by equation (5):

$$L_{ce} = -\frac{1}{hw} \sum_{x=1}^{h} \sum_{y=1}^{w} \big[ M_{x,y} \log \hat{M}_{x,y} + (1 - M_{x,y}) \log (1 - \hat{M}_{x,y}) \big] \qquad (5),$$

wherein $h$ and $w$ denote the height and width, respectively, of the input query video frame image or support video frame image, $hw$ denotes the product of the height and the width, $M$ is the ground-truth segmentation result, $M_{x,y}$ denotes the value in row $x$ and column $y$ of the ground-truth segmentation result, $\hat{M}$ is the segmentation result predicted by the model, and $\hat{M}_{x,y}$ denotes the value in row $x$ and column $y$ of the segmentation result predicted by the model;

wherein the IoU loss function is determined by equation (6):

$$L_{iou} = 1 - \frac{\|M \odot \hat{M}\|_{1}}{\|M + \hat{M} - M \odot \hat{M}\|_{1}} \qquad (6),$$

wherein $\|\cdot\|_{1}$ denotes a norm of the matrix and $\odot$ denotes element-wise multiplication.

Since the segmentation task resembles a pixel-by-pixel classification task, a dense cross-entropy loss is used as the constraint; at the same time, to improve the degree of overlap between the final segmentation result $\hat{M}$ and the label mask $M$, an IoU loss is additionally added. The final loss function of the invention combines the IoU loss function and the cross-entropy loss function with certain weight coefficients, as shown in equation (7):

$$L = \lambda_{1} L_{ce} + \lambda_{2} L_{iou} \qquad (7),$$

wherein $\lambda_{1}$ and $\lambda_{2}$ denote the weight coefficients.
Using this loss function as the constraint of the training method improves the training of the small sample video target model and yields a robust small sample video target segmentation model based on dynamic prototype learning that effectively suppresses noise. A minimal sketch of the combined loss follows.
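The sketch below implements equations (5)-(7) for a single predicted mask; the weight coefficients and the clamping epsilon are assumed values.

import torch

def segmentation_loss(pred, gt, w_ce=1.0, w_iou=1.0, eps=1e-6):
    # pred, gt: (H, W) predicted foreground probabilities and binary mask.
    # Dense cross-entropy, eq. (5), plus an IoU term, eq. (6), combined
    # with weight coefficients as in eq. (7).
    p = pred.clamp(eps, 1 - eps)
    ce = -(gt * torch.log(p) + (1 - gt) * torch.log(1 - p)).mean()
    inter = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    iou = 1 - inter / union.clamp_min(eps)
    return w_ce * ce + w_iou * iou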
Fig. 6 is a small sample video object segmentation model framework diagram based on dynamic prototype learning according to an embodiment of the present invention.
The training process of the model provided by the embodiment of the present invention is further described in detail with reference to fig. 6.
As shown in FIG. 6, the model training framework provided by the invention comprises a dynamic prototype mining module based on an optimal transport algorithm and a multi-level dynamic guidance module. In the dynamic prototype mining module, for input support-set and query-set images belonging to the same category, multi-level features are extracted by a ResNet-50 network and then mapped into a common metric space by a 1x1 convolutional layer. The support feature is flattened, and the mask is used to extract the sequence of $N$ support foreground feature vectors $F^{s} = \{f_i^{s}\}_{i=1}^{N}$, which is sent to the prototype generator to obtain $K$ target prototypes, as shown in equations (8) and (9):

$$g = \mathrm{GAP}(F^{s}) \qquad (8),$$

$$p_k = \phi_k(g) \qquad (9),$$

wherein GAP (Global Average Pooling) averages the input sequence of support foreground feature vectors, $g$ represents the target global feature vector, and the prototype generator $\phi_k$, composed of a fully connected layer and an activation function, generates prototype features from the target features of the current support set; $\{\phi_k\}_{k=1}^{K}$ denotes the $K$ prototype generators, and $p_k$ is the output of $\phi_k$. Foreground pixel features can be assigned to these prototypes according to the attention matrix $A$, as shown in equation (1):

$$A_{k,i} = \frac{p_k^{\top} f_i^{s}}{\|p_k\|\,\|f_i^{s}\|} \qquad (1).$$

In order to assign groups of semantically consistent pixel features to the same prototype, an optimal assignment matrix is obtained based on optimal transport theory and used to adjust the mapping between pixel features and prototypes. This process solves the optimization problem shown in equations (10) and (11):

$$\hat{T} = \arg\max_{T \in \Pi} \ \mathrm{Tr}(T^{\top} A) + \epsilon H(T) \qquad (10),$$

$$\Pi = \left\{ T \in \mathbb{R}_{+}^{K \times N} \ \middle|\ T \mathbf{1}_{N} = \tfrac{1}{K}\mathbf{1}_{K},\ T^{\top} \mathbf{1}_{K} = \tfrac{1}{N}\mathbf{1}_{N} \right\} \qquad (11),$$

wherein $\mathbf{1}$ is an all-ones vector, $T$ represents the transport matrix to be solved, $\hat{T}$ represents the optimal solution of the transport matrix and weights the attention matrix, Tr denotes the trace of a matrix, $\epsilon$ is a constant coefficient, $H$ represents the information entropy function, $\Pi$ is the space of feasible solutions of the transport matrix $T$, and $\mathbb{R}_{+}^{K \times N}$ denotes its dimension. A robust dynamic prototype can finally be obtained by updating according to $\hat{T}$, as shown in equations (12) and (2):

$$\hat{A} = \hat{T} \odot A \qquad (12),$$

$$\hat{p}_k = p_k + \hat{A}_k F^{s} \qquad (2),$$

wherein $\odot$ represents element-wise multiplication between matrices.

The above process can optimize the prototype vectors through multiple iterations while purifying the support-set assignment matrix $\bar{A}^{s}$. A plausible Sinkhorn-style solver for equations (10)-(12) is sketched below.
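Equations (10)-(11) describe an entropy-regularized optimal transport problem with uniform marginals; one standard way to solve it is Sinkhorn iteration. The sketch below is offered as one plausible solver, with assumed values for the regularization coefficient and the iteration count.

import torch

def sinkhorn(attn, eps=0.05, iters=3):
    # attn: (K, N) attention matrix A. Iterative marginal scaling of
    # exp(A / eps) toward row marginals 1/K and column marginals 1/N,
    # approximating the transport plan T-hat of eqs. (10)-(11).
    K, N = attn.shape
    T = torch.exp(attn / eps)
    for _ in range(iters):
        T = T / (T.sum(dim=1, keepdim=True) * K)   # rows sum to 1/K
        T = T / (T.sum(dim=0, keepdim=True) * N)   # columns sum to 1/N
    return T

def refine_attention(attn):
    # eq. (12): element-wise reweighting of the attention matrix by the
    # optimal transport plan.
    return sinkhorn(attn) * attn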
In the multi-level dynamic guidance module, for a query-set video frame to be segmented, a pseudo label can be assigned to each pixel feature using the adaptively generated dynamic prototypes, while the huge amount of computation incurred by dense matching is reduced by computing through an intermediate bridge, as shown in equations (3) and (4):

$$\bar{A}^{q}_{k,j} = \operatorname{softmax}_{k}\!\left(\frac{\hat{p}_k^{\top} f_j^{q}}{\tau\,\|\hat{p}_k\|\,\|f_j^{q}\|}\right) \qquad (3),$$

$$C = (\bar{A}^{q})^{\top} \bar{A}^{s} \qquad (4),$$

wherein $\tau$ is the temperature factor. For the high-level features at low resolution, the correspondence matrix $C$ can be used to reconstruct the support video frame features, which are input into a decoder to predict the segmentation result. For the high-resolution low-level features, feature reconstruction uses a guided method that suppresses noise with the dynamic prototypes and requires less computation. Specifically, position indices of the highest similarity are selected from the support-set features according to $C$ to obtain the corresponding feature vectors $\hat{f}^{sl}$, and the dense matching result for the low-level features, $C^{l}$, is obtained in an indirectly guided manner, as shown in equation (13):

$$C^{l}_{j} = (f_j^{ql})^{\top} \hat{f}_j^{sl} \qquad (13),$$

wherein $f_j^{ql}$ is the $j$-th feature vector of the low-level query video frame feature $F^{ql}$, $\hat{f}_j^{sl}$ is the low-level support video frame feature vector selected at the corresponding index, and their product forms $C^{l}_{j}$, the $j$-th value of the low-level dense matching result $C^{l}$. A sketch of this guided low-level matching is given below.
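The guided reconstruction of equation (13) can be sketched as follows; the gather-by-argmax selection and the flattened tensor layouts are assumptions consistent with the description above.

import torch

def guided_low_level_matching(corr, support_low, query_low):
    # corr: (hw, N) high-level query-support correspondence matrix C;
    # support_low: (N, C) and query_low: (hw, C) flattened low-level
    # features. For each query position, gather the support feature at
    # the most similar index and take the inner product, as in eq. (13).
    idx = corr.argmax(dim=1)                    # best support index per pixel
    gathered = support_low[idx]                 # (hw, C) reconstructed features
    return (query_low * gathered).sum(dim=-1)   # (hw,) low-level matching result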
The small sample video target segmentation model obtained through the above training process takes a small number of annotated images as support and segments targets of the same category in video frames.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement the small sample video target segmentation method based on dynamic prototype learning, in accordance with an embodiment of the present invention.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present invention includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 702 and/or the RAM 703. It is noted that the programs may also be stored in one or more memories other than the ROM 702 and RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
Electronic device 700 may also include input/output (I/O) interface 705, which input/output (I/O) interface 705 also connects to bus 704, according to an embodiment of the invention. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the present invention, a computer-readable storage medium may include the above-described ROM 702 and/or RAM 703 and/or one or more memories other than the ROM 702 and RAM 703.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A small sample video target segmentation method based on dynamic prototype learning, comprising:
acquiring a video target to be segmented;
processing the video target to be segmented with a small sample video target segmentation model based on dynamic prototype learning to obtain a video target segmentation result, wherein the small sample video target segmentation model based on dynamic prototype learning is trained as follows:
processing the video frame images of the query set and the video frame images of the support set with part of the neural network layers of the feature extraction module of the small sample video target segmentation model to obtain low-level features of the query video frame and low-level features of the support video frame;
processing the video frame images of the query set with all neural network layers of the feature extraction module of the small sample video target segmentation model to obtain features of the query video frame;
performing a mask operation on the low-level features of the support video frame to obtain foreground features of the support video frame;
processing the foreground features of the support video frame and the features of the query video frame with the mining module of the small sample video target segmentation model to obtain a correspondence matrix;
processing the low-level features of the support video frame, the low-level features of the query video frame and the correspondence matrix with the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix;
processing the correspondence matrix and the low-level correspondence matrix with the segmentation module of the small sample video target segmentation model to obtain a video target segmentation result, and optimizing the small sample video target segmentation model with the loss function of the small sample video target segmentation model;
and iterating the feature extraction, masking, mining, guidance, segmentation and optimization operations until the value of the loss function meets a preset condition, so as to obtain the trained small sample video target segmentation model.
2. The method of claim 1, wherein processing the foreground features of the support video frame and the features of the query video frame with the mining module of the small sample video target segmentation model to obtain a correspondence matrix comprises:
processing the foreground features of the support video frame with the prototype generator of the mining module to obtain dynamic prototype features;
operating on the dynamic prototype features and the foreground features of the support video frame to obtain a support correspondence matrix;
operating on the dynamic prototype features and the features of the query video frame to obtain a query correspondence matrix;
and operating on the support correspondence matrix and the query correspondence matrix to obtain the correspondence matrix.
3. The method of claim 2, wherein processing the foreground features of the support video frame with the prototype generator of the mining module to obtain dynamic prototype features comprises:
performing global average pooling on the foreground features of the support video frame to obtain a video target prototype feature;
operating on the foreground features of the support video frame and the video target prototype feature with the prototype generator to obtain an attention matrix;
processing the attention matrix with an optimal transport algorithm to obtain an optimal assignment matrix;
and operating on the foreground features of the support video frame and the optimal assignment matrix, and operating on the result and the video target prototype feature, to obtain the dynamic prototype features.
4. The method of claim 3, wherein the attention matrix is determined by equation (1):

$$A_{k,i} = \frac{p_k^{\top} f_i^{s}}{\|p_k\|\,\|f_i^{s}\|} \qquad (1),$$

wherein $f_i^{s}$ is the $i$-th foreground feature vector of the support video frame, $i$ is the index of the support foreground feature vectors and, for a sequence of $N$ support foreground feature vectors, takes values in $[1, N]$; $p_k$ is the $k$-th prototype feature, $k$ is the index of the prototype features and, for $K$ prototype features, takes values in $[1, K]$; $A$ is the support attention matrix, and $A_{k,i}$ is the value in row $k$ and column $i$ of the support attention matrix $A$, indicating the similarity between the $k$-th prototype feature and the $i$-th foreground feature vector of the support video frame;

wherein the dynamic prototype features are determined by equation (2):

$$\hat{p}_k = p_k + \hat{A}_k F^{s} \qquad (2),$$

wherein $F^{s}$ is the sequence of foreground feature vectors of the support video frame, $\hat{p}_k$ is the dynamic prototype feature obtained by updating the $k$-th prototype feature $p_k$, $\hat{A}$ denotes the optimized support-set attention matrix, and $\hat{A}_k$ is the $k$-th row vector of the optimized support-set attention matrix.
5. The method of claim 1, wherein processing the low-level features of the support video frame, the low-level features of the query video frame and the correspondence matrix with the guidance module of the small sample video target segmentation model to obtain a low-level correspondence matrix comprises:
selecting a preset number of rows and columns of the correspondence matrix to obtain an intermediate correspondence matrix;
operating on the low-level features of the support video frame and the intermediate correspondence matrix to obtain a reconstructed feature matrix;
and operating on the reconstructed feature matrix and the low-level features of the query video frame to obtain the low-level correspondence matrix.
6. The method of claim 1, wherein the guidance module is determined by equation (3) and equation (4):

$$\bar{A}^{q}_{k,j} = \operatorname{softmax}_{k}\!\left(\frac{\hat{p}_k^{\top} f_j^{q}}{\tau\,\|\hat{p}_k\|\,\|f_j^{q}\|}\right) \qquad (3),$$

$$C = (\bar{A}^{q})^{\top} \bar{A}^{s} \qquad (4),$$

wherein $\tau$ is a temperature factor used to control the smoothness of the output probability distribution, $\|\cdot\|$ denotes the modulus length of a vector, $f_j^{q}$ is the $j$-th query video frame feature vector, $j$ is the index of the query video frame feature vectors and, for a query video frame image of height $h$ and width $w$, takes values in $[1, hw]$; $\bar{A}^{q}$ is the assignment matrix of the dynamic prototype features and the query video frame features, $\bar{A}^{q}_{k,j}$ denotes the value in row $k$ and column $j$ of this assignment matrix, $\bar{A}^{s}$ denotes the assignment matrix of the optimized dynamic prototype features and the support video frame foreground features, $C$ denotes the correspondence matrix between the query video frame features and the support video frame foreground features, and softmax denotes the normalized exponential function.
7. The method of claim 1, wherein the loss function of the small sample video object segmentation model comprises an intersection-over-union loss function and a cross entropy loss function;

wherein the cross entropy loss function is determined by equation (5):

$L_{ce} = -\dfrac{1}{HW} \sum_{m=1}^{H} \sum_{n=1}^{W} \left[ Y_{mn} \log \hat{Y}_{mn} + (1 - Y_{mn}) \log (1 - \hat{Y}_{mn}) \right]$ (5),

wherein $H$ and $W$ represent the height and the width of the input query video frame image or support video frame image, respectively; $HW$ represents the product of said height and said width; $Y$ is the ground-truth segmentation result, and $Y_{mn}$ represents the value in row $m$, column $n$ of the ground-truth segmentation result; $\hat{Y}$ is the segmentation result predicted by the model, and $\hat{Y}_{mn}$ represents the value in row $m$, column $n$ of the segmentation result predicted by the model;

wherein the intersection-over-union loss function is determined by equation (6):

$L_{iou} = 1 - \dfrac{\lVert Y \odot \hat{Y} \rVert_{1}}{\lVert Y + \hat{Y} - Y \odot \hat{Y} \rVert_{1}}$ (6),

wherein $\lVert \cdot \rVert_{1}$ represents the norm of a matrix, here taken as the sum of the absolute values of its entries, and $\odot$ denotes the element-wise product (an illustrative sketch of both losses follows).
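For illustration only: a sketch of the two loss terms of claim 7, following equations (5) and (6) as reconstructed above. The soft-IoU form and the numerical clamping are assumptions.

    import torch

    def segmentation_losses(pred, gt, eps=1e-6):
        # pred: (H, W) predicted segmentation in (0, 1); gt: (H, W) ground truth in {0, 1}
        pred = pred.clamp(eps, 1 - eps)  # avoid log(0)
        # Equation (5): pixel-averaged binary cross entropy.
        ce = -(gt * pred.log() + (1 - gt) * (1 - pred).log()).mean()
        # Equation (6): one minus the soft intersection-over-union.
        inter = (gt * pred).sum()
        union = (gt + pred - gt * pred).sum()
        iou = 1 - inter / (union + eps)
        return ce, iou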
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
CN202210536170.6A 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning Active CN114638839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536170.6A CN114638839B (en) 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning

Publications (2)

Publication Number Publication Date
CN114638839A (en) 2022-06-17
CN114638839B CN114638839B (en) 2022-09-30

Family

ID=81953301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536170.6A Active CN114638839B (en) 2022-05-18 2022-05-18 Small sample video target segmentation method based on dynamic prototype learning

Country Status (1)

Country Link
CN (1) CN114638839B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200117826A1 (en) * 2018-10-16 2020-04-16 Immuta, Inc. Data access policy management
CN110942463A (en) * 2019-10-30 2020-03-31 杭州电子科技大学 Video target segmentation method based on generation countermeasure network
CN111210446A (en) * 2020-01-08 2020-05-29 中国科学技术大学 Video target segmentation method, device and equipment
CN111583284A (en) * 2020-04-22 2020-08-25 中国科学院大学 Small sample image semantic segmentation method based on hybrid model
EP3961502A1 (en) * 2020-08-31 2022-03-02 Sap Se Weakly supervised one-shot image segmentation
CN113177549A (en) * 2021-05-11 2021-07-27 中国科学技术大学 Few-sample target detection method and system based on dynamic prototype feature fusion
CN113763385A (en) * 2021-05-28 2021-12-07 华南理工大学 Video object segmentation method, device, equipment and medium
CN113240039A (en) * 2021-05-31 2021-08-10 西安电子科技大学 Small sample target detection method and system based on spatial position characteristic reweighting
CN113706487A (en) * 2021-08-17 2021-11-26 西安电子科技大学 Multi-organ segmentation method based on self-supervision characteristic small sample learning
CN113920127A (en) * 2021-10-27 2022-01-11 华南理工大学 Single sample image segmentation method and system with independent training data set
CN114240965A (en) * 2021-12-13 2022-03-25 江南大学 Small sample learning tumor segmentation method driven by graph attention model
CN114266977A (en) * 2021-12-27 2022-04-01 青岛澎湃海洋探索技术有限公司 Multi-AUV underwater target identification method based on super-resolution selectable network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIAMIN WU et al.: "Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition", ICCV 2021 Open Access *
JIE LIU et al.: "Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation", ICCV 2021 Open Access *
SOOPIL KIM et al.: "Uncertainty-Aware Semi-Supervised Few Shot Segmentation", https://arxiv.org/abs/2110.08954 *
JIA XIBIN et al.: "Lightweight few-shot semantic segmentation network with pyramid prototype alignment", Journal of Beijing University of Technology *

Also Published As

Publication number Publication date
CN114638839B (en) 2022-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant