CN111680602A - Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction

Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction

Info

Publication number
CN111680602A
CN111680602A
Authority
CN
China
Prior art keywords
features
feature
appearance
level
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010486379.7A
Other languages
Chinese (zh)
Inventor
高英
林文根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202010486379.7A
Publication of CN111680602A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A pedestrian re-identification method based on dual-stream hierarchical feature correction comprises the following steps. Step one: input an RGB sequence and an optical flow sequence at the inputs of a dual-stream feature extractor, and extract appearance features and optical flow features respectively. Step two: feed the appearance and optical flow features extracted in step one into a frame-level feature corrector, which corrects the information frame by frame along the video stream to obtain frame-level correction features. Step three: obtain segment-level feature representations of appearance continuity and motion pattern. Step four: fuse the frame-level correction features and the segment-level correction features into the final video representation, and classify the videos. A corresponding pedestrian re-identification system based on dual-stream hierarchical feature correction is composed of a dual-stream feature extractor, an appearance segment-level feature corrector, an optical-flow segment-level feature corrector, a frame-level feature corrector and a channel fusion module, connected to one another.

Description

Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction
Technical Field
The invention relates to the technical field of image processing, and in particular to a pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction.
Background
Pedestrian re-identification refers to accurately querying and matching the same pedestrian across multiple cameras. Pedestrian re-identification methods mainly comprise image-based and video-based pedestrian re-identification; video-based pedestrian re-identification is characterized by continuous frames and complex motion.
The existing video pedestrian re-identification technology has the following defects:
1. most existing video re-recognition architectures lack the consideration of multi-modal fusion. With the wide application of time sequence feature pooling and space-time attention mechanisms, most of the re-identification architectures capture important time sequence information by selecting key frames and extracting key features of the key frames. The idea can obtain good effect under a single input mode, but also limits the upper limit of the effect of rich information of the video. The video serves as a data source with rich space-time information, the content of the video can be analyzed from multiple angles, multiple data modalities are built, and the characteristics of different aspects of the video content are described, so that people in the video can be modeled more comprehensively and carefully.
2. Existing multi-modal fusion architectures lack cross-modal learning and feature correction. Among the few architectures that do consider multi-modal fusion, most adopt a dual-stream network in which the two modalities serve as the two input streams for feature extraction, mainly in either a pre-fusion or a post-fusion arrangement. Pre-fusion concatenates the two modalities along the channel dimension before the feature extraction network and uses the result as the extraction input; post-fusion adds the two modalities element-wise during model extraction and outputs modality-fused features (both schemes are sketched in the code after this list). However, both approaches lack information interaction between the two modalities, i.e., the effect of appearance information on capturing motion features and the effect of motion information on distinguishing appearance features.
3. Existing models based on spatio-temporal attention mechanisms lack the contextual connections within and between frames of the video features. During motion, the parts of the body act in coordination: key features extracted by the same attention mechanism have temporal context relations across different frames, and these two kinds of association can effectively distinguish different people's coordination patterns and make the features more discriminative. Most methods based on spatio-temporal attention, however, cannot achieve this effect.
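For concreteness, the two fusion schemes named in defect 2 can be sketched in a few lines of PyTorch; the single convolution layers below stand in for the full extraction network, and all shapes and names are illustrative assumptions rather than any particular prior architecture:

```python
import torch
import torch.nn as nn

rgb = torch.randn(8, 3, 256, 128)   # batch of RGB frames (B, C, H, W)
flow = torch.randn(8, 2, 256, 128)  # batch of optical-flow frames (x/y components)

# Pre-fusion: concatenate the modalities on the channel dimension,
# then run a single extraction network on the fused input.
pre_net = nn.Conv2d(3 + 2, 64, kernel_size=7, stride=2, padding=3)
pre_features = pre_net(torch.cat([rgb, flow], dim=1))

# Post-fusion: extract each modality separately, then add element-wise.
rgb_net = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
flow_net = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3)
post_features = rgb_net(rgb) + flow_net(flow)
```

Neither variant lets one stream's computation condition on the other stream's per-frame features, which is precisely the interaction that the frame-level corrector of the invention introduces.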
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a dual-stream, cross-modal, multi-stage pedestrian re-identification algorithm that captures the feature correlations among the multi-dimensional information of a video stream and increases the discriminative power and robustness of the features: a pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction, for video pedestrian re-identification and pedestrian retrieval. The specific technical scheme is as follows:
a pedestrian re-identification method based on double-current classification feature correction comprises the following steps:
Step one: input an RGB sequence and an optical flow sequence at the inputs of the dual-stream feature extractor, and extract appearance features and optical flow features respectively;
Step two: feed the appearance features and optical flow features extracted in step one into a frame-level feature corrector, which corrects the information frame by frame along the video stream to obtain frame-level correction features;
Step three: taking the appearance and optical flow features extracted in step one together with the corrected appearance and optical flow features from step two, use an attention mechanism to extract the associations among all features while jointly considering appearance and motion information, capture the associations between the appearance features and the motion pattern across all frames of the whole video segment, and correct the features by means of weight coefficients to obtain segment-level feature representations of appearance continuity and motion pattern;
Step four: fuse the frame-level correction features and the segment-level correction features to obtain the final video representation, and classify the videos.
Preferably, step one is specified as follows: the dual-stream feature extractor extracts features with a deep convolutional model pre-trained on a large-scale dataset. The initial inputs of the RGB images and the optical flow images have different dimensions, the RGB input having three channels and the optical flow two, and the dimensions of the two are aligned.
A pedestrian re-identification system based on dual-stream hierarchical feature correction is provided with a dual-stream feature extractor consisting of an appearance feature extractor and a motion feature extractor, whose inputs are connected to the RGB sequence and optical flow sequence input ports respectively;
the first output of the appearance feature extractor is connected to the appearance segment-level feature corrector, and the first output of the motion feature extractor is connected to the optical-flow segment-level feature corrector; the second outputs of the appearance and motion feature extractors are each connected to the input of the frame-level feature corrector; the RGB feature output of the frame-level feature corrector is connected to the appearance segment-level feature corrector and to the channel fusion module; the optical-flow feature output of the frame-level feature corrector is connected to the optical-flow segment-level feature corrector and to the channel fusion module; and the outputs of the appearance segment-level and optical-flow segment-level feature correctors are each connected to the channel fusion module.
The invention has the following beneficial effects:
1. The dual-stream model processes two input modalities, an RGB sequence and an optical flow sequence, so the appearance and motion characteristics of a moving person are considered simultaneously; this multi-modal information fusion makes the model more robust.
2. Feature correction is a further supplement to feature learning: stronger feature representations are obtained on top of the features extracted by the basic deep model.
3. The hierarchical feature processing can extract feature relations over different temporal lengths. Frame-by-frame information transfer within one modality captures the relations between preceding and following frames, retaining representative information and removing redundant noise; frame-by-frame cross-modal interactive learning captures the relations between corresponding frames of the two modalities, measuring both the salient characterization that appearance information gives to the motion pattern and the key observation positions that motion information highlights on the overall appearance. This collaborative learning breaks the information barrier between the modalities and achieves genuine cross-modal learning.
4. Hierarchical feature processing also plays an important role in capturing relations between the frames of a long sequence. Traditional long-sequence feature extraction mostly builds a temporal representation by selecting important frames and extracting important information from single frames, ignoring the spatio-temporal associations of features across multiple consecutive frames; for example, an action arises from the body parts cooperating over a span of several frames, so different key features of different frames may be mutually associated. By learning the feature associations of the long sequence and correcting on top of the original features, a more discriminative long-sequence feature representation is obtained.
5. The attention mechanism extracts important feature positions and association relations and achieves a good feature alignment effect.
Drawings
Fig. 1 is a schematic diagram of the framework of the present invention.
Fig. 2 is a schematic diagram of the initial network-layer processing of the dual-stream input in the present invention.
Fig. 3 is a schematic diagram of the cross-modal frame-level feature corrector, designed on the basis of LSTM, in the present invention.
Fig. 4 is a schematic structural diagram of the segment-level feature corrector in the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention are easier for those skilled in the art to understand, and the scope of protection of the invention is defined more clearly.
As shown in figs. 1 to 4, a pedestrian re-identification method based on dual-stream hierarchical feature correction comprises the following steps:
Step one: input an RGB sequence and an optical flow sequence at the inputs of the dual-stream feature extractor, and extract appearance features and optical flow features respectively;
specifically, the basic parallel double-current feature extractor is used for respectively extracting features of an RGB image sequence and an optical flow sequence through two parallel basic networks to realize the representation of the features of the basic image of each frame, and the basic feature extractor is used for extracting the features by adopting a deep convolution model pre-trained in a large-scale data set, such as ImageNet and the like. For an RGB image and an optical flow image, initial inputs of the RGB image and the optical flow image have different scales, the RGB input has three dimensions, and the optical flow has two dimensions, and the present embodiment performs processing in the manner of fig. 2, so that the dimensions of the RGB image and the optical flow image are aligned. The basic feature extraction network of the RGB mode is ImageNet pre-trained ResNet50, the basic feature extraction network of the optical flow sequence is ImageNet pre-trained ResNet50 with an input convolution layer modified, for the RGB image sequence, the most initial network layer input of the original network is kept unchanged, for the optical flow image sequence, the convolution layer with one input dimension being two output dimensions and the same dimension is adopted to replace the initial convolution layer of the extraction network, and the unification of double-current input dimensions is achieved.
Step two: feed the appearance features and optical flow features extracted in step one into a frame-level feature corrector, which corrects the information frame by frame along the video stream to obtain frame-level correction features;
specifically, at this stage, we input the two streams into the features obtained by the feature extractor in the previous stage to modify the information frame by frame according to the video stream. The characteristics of each frame are modified under the characteristics of the previous modified frame and the corresponding frame of the other mode, and the modified characteristics of the bimodal information and the continuous frame context information are fused. Notably, for an initial frame, we take the frame itself as its context continuation frame since there is no previous frame. The frame-level feature modifier of fig. 2 is an RNN-type tandem structure, which is a design structure improved on the basis of LSTM, the detailed structure of each time stamp unit is shown in fig. 3, and it is assumed that the RGB feature and the optical flow feature X of the current time stampt,FtIntermediate cellular state CtThe former cell state is Ct-1Cell state for context information memory and transfer, hidden state HtIs the correction information per timestamp, is the information optimized under supervision of the context information and cross-modal learning, Ht-1The state is a hidden state of the last timestamp, the memory and the transmission of the time sequence information are realized, and the sigma is a Sigmoid function. The unit of the frame level modifier can also be divided into three gates, a forgetting gate, an input gate and an output gate. Taking RGB mode as an example, the information of the last cell state is retained and discarded at the forgetting gate, and the learning parameter of the forgetting gate is WfThe input is the previous hidden state and the characteristics of the current RGB frame,the weight obtained by forgetting the gate is
ft=σ(Wf*[Ht-1,Xt]+bf)
The forgetting gate is consistent with the LSTM, and the input gate and the output gate are different from the LSTM in that the feature of the current frame of another modality is added, and for the RGB model, the feature corresponding to the optical flow modality is added. The input gate is based on the previous hidden state Ht-1Current two modal characteristics Xt,FtLearn which information to update, using the last hidden state Ht-1And feature X of the current modality frametObtaining new candidate cell states
Figure RE-GDA0002591841330000051
Representing updated candidate information.
The calculation of the two steps is
it=σ(Wi*[Ht-1,Xt,Ft]+bi)
Figure RE-GDA0002591841330000061
By forgetting the gate to select the last cell state plus entering the gate's update information, a new cell state C can be obtained that contains context information and cross-modal learningt
Figure RE-GDA0002591841330000062
The output gate thereafter also differs from the output gate of LSTM in that it is required to have a previous hidden state H according to the previous hidden statet-1Current two modal characteristics Xt,FtDetermining final information of output cells, i.e. generating determination weights for output information of current state
ot=σ(Wo*[Ht-1,Xt,Ft]+b0)
The final hidden state of the current timestamp is output as
ht=ot*tanh(Ct)
The frame-level cross-modal feature corrector thus adds multi-modal supervision on top of LSTM, giving it the double information-optimization effect of context-information perception and multi-modal interactive learning, and realizing frame-by-frame information correction.
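The unit defined by the equations above can be written down directly. The sketch below shows the RGB side of one timestamp unit; treating the per-frame features as pooled vectors and parameterizing each gate with a single linear layer are assumptions, as are all names:

```python
import torch
import torch.nn as nn

class CrossModalLSTMCell(nn.Module):
    """One timestamp unit of the frame-level corrector (RGB side).

    The forget gate and candidate state see only the current modality's
    frame X_t, while the input and output gates also see the other
    modality's frame F_t, exactly as in the equations above."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.w_f = nn.Linear(hidden_dim + feat_dim, hidden_dim)      # forget gate
        self.w_c = nn.Linear(hidden_dim + feat_dim, hidden_dim)      # candidate state
        self.w_i = nn.Linear(hidden_dim + 2 * feat_dim, hidden_dim)  # input gate
        self.w_o = nn.Linear(hidden_dim + 2 * feat_dim, hidden_dim)  # output gate

    def forward(self, x_t, f_mod_t, h_prev, c_prev):
        uni = torch.cat([h_prev, x_t], dim=-1)             # [H_{t-1}, X_t]
        cross = torch.cat([h_prev, x_t, f_mod_t], dim=-1)  # [H_{t-1}, X_t, F_t]
        f_t = torch.sigmoid(self.w_f(uni))                 # forget-gate weight
        i_t = torch.sigmoid(self.w_i(cross))               # input-gate weight
        c_tilde = torch.tanh(self.w_c(uni))                # candidate cell state
        c_t = f_t * c_prev + i_t * c_tilde                 # new cell state C_t
        o_t = torch.sigmoid(self.w_o(cross))               # output-gate weight
        h_t = o_t * torch.tanh(c_t)                        # hidden state H_t
        return h_t, c_t
```

The optical flow side is the mirror image with the roles of $X_t$ and $F_t$ swapped; per the description above, the chain is initialized at the first frame by reusing that frame as its own context frame.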
Step three: taking the appearance and optical flow features extracted in step one together with the corrected appearance and optical flow features from step two, use an attention mechanism to extract the associations among all features while jointly considering appearance and motion information, capture the associations between the appearance features and the motion pattern across all frames of the whole video segment, and correct the features by means of weight coefficients to obtain segment-level feature representations of appearance continuity and motion pattern;
specifically, at this stage, features extracted by an original double-current image sequence in a first-stage parallel feature extractor and features modified by a second-stage frame-level are input, the features obtained at the first stage are regarded as original features representing appearance and motion, the features obtained at the second stage are regarded as features considering appearance continuity and motion correlation of each frame, the third stage is characterized by using attention mechanism to extract the association relations between all the features in the segment under the condition of considering appearance and motion information through feature supervision at the second stage, capturing the association relations between the appearance features and all the frames of the whole video segment in the motion mode, and modifying the features in a weight coefficient mode to obtain segment-level appearance continuity and feature representation of the motion mode. The segment-level feature corrector in fig. 2 is independent as shown in fig. 4, which is a double-current spatial attention structure improved according to an attention mechanism, the frame-level correction features are used as input, two convolutional neural networks are used to generate weight Mask matrixes corresponding to the modal segment level, the weight Mask matrixes are respectively an RGB Mask and an optical flow Mask, the basic features are multiplied by masks after passing through the convolutional neural networks, and the segment-level optimization features of the modal can be obtained.
Step four: fuse the frame-level correction features and the segment-level correction features to obtain the final video representation, and classify the videos. Specifically, the frame-level and segment-level correction features are fused along the channel dimension to obtain the final video representation, and pedestrian re-identification classification is performed through a fully connected layer, achieving good performance.
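A minimal sketch of this head follows; the patent specifies channel-wise fusion followed by a fully connected classification layer, while the temporal average pooling and all dimensions here are assumptions:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate frame-level and segment-level correction features along
    the channel dimension, pool over the segment, and classify identities
    with a fully connected layer."""

    def __init__(self, frame_dim: int, segment_dim: int, num_identities: int):
        super().__init__()
        self.classifier = nn.Linear(frame_dim + segment_dim, num_identities)

    def forward(self, frame_feats, segment_feats):
        # frame_feats: (B, T, frame_dim); segment_feats: (B, T, segment_dim)
        video_repr = torch.cat([frame_feats, segment_feats], dim=-1).mean(dim=1)
        return self.classifier(video_repr)
```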
A pedestrian re-identification system based on dual-stream hierarchical feature correction is provided with a dual-stream feature extractor consisting of an appearance feature extractor and a motion feature extractor, whose inputs are connected to the RGB sequence and optical flow sequence input ports respectively;
the first output of the appearance feature extractor is connected to the appearance segment-level feature corrector, and the first output of the motion feature extractor is connected to the optical-flow segment-level feature corrector; the second outputs of the appearance and motion feature extractors are each connected to the input of the frame-level feature corrector; the RGB feature output of the frame-level feature corrector is connected to the appearance segment-level feature corrector and to the channel fusion module; the optical-flow feature output of the frame-level feature corrector is connected to the optical-flow segment-level feature corrector and to the channel fusion module; and the outputs of the appearance segment-level and optical-flow segment-level feature correctors are each connected to the channel fusion module.
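Reading the connection topology above as code, the system might be wired as below; every module interface is an assumption, with the submodules standing for the sketches given earlier (or any equivalent implementations):

```python
import torch.nn as nn

class DualStreamReID(nn.Module):
    """Schematic wiring of the described system (interfaces are assumptions)."""

    def __init__(self, rgb_net, flow_net, frame_corrector,
                 appearance_seg_corrector, flow_seg_corrector, fusion_head):
        super().__init__()
        self.rgb_net = rgb_net           # appearance feature extractor
        self.flow_net = flow_net         # motion feature extractor
        self.frame_corrector = frame_corrector
        self.appearance_seg = appearance_seg_corrector
        self.flow_seg = flow_seg_corrector
        self.head = fusion_head          # channel fusion module + classifier

    def forward(self, rgb_seq, flow_seq):
        # First outputs of the extractors feed the segment-level correctors;
        # second outputs feed the frame-level corrector.
        rgb_base, flow_base = self.rgb_net(rgb_seq), self.flow_net(flow_seq)
        # The frame-level corrector's RGB and optical-flow feature outputs go
        # both to their segment-level correctors and to the fusion module.
        rgb_frame, flow_frame = self.frame_corrector(rgb_base, flow_base)
        rgb_seg = self.appearance_seg(rgb_base, rgb_frame)
        flow_seg = self.flow_seg(flow_base, flow_frame)
        return self.head(rgb_frame, flow_frame, rgb_seg, flow_seg)
```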

Claims (3)

1. A pedestrian re-identification method based on dual-stream hierarchical feature correction, characterized by comprising the following steps:
Step one: input an RGB sequence and an optical flow sequence at the inputs of the dual-stream feature extractor, and extract appearance features and optical flow features respectively;
Step two: feed the appearance features and optical flow features extracted in step one into a frame-level feature corrector, which corrects the information frame by frame along the video stream to obtain frame-level correction features;
Step three: taking the appearance and optical flow features extracted in step one together with the corrected appearance and optical flow features from step two, use an attention mechanism to extract the associations among all features while jointly considering appearance and motion information, capture the associations between the appearance features and the motion pattern across all frames of the whole video segment, and correct the features by means of weight coefficients to obtain segment-level feature representations of appearance continuity and motion pattern;
Step four: fuse the frame-level correction features and the segment-level correction features to obtain the final video representation, and classify the videos.
2. The pedestrian re-identification method based on dual-stream hierarchical feature correction according to claim 1, characterized in that step one is specified as follows: the dual-stream feature extractor extracts features with a deep convolutional model pre-trained on a large-scale dataset; the initial inputs of the RGB images and the optical flow images have different dimensions, the RGB input having three channels and the optical flow two, and the dimensions of the two are aligned.
3. A pedestrian re-identification system based on dual-stream hierarchical feature correction, characterized by comprising a dual-stream feature extractor consisting of an appearance feature extractor and a motion feature extractor, whose inputs are connected to the RGB sequence and optical flow sequence input ports respectively;
the first output of the appearance feature extractor is connected to the appearance segment-level feature corrector, and the first output of the motion feature extractor is connected to the optical-flow segment-level feature corrector;
the second outputs of the appearance feature extractor and the motion feature extractor are each connected to the input of the frame-level feature corrector;
the RGB feature output of the frame-level feature corrector is connected to the appearance segment-level feature corrector and to the channel fusion module;
the optical-flow feature output of the frame-level feature corrector is connected to the optical-flow segment-level feature corrector and to the channel fusion module;
and the outputs of the appearance segment-level feature corrector and the optical-flow segment-level feature corrector are each connected to the channel fusion module.
CN202010486379.7A 2020-06-01 2020-06-01 Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction Pending CN111680602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010486379.7A CN111680602A (en) 2020-06-01 2020-06-01 Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010486379.7A CN111680602A (en) 2020-06-01 2020-06-01 Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction

Publications (1)

Publication Number Publication Date
CN111680602A (en) 2020-09-18

Family

ID=72453439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010486379.7A Pending CN111680602A (en) Pedestrian re-identification method and model architecture based on dual-stream hierarchical feature correction

Country Status (1)

Country Link
CN (1) CN111680602A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740419A (en) * 2018-11-22 2019-05-10 东南大学 Video behavior recognition method based on an Attention-LSTM network
CN109961034A (en) * 2019-03-18 2019-07-02 西安电子科技大学 Video object detection method based on convolutional gated recurrent neural units
CN110135386A (en) * 2019-05-24 2019-08-16 长沙学院 Human motion recognition method and system based on deep learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998995A (en) * 2022-06-13 2022-09-02 西安电子科技大学 Cross-view-angle gait recognition method based on metric learning and space-time double-flow network
CN117612711A (en) * 2024-01-22 2024-02-27 神州医疗科技股份有限公司 Multi-mode prediction model construction method and system for analyzing liver cancer recurrence data
CN117612711B (en) * 2024-01-22 2024-05-03 神州医疗科技股份有限公司 Multi-mode prediction model construction method and system for analyzing liver cancer recurrence data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200918)