CN112418127A - Video sequence coding and decoding method for video pedestrian re-identification - Google Patents
- Publication number
- CN112418127A (application CN202011378786.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- feature extraction
- extraction module
- generator
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video sequence encoding and decoding method for video pedestrian re-identification. In the training stage, label-picture features and video features are fused and input into a generator; the label picture serves as the reconstruction label, and an image reconstruction loss constrains the key frame produced by the generator. The generated key frame is then sent to an image feature extraction module for video feature recovery, and a feature reconstruction loss constrains the recovered video features to perform consistently with the original video features. In the application stage, K frames are selected with the HSV-Top-K method to generate a key frame, and the generated key frame is stored on the device to reduce storage cost. When retrieval is needed, the image feature extraction module recovers the video features from the generated key frame; the recovered features retain the performance of the original video features and are used for pedestrian retrieval and matching.
Description
Technical Field
The invention belongs to the field of computer vision image retrieval, and particularly relates to a video sequence coding and decoding method for video pedestrian re-identification.
Background
Pedestrian re-identification aims to retrieve a pedestrian specified by a user from a series of surveillance videos across cameras; it is widely applied in smart cities and security monitoring.
Depending on the number of input pictures, pedestrian re-identification can be divided into video-based and image-based pedestrian re-identification. Compared with image-based pedestrian re-identification, which takes a single frame as input, video-based pedestrian re-identification takes a video sequence as input and is more robust to environmental interference. However, it requires storing a large number of video sequences, which incurs huge storage overhead in practical applications and raises the deployment cost of video pedestrian re-identification. Moreover, in the application stage the video sequences vary in length, making them unsuitable for batch processing and computationally expensive.
Disclosure of Invention
The present invention is directed to a method for encoding and decoding a video sequence for video pedestrian re-identification.
The purpose of the invention is realized by the following technical scheme: a video sequence encoding and decoding method for video pedestrian re-identification, comprising the steps of:
(1) building a neural network:
(11) building a video feature extraction module:
(111) The step size of the last down-sampling of the first convolutional network is set to 1.
(112) A temporal average pooling module, a first spatial average pooling module, and a first batch normalization module are added in sequence after the first convolutional network.
(12) Building a generator: the generator comprises several up-sampling convolutional layers followed by a convolutional layer; the number of up-sampling operations equals the number of down-sampling operations of the first convolutional network, and the input and output feature maps of the final convolutional layer are the same size.
(13) Constructing an image feature extraction module:
(131) The step size of the last down-sampling of the second convolutional network is set to 1.
(132) A second spatial average pooling module and a second batch normalization module are connected in sequence after the second convolutional network.
(2) Take K frames of the video sequence, designate one of them as the label frame, and use them to train the neural network built in step one. The input of the video feature extraction module is the K frames of the video sequence; the output of the temporal average pooling module is the video features, which are then passed through the first spatial average pooling module and the first batch normalization module. The input of the generator is the video features output by the temporal average pooling module together with the label frame's outputs from the first convolutional network, and its output is a key frame. The input of the image feature extraction module is the key frame, and its output is the video features recovered from the key frame.
(3) Take K frames of the video sequence to be identified, designate one of them as the label frame, input the K-frame video sequence into the video feature extraction module and generator trained in step two, and store the key frame output by the generator. When retrieval is needed, input the stored key frame into the image feature extraction module trained in step two to recover the video features in the key frame for pedestrian retrieval.
Further, the step (2) includes the sub-steps of:
(21) Randomly select K pictures from the video sequence and input them into the video feature extraction module.
(22) Select one frame from the K selected pictures as the label frame, fuse the video features with the label-frame features, send them into the generator for up-sampling, and output the generated key frame; an image reconstruction loss function L_irec guides the reconstruction of the key frame.
(23) Send the key frame generated in step (22) to the image feature extraction module. In the image feature extraction module, the features before and after batch normalization are denoted f_ibfr and f_iaft respectively. f_ibfr is used to compute a triplet loss function L_itri; f_iaft is sent into the fully connected layer to compute the Softmax classification loss L_iid.
(24) The video features of the last down-sampling layer, output by the temporal average pooling module in the video feature extraction module, are sent to the first spatial average pooling module, which outputs the feature f_vbfr; this is then sent into the first batch normalization module, which outputs the feature f_vaft. f_vbfr is used to compute a triplet loss function L_vtri; f_vaft is sent into the fully connected layer to compute the Softmax classification loss function L_vid.
(25) The batch-normalized feature f_iaft from step (23) and the batch-normalized video feature f_vaft extracted by the video feature extraction module in step (24) are constrained with an L1 feature reconstruction loss, denoted L_frec.
(26) For both the video feature extraction module and the image feature extraction module, the classification loss function L_vid and the triplet loss function L_vtri train discriminative ability, while the image reconstruction loss function L_irec and the feature reconstruction loss function L_frec are optimized synchronously. Finally the whole neural network is trained according to the total loss function L_loss = L_vtri + L_vid + L_itri + L_iid + L_irec + L_frec.
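The combination of losses in step (26) can be sketched as follows. This is a minimal NumPy sketch under assumed shapes: the margin value, the batch-hard triplet sampling of a real implementation, and the exact network outputs are all assumptions, and every loss is weighted 1 as in the total loss above.

```python
import numpy as np

def l1_loss(a, b):
    # Mean absolute error; stands in for both the image reconstruction
    # loss L_irec and the feature reconstruction loss L_frec.
    return float(np.mean(np.abs(a - b)))

def softmax_ce(logits, labels):
    # Softmax classification loss (L_vid, L_iid) over (N, num_classes) logits.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Margin triplet loss (L_vtri, L_itri) on (N, D) feature batches.
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def total_loss(l_vtri, l_vid, l_itri, l_iid, l_irec, l_frec):
    # L_loss = L_vtri + L_vid + L_itri + L_iid + L_irec + L_frec
    return l_vtri + l_vid + l_itri + l_iid + l_irec + l_frec
```

In a real training loop each term would be computed on the corresponding network outputs and the sum back-propagated through both modules.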
Further, the step (22) comprises the sub-steps of:
(221) Send the K randomly selected pictures of the video sequence into the first convolutional network of the video feature extraction module to obtain the video feature set of each picture, F_i = {f_i^j}, where f_i^j denotes the video features of the i-th picture output at the j-th down-sampling layer of the first convolutional network, i = 1 to K, j = 1 to J, and J is the number of down-sampling layers of the first convolutional network.
(222) Randomly select one picture L from the K pictures as the label frame; its feature set is denoted F_L = {f_L^j}.
(223) Send the video feature sets F_i of the K pictures into the temporal average pooling module to obtain the video features of all down-sampling layers, F_avg = {f_avg^j}.
(224) Concatenate F_L and F_avg in the channel dimension and send them into the generator to generate a key frame.
(225) For the generated key frame, use the picture L as the label frame and the L1 loss as the image reconstruction loss function L_irec to reconstruct the image.
Further, step (224) is specifically: the generator has J layers in total, of which the first J-1 layers perform up-sampling and the last layer keeps the feature-map size unchanged. The set of all generator layers is written {G_p}, p = J-1, ..., 1, 0, where p = J-1 to 1 corresponds in sequence to layers 1 to J-1 of the generator and p = 0 corresponds to the last layer. The input I_p of each generator layer concatenates, in the channel dimension, the output G_(p+1)(I_(p+1)) of the previous layer with the label-frame and video features at the corresponding scale, where G_p(I_p) is the output of each generator layer, G_0(I_0) is the key frame generated by the generator, and [ ] denotes concatenation in the channel dimension.
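The channel-dimension splicing used to form each generator input can be sketched as follows. This is a NumPy sketch: the channel counts are illustrative, and the nearest-neighbour up-sampling stands in for the learnable up-sampling convolutions of the generator.

```python
import numpy as np

def concat_channels(*feats):
    # "[ ]" in the text: splice feature maps along the channel dimension.
    # Each input is (C_i, H, W); the output is (sum of C_i, H, W).
    return np.concatenate(feats, axis=0)

def upsample2x(x):
    # Nearest-neighbour 2x up-sampling, standing in for one up-sampling
    # convolutional layer of the generator (learnable in the patent).
    return x.repeat(2, axis=1).repeat(2, axis=2)

# One hypothetical generator step: fuse the previous layer's output with the
# label-frame feature f_L and the pooled video feature f_avg at the same
# scale, then up-sample. All shapes here are illustrative assumptions.
prev_out = np.zeros((64, 8, 8))
f_label = np.zeros((16, 8, 8))
f_video = np.zeros((16, 8, 8))
i_p = concat_channels(prev_out, f_label, f_video)  # channels add up: 96
g_p = upsample2x(i_p)                              # spatial size doubles
```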
Further, the step (3) includes the sub-steps of:
(31) Using the HSV-Top-K method, K pictures are pre-selected from the video sequence to be identified; video feature extraction and key frame generation are then performed and the result is stored on the device, comprising the following sub-steps:
(311) Compute the HSV histogram features of every picture of the video sequence, then compute the feature center of the video sequence, and select the K pictures closest to the feature center to represent the whole video sequence; one of them is chosen as the label frame.
(312) Send the selected K pictures into the video feature extraction module trained in step two to obtain the video features and the label-frame features; these are then sent together into the generator to generate the key frame.
(313) Store the generated key frame on the device.
(32) When retrieval is needed, recover the video features in the key frames using the image feature extraction module trained in step two, and use them for retrieval matching in video pedestrian re-identification.
Further, in step (311), the feature center is the average of the HSV histogram features of the pictures.
Further, in step (311), the distance refers to the L2 (Euclidean) distance.
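The HSV-Top-K selection of step (311) can be sketched as follows. This is a NumPy sketch: the number of histogram bins is an assumption, since the patent does not fix it, and the input frames are assumed to already be HSV arrays scaled to [0, 1].

```python
import numpy as np

def hsv_histogram(frame_hsv, bins=8):
    # frame_hsv: (H, W, 3) array of H, S, V values scaled to [0, 1].
    # One histogram per channel, concatenated and normalized so that
    # frame size does not affect the feature.
    hists = [np.histogram(frame_hsv[..., c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def hsv_top_k(frames_hsv, k):
    # Step (311): compute the feature center (mean histogram) and return
    # the indices of the K frames nearest to it in L2 distance.
    feats = np.stack([hsv_histogram(f) for f in frames_hsv])
    center = feats.mean(axis=0)
    dists = np.linalg.norm(feats - center, axis=1)
    return np.argsort(dists)[:k]
```

With two near-identical frames and one outlier, the two representative frames are selected and the outlier dropped.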
The invention has the beneficial effects that:
(1) The invention replaces the whole video sequence with a single generated key frame embedded with the video features, reducing storage cost while retaining the performance of the video features.
(2) The invention fuses the label features after each down-sampling with the video features before sending them into the generator, ensuring that the generated key frame is embedded with the video features while having high imaging quality.
(3) The invention uses the image feature extraction network to recover the video features from the key frames, and uses the feature reconstruction loss to constrain the recovered video features to perform consistently with the original video features, reducing the performance loss of the recovered features.
(4) In the application stage, the K most representative pictures are selected with the HSV-Top-K method to replace the original whole video sequence for key-frame generation. Compared with using all frames of the sequence, fewer pictures are used and batch processing becomes easier, reducing computation cost.
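As a rough, hypothetical illustration of effect (1): keeping one generated key frame instead of a whole tracklet reduces storage roughly in proportion to the tracklet length. The numbers below are illustrative assumptions, not figures from the patent.

```python
# Hypothetical storage comparison: one stored key frame versus a full
# tracklet of T frames at the same per-frame size.
frames_per_sequence = 100      # assumed tracklet length T
kb_per_frame = 50              # assumed compressed size per frame, in KB

full_sequence_kb = frames_per_sequence * kb_per_frame  # store every frame
key_frame_kb = kb_per_frame                            # store only the key frame
saving_factor = full_sequence_kb / key_frame_kb        # ~T-fold reduction
```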
Drawings
FIG. 1 is a schematic diagram of the overall structure of a network during a training phase;
fig. 2 is a schematic flow diagram of the application phase.
Detailed Description
The invention relates to a video sequence encoding and decoding method for video pedestrian re-identification. In the training stage, the label-picture features and the video features are fused and input into the generator, the label picture is used as the reconstruction label, and the image reconstruction loss constrains the key frame generated by the generator. The generated key frames are then sent to the image feature extraction module for video feature recovery, and the feature reconstruction loss constrains the recovered video features against the original video features. In the application stage, K frames are selected with the HSV-Top-K method to generate a key frame, which is stored on the device to reduce storage cost. When retrieval is needed, the image feature extraction module recovers the video features from the generated key frame; the recovered features retain the performance of the video features and are used for pedestrian retrieval and matching. The method specifically comprises the following steps:
step one, building a neural network for training, specifically comprising the following steps:
(11) Build the video feature extraction module, specifically:
(111) The step size of the last down-sampling of ResNet50 is set to 1.
(112) A temporal average pooling module, a spatial average pooling module, and a batch normalization module are added in sequence after ResNet50, forming the video feature extraction module.
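The pooling pipeline of steps (111)–(112) can be sketched as follows. This is a NumPy sketch with assumed tensor shapes; a real implementation operates on ResNet50 feature maps and has learnable batch-normalization scale and shift parameters.

```python
import numpy as np

def temporal_avg_pool(feats):
    # feats: (K, C, H, W) backbone outputs for the K frames of one sequence.
    # Averaging over the frame axis yields one (C, H, W) video feature map.
    return feats.mean(axis=0)

def spatial_avg_pool(feat):
    # Global average pooling over the spatial dimensions: (C, H, W) -> (C,).
    return feat.mean(axis=(1, 2))

def batch_norm(x, eps=1e-5):
    # Batch normalization over (N, C) features; the learnable scale and
    # shift of a real batch-norm layer are omitted for brevity.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
```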
(12) Build a generator composed of up-sampling convolutions, used as the encoder. Specifically: the generator consists of several up-sampling convolutional layers followed by a convolutional layer; the number of up-sampling operations equals the number of ResNet50 down-sampling operations, and the input and output feature maps of the final convolutional layer are the same size.
(13) The image feature extraction module is constructed and used as a decoder, and specifically comprises the following steps:
(131) The step size of the last down-sampling of ResNet50 is set to 1.
(132) ResNet50 (which may be replaced by another convolutional network) is followed in sequence by a spatial average pooling module and a batch normalization module, forming the image feature extraction module.
Step two, as shown in fig. 1, training the neural network built in step one, wherein the training stage specifically comprises:
(21) Randomly select K pictures from the video sequence and input them into the video feature extraction module.
(22) Randomly select one of the K pictures as the label frame, fuse the video features with the label-frame features, send them into the generator for up-sampling, and output the generated key frame; the image reconstruction loss function guides the reconstruction of the key frame. Specifically:
(221) Send the K randomly selected pictures of the video sequence into ResNet50 of the video feature extraction module to obtain the video feature set of each picture, F_i = {f_i^j}, where f_i^j denotes the video features of the i-th picture output at the j-th down-sampling layer, j = 1 to 5.
(222) Randomly select one picture L from the K pictures as the label frame; its feature set output by ResNet50 is denoted F_L.
(223) Send the video feature sets F_i of the K pictures into the temporal average pooling module to obtain the video features of all down-sampling layers, F_avg.
(224) Concatenate F_L and F_avg in the channel dimension and send them into the generator to generate a key frame. The generator has 5 layers in total, of which the first 4 perform up-sampling and the last keeps the feature-map size unchanged. The set of all generator layers is written {G_p}, p = 4, ..., 1, 0, where p = 4 to 1 corresponds in sequence to layers 1 to 4 of the generator and p = 0 corresponds to the last layer. The input I_p of each generator layer concatenates, in the channel dimension, the output G_(p+1)(I_(p+1)) of the previous layer with the label-frame and video features at the corresponding scale, where G_p(I_p) is the output of each generator layer, p = 0 to 4, and [ ] denotes concatenation in the channel dimension.
(225) For the generated key frame G_0(I_0), use the picture L as the label and the L1 loss as the image reconstruction loss function L_irec to reconstruct the image.
(23) Send the key frame generated in step (224) to the image feature extraction module. In the image feature extraction module, the features before and after batch normalization are denoted f_ibfr and f_iaft respectively. f_ibfr is used to compute a triplet loss function L_itri; f_iaft is sent into the fully connected layer to compute the Softmax classification loss L_iid.
(24) The video features of the down-sampling layer output by the temporal average pooling module in step (223) are sent to the spatial average pooling module, which outputs the feature f_vbfr; this is then sent into the batch normalization module, which outputs the feature f_vaft. f_vbfr is used to compute a triplet loss function L_vtri; f_vaft is sent into the fully connected layer to compute the Softmax classification loss function L_vid.
(25) The batch-normalized feature f_iaft from step (23) and the batch-normalized video feature f_vaft extracted by the video feature extraction module in step (24) are constrained with an L1 feature reconstruction loss, denoted L_frec.
(26) For both the video feature extraction module and the image feature extraction module, the classification loss function L_vid and the triplet loss function L_vtri train discriminative ability, while the image reconstruction loss function L_irec and the feature reconstruction loss function L_frec are optimized synchronously. The final total loss function is L_loss = L_vtri + L_vid + L_itri + L_iid + L_irec + L_frec.
Step three, as shown in fig. 2, the application stage specifically is:
(31) Using the HSV-Top-K method, K pictures are pre-selected from the video sequence to be identified; video feature extraction and key frame generation are then performed and the result is stored on the device. The specific steps are:
(311) Compute the HSV histogram features of every picture of the video sequence, then compute the feature center of the video sequence, and select the K pictures closest to the feature center to represent the whole video sequence; one of them is chosen as the label frame. The feature center is the average of the per-picture HSV histogram features; the distance is the L2 (Euclidean) distance.
(312) Send the selected K pictures into the video feature extraction module trained in step two to obtain the video features and the label-frame features; these are then sent together into the generator to generate the key frame.
(313) Store the generated key frame on the device.
(32) When retrieval is needed, recover the video features in the key frames using the image feature extraction module trained in step two, and perform retrieval matching for video pedestrian re-identification with the recovered features.
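The final retrieval-matching step can be sketched as follows. This is a minimal NumPy sketch; ranking gallery features by L2 distance is an assumption consistent with the L2 distance used in step (311), not a metric the patent prescribes for matching.

```python
import numpy as np

def retrieve(query_feat, gallery_feats, top_n=5):
    # Rank the stored (recovered) gallery features by L2 distance to the
    # query feature and return the indices of the top_n closest matches.
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)[:top_n]
```

In practice the gallery would hold the video features recovered from the stored key frames, one entry per pedestrian tracklet.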
The above description covers only preferred embodiments of the present invention. It should be noted that various modifications and adaptations may be made by those skilled in the art without departing from the principles of the invention, and these are intended to fall within the scope of the invention.
Claims (7)
1. A video sequence encoding and decoding method for video pedestrian re-identification, comprising the steps of:
(1) building a neural network:
(11) building a video feature extraction module:
(111) The step size of the last down-sampling of the first convolutional network is set to 1.
(112) A temporal average pooling module, a first spatial average pooling module, and a first batch normalization module are added in sequence after the first convolutional network.
(12) Building a generator: the generator comprises several up-sampling convolutional layers followed by a convolutional layer; the number of up-sampling operations equals the number of down-sampling operations of the first convolutional network, and the input and output feature maps of the final convolutional layer are the same size.
(13) Constructing an image feature extraction module:
(131) The step size of the last down-sampling of the second convolutional network is set to 1.
(132) A second spatial average pooling module and a second batch normalization module are connected in sequence after the second convolutional network.
(2) Take K frames of the video sequence, designate one of them as the label frame, and use them to train the neural network built in step one. The input of the video feature extraction module is the K frames of the video sequence; the output of the temporal average pooling module is the video features, which are then passed through the first spatial average pooling module and the first batch normalization module. The input of the generator is the video features output by the temporal average pooling module together with the label frame's outputs from the first convolutional network, and its output is a key frame. The input of the image feature extraction module is the key frame, and its output is the video features recovered from the key frame.
(3) Take K frames of the video sequence to be identified, designate one of them as the label frame, input the K-frame video sequence into the video feature extraction module and generator trained in step two, and store the key frame output by the generator. When retrieval is needed, input the stored key frame into the image feature extraction module trained in step two to recover the video features in the key frame for pedestrian retrieval.
2. The method for encoding and decoding a video sequence for video pedestrian re-identification as claimed in claim 1, characterized in that said step (2) comprises the sub-steps of:
(21) Randomly select K pictures from the video sequence and input them into the video feature extraction module.
(22) Select one frame from the K selected pictures as the label frame, fuse the video features with the label-frame features, send them into the generator for up-sampling, and output the generated key frame; an image reconstruction loss function L_irec guides the reconstruction of the key frame.
(23) Send the key frame generated in step (22) to the image feature extraction module. In the image feature extraction module, the features before and after batch normalization are denoted f_ibfr and f_iaft respectively. f_ibfr is used to compute a triplet loss function L_itri; f_iaft is sent into the fully connected layer to compute the Softmax classification loss L_iid.
(24) The video features of the last down-sampling layer output by the temporal average pooling module in the video feature extraction module are sent to the first spatial average pooling module, which outputs the feature f_vbfr; this is then sent into the first batch normalization module, which outputs the feature f_vaft. f_vbfr is used to compute a triplet loss function L_vtri; f_vaft is sent into the fully connected layer to compute the Softmax classification loss function L_vid.
(25) The batch-normalized feature f_iaft from step (23) and the batch-normalized video feature f_vaft extracted by the video feature extraction module in step (24) are constrained with an L1 feature reconstruction loss, denoted L_frec.
(26) For both the video feature extraction module and the image feature extraction module, the classification loss function L_vid and the triplet loss function L_vtri train discriminative ability, while the image reconstruction loss function L_irec and the feature reconstruction loss function L_frec are optimized synchronously. Finally the whole neural network is trained according to the total loss function L_loss = L_vtri + L_vid + L_itri + L_iid + L_irec + L_frec.
3. The method for encoding and decoding a video sequence for video pedestrian re-identification according to claim 2, characterized in that said step (22) comprises the sub-steps of:
(221) Send the K randomly selected pictures of the video sequence into the first convolutional network of the video feature extraction module to obtain the video feature set of each picture, F_i = {f_i^j}, where f_i^j denotes the video features of the i-th picture output at the j-th down-sampling layer of the first convolutional network, i = 1 to K, j = 1 to J, and J is the number of down-sampling layers of the first convolutional network.
(222) Randomly select one picture L from the K pictures as the label frame; its feature set is denoted F_L = {f_L^j}.
(223) Send the video feature sets F_i of the K pictures into the temporal average pooling module to obtain the video features of all down-sampling layers, F_avg = {f_avg^j}.
(224) Concatenate F_L and F_avg in the channel dimension and send them into the generator to generate a key frame.
(225) For the generated key frame, use the picture L as the label frame and the L1 loss as the image reconstruction loss function L_irec to reconstruct the image.
4. The method for encoding and decoding a video sequence for video pedestrian re-identification according to claim 3, characterized in that said step (224) is specifically: the generator has J layers in total, of which the first J-1 layers perform up-sampling and the last layer keeps the feature-map size unchanged. The set of all generator layers is written {G_p}, p = J-1, ..., 1, 0, where p = J-1 to 1 corresponds in sequence to layers 1 to J-1 of the generator and p = 0 corresponds to the last layer. The input I_p of each generator layer concatenates, in the channel dimension, the output G_(p+1)(I_(p+1)) of the previous layer with the label-frame and video features at the corresponding scale, where G_p(I_p) is the output of each generator layer, G_0(I_0) is the key frame generated by the generator, and [ ] denotes concatenation in the channel dimension.
5. The method for encoding and decoding a video sequence for video pedestrian re-identification as claimed in claim 1, characterized in that said step (3) comprises the sub-steps of:
(31) Using the HSV-Top-K method, K pictures are pre-selected from the video sequence to be identified; video feature extraction and key frame generation are then performed and the result is stored on the device, comprising the following sub-steps:
(311) Compute the HSV histogram features of every picture of the video sequence, then compute the feature center of the video sequence, and select the K pictures closest to the feature center to represent the whole video sequence; one of them is chosen as the label frame.
(312) Send the selected K pictures into the video feature extraction module trained in step two to obtain the video features and the label-frame features; these are then sent together into the generator to generate the key frame.
(313) Store the generated key frame on the device.
(32) When retrieval is needed, recover the video features in the key frames using the image feature extraction module trained in step two, and use them for retrieval matching in video pedestrian re-identification.
6. The method according to claim 5, characterized in that in step (311), the feature center is the average of the HSV histogram features of the pictures.
7. The method for encoding and decoding a video sequence for video pedestrian re-identification according to claim 5, characterized in that in step (311), the distance is the L2 (Euclidean) distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011378786.2A CN112418127B (en) | 2020-11-30 | 2020-11-30 | Video sequence coding and decoding method for video pedestrian re-identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418127A true CN112418127A (en) | 2021-02-26 |
CN112418127B CN112418127B (en) | 2022-05-03 |
Family
ID=74828951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011378786.2A Active CN112418127B (en) | 2020-11-30 | 2020-11-30 | Video sequence coding and decoding method for video pedestrian re-identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418127B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008804A (en) * | 2018-12-12 | 2019-07-12 | 浙江新再灵科技股份有限公司 | Elevator monitoring key frame based on deep learning obtains and detection method |
US20200098085A1 (en) * | 2018-09-20 | 2020-03-26 | Robert Bosch Gmbh | Monitoring apparatus for person recognition and method |
Non-Patent Citations (2)
Title |
---|
DAPENG CHEN et al.: "Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
李梦静 et al.: "Research Progress on Video Person Re-identification", Journal of Nanjing Normal University (Natural Science Edition) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033697A (en) * | 2021-04-15 | 2021-06-25 | 浙江大学 | Automatic model evaluation method and device based on batch normalization layer |
CN113033697B (en) * | 2021-04-15 | 2022-10-04 | 浙江大学 | Automatic model evaluation method and device based on batch normalization layer |
CN116563895A (en) * | 2023-07-11 | 2023-08-08 | 四川大学 | Video-based animal individual identification method |
Also Published As
Publication number | Publication date |
---|---|
CN112418127B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062892B (en) | Single image rain removing method based on composite residual error network and deep supervision | |
CN109087258B (en) | Deep learning-based image rain removing method and device | |
CN110569814B (en) | Video category identification method, device, computer equipment and computer storage medium | |
CN112418127B (en) | Video sequence coding and decoding method for video pedestrian re-identification | |
CN111815509B (en) | Image style conversion and model training method and device | |
CN114936605A (en) | Knowledge distillation-based neural network training method, device and storage medium | |
CN115311720B (en) | Method for generating deepfake based on transducer | |
CN111429466A (en) | Space-based crowd counting and density estimation method based on multi-scale information fusion network | |
CN112818955B (en) | Image segmentation method, device, computer equipment and storage medium | |
CN113961736A (en) | Method and device for generating image by text, computer equipment and storage medium | |
CN112581409A (en) | Image defogging method based on end-to-end multiple information distillation network | |
CN112241939A (en) | Light-weight rain removing method based on multi-scale and non-local | |
CN113379858A (en) | Image compression method and device based on deep learning | |
CN116434241A (en) | Method and system for identifying text in natural scene image based on attention mechanism | |
CN114255456A (en) | Natural scene text detection method and system based on attention mechanism feature fusion and enhancement | |
CN115496919A (en) | Hybrid convolution-transformer framework based on window mask strategy and self-supervision method | |
CN114943937A (en) | Pedestrian re-identification method and device, storage medium and electronic equipment | |
CN115331083B (en) | Image rain removing method and system based on gradual dense feature fusion rain removing network | |
WO2023159765A1 (en) | Video search method and apparatus, electronic device and storage medium | |
CN112801912B (en) | Face image restoration method, system, device and storage medium | |
CN113222016B (en) | Change detection method and device based on cross enhancement of high-level and low-level features | |
CN112784838A (en) | Hamming OCR recognition method based on locality sensitive hashing network | |
CN113920317A (en) | Semantic segmentation method based on visible light image and low-resolution depth image | |
Pei et al. | UVL: A Unified Framework for Video Tampering Localization | |
Chen et al. | A video key frame extraction method based on multiview fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||