CN110490906A - Real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network - Google Patents
Real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network
- Publication number: CN110490906A (application CN201910771090.7A)
- Authority: CN (China)
- Prior art keywords: network, LSTM, layer, sequence
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N3/045 — Combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Learning methods
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T2207/10016 — Video; image sequence
Abstract
A real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory (LSTM) network. For the video sequence to be tracked, each network input consists of two consecutive frames. The Siamese convolutional network extracts features from the two input frames: the convolution operations produce appearance and semantic features at different levels, and a fully connected layer concatenates the high- and low-level features into a deep feature. This deep feature is passed to a long short-term memory network containing two LSTM units for sequence modeling; the LSTM forget gate selectively activates target features at different positions in the sequence, and the output gate emits the state information of the current target. Finally, a fully connected layer receiving the LSTM output produces the predicted position coordinates of the target in the current frame and updates the search region for the next frame. Tracking speed is greatly increased while tracking stability and accuracy are maintained, so that the real-time performance of tracking is substantially improved.
Description
Technical field
The invention belongs to the field of computer vision and visual target tracking, and in particular relates to a real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network.
Background art
Visual target tracking is one of the most important problems in the field of computer vision and has a wide range of application scenarios, such as security surveillance, smart homes, and smart cities. Its main task is as follows: given the position and state of the tracked target in the first frame of a video sequence, find the position of the same target in every subsequent frame, from the second frame onward. Although target tracking has been widely studied, it remains limited by disturbing factors that may occur during tracking, such as target deformation, background occlusion, motion blur, and illumination changes; the stability, accuracy, and speed of visual target tracking are therefore still difficult to guarantee.
Current visual target tracking methods can be divided into two major classes by basic framework: traditional methods based on hand-crafted features and deep learning methods based on deep features. By modeling approach they can likewise be divided into two major classes: generative methods, represented by template matching, and discriminative methods, represented by frame-by-frame detection. The greatest advantage of traditional methods based on hand-crafted features is that the algorithms are compact and structurally simple, run efficiently, and are easy to debug and adapt. Their disadvantage is equally obvious: most methods using hand-crafted traditional features have low precision, the feature extraction is uncertain, and it is difficult to fully exploit the target's appearance for the tracking process. Therefore, after deep learning came to dominate the ImageNet image recognition and detection competitions, the target tracking field also began to adopt deep neural networks as its framework. In the VOT challenge held in 2015, Hyeonseob Nam et al. proposed MDNet, which performs discriminative tracking with convolutional features; it reached a surprising 94.8% precision on the test sequences, winning the challenge while substantially leading the other methods, and thereby opened up broad application of deep learning research in the target tracking field.
Although methods that use convolutional neural networks to extract target appearance features for tracking can reach relatively high accuracy, the appearance of the tracked target within a video sequence may change in ways that are hard to anticipate. Most existing deep-network-based methods therefore add online fine-tuning during tracking: while tracking a sequence, hundreds of different positive and negative samples are extracted around the target in each frame, and, on the basis of all samples extracted so far, the weight parameters of the fully connected layers responsible for the classification output of the convolutional network are continually updated. As is well known, the parameter scale of a deep neural network is enormous, and any small modification of the parameters forces the entire network to re-find its optimum under the current state. Because the computation involved in this process is far larger than in traditional methods, the time cost is very high; tracking methods based on deep features therefore generally suffer from slow tracking speed and have difficulty reaching real time.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art and provide a real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network. A long short-term memory network part containing two LSTM units replaces the multi-layer fully connected structure used in conventional tracking methods to implement the online fine-tuning (online finetune) strategy for appearance features, so that the sequential relationship between frames, which is characteristic of video data, is used effectively for sequence modeling. While tracking stability and accuracy are maintained, tracking speed is greatly increased and the real-time performance of tracking is improved.
The present invention provides a real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network, comprising the following steps:
Step S1: for the video sequence to be tracked, take two consecutive frames as the network input at each step;
Step S2: extract features from the two input frames with the Siamese convolutional network, obtaining appearance and semantic features at different levels through the convolution operations, and concatenating the high- and low-level features into a deep feature through a fully connected layer;
Step S3: pass the deep feature to a long short-term memory network containing two LSTM units for sequence modeling, in which the LSTM forget gate selectively activates target features at different positions in the sequence and the output gate emits the state information of the current target;
Step S4: through a fully connected layer receiving the LSTM output, output the predicted position coordinates of the target in the current frame, and update the search region of the next frame's target.
As a further technical solution of the present invention, in step S2 the Siamese convolutional network is composed of two convolutional branches, arranged in parallel, that are identical in number of layers, structure, convolution kernel size, pooling mode, and padding stride, and that share their weights. The layers comprise a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, and a third pooling layer. The kernel size and channel count of the first convolutional layer are 11*11*96, those of the second convolutional layer are 5*5*256, and those of the third, fourth, and fifth convolutional layers are 3*3*384; the filter size of the first, second, and third pooling layers is 3*3. The first convolutional layer and the first, second, and third pooling layers use valid padding; the second, third, fourth, and fifth convolutional layers use same padding. Input images are resized by the Siamese convolutional network to 227*227*3.
Further, the long short-term memory network part comprises two LSTM units: the first LSTM receives the convolutional feature input from the fully connected layer, and the second LSTM takes as input the concatenation of the first LSTM's output and the features of the Siamese convolutional network part. Combined with continuous, independent tracking video sequences, sequence data modeling is performed so that, for the same target in the same sequence under different conditions, the corresponding output of each sequence state is computed separately.
Further, the video sequence data sets include the ILSVRC2016 video object detection data set, the Amsterdam Library of Ordinary Videos, and non-natural video sequences. The ILSVRC2016 video object detection data set contains 3862 video sequences, 1122397 images, 1731913 labeled target bounding boxes, and 7911 target trajectories. The Amsterdam Library of Ordinary Videos contains 314 video sequences and 148319 images, each video sequence containing one specific target. The non-natural video sequences are constructed by artificial synthesis and data augmentation from 478807 static images selected from the ImageNet data set.
Further, the feed-forward mode of the two LSTM units is

z_t = h(W_z x_t + R_z y_{t-1} + b_z)
i_t = σ(W_i x_t + R_i y_{t-1} + P_i ⊙ c_{t-1} + b_i)
f_t = σ(W_f x_t + R_f y_{t-1} + P_f ⊙ c_{t-1} + b_f)
c_t = z_t ⊙ i_t + c_{t-1} ⊙ f_t
o_t = σ(W_o x_t + R_o y_{t-1} + P_o ⊙ c_t + b_o)
y_t = h(c_t) ⊙ o_t

where t is the frame index; x_t and y_{t-1} are the feature vectors of the current input frame and the previous output frame, respectively; W, R, and P are the weight coefficient matrices of the input, recurrent, and peephole connections, respectively; b is the bias vector; h is the hyperbolic tangent function; σ is the sigmoid function; and ⊙ denotes element-wise multiplication. z is the overall input of the LSTM unit, i is the input gate transmitted between the LSTM cells, o is the output gate of each LSTM cell, f is the forget gate, c is the cell state at each moment of the sequence, and y is the overall output of the LSTM. The forward pass produces the output vector y_t, which stores the target state of the current frame, and the cell state c_t of the LSTM while processing the current frame; both y_t and c_t are then passed to the cell as inputs when the next frame is processed, achieving forward propagation over the sequence data.
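As a concrete illustration of the feed-forward equations above, the following is a minimal NumPy sketch of a single peephole-LSTM forward step. It is not the patented implementation; the dimensions (8 inputs, 4 hidden units) and the random initialization are assumptions made purely for illustration:

```python
import numpy as np

def lstm_step(x, y_prev, c_prev, W, R, P, b):
    """One peephole-LSTM forward step following the feed-forward equations:
    z (block input), i/f/o gates, cell state c, output y.
    W: input weights, R: recurrent weights, P: peephole weights, b: biases;
    each is a dict keyed by 'z', 'i', 'f', 'o' ('z' has no peephole)."""
    sigma = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = np.tanh(W['z'] @ x + R['z'] @ y_prev + b['z'])                   # block input
    i = sigma(W['i'] @ x + R['i'] @ y_prev + P['i'] * c_prev + b['i'])   # input gate
    f = sigma(W['f'] @ x + R['f'] @ y_prev + P['f'] * c_prev + b['f'])   # forget gate
    c = z * i + c_prev * f                                               # new cell state
    o = sigma(W['o'] @ x + R['o'] @ y_prev + P['o'] * c + b['o'])        # output gate
    y = np.tanh(c) * o                                                   # block output
    return y, c

# Toy dimensions (assumed): 8-dimensional input, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_h = 8, 4
W = {k: rng.standard_normal((n_h, n_in)) * 0.1 for k in 'zifo'}
R = {k: rng.standard_normal((n_h, n_h)) * 0.1 for k in 'zifo'}
P = {k: rng.standard_normal(n_h) * 0.1 for k in 'ifo'}
b = {k: np.zeros(n_h) for k in 'zifo'}

y, c = np.zeros(n_h), np.zeros(n_h)
for t in range(3):        # y_t and c_t are fed back when the next frame is processed
    x = rng.standard_normal(n_in)
    y, c = lstm_step(x, y, c, W, R, P, b)
print(y.shape, c.shape)
```

Note how y and c are carried across loop iterations, which is exactly the forward propagation over sequence data described above.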
Further, when the video sequence to be tracked is input, the target position in the first frame of the sequence is given as a pair of top-left and bottom-right coordinates.
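For illustration, a minimal sketch of how such a corner-pair annotation might be handled: converting the given (top-left, bottom-right) pair to a center/size representation and expanding it into a search region for the next frame. The expansion factor of 2 is an assumed value, not one specified by the patent:

```python
def corners_to_center(x1, y1, x2, y2):
    """(top-left, bottom-right) corner pair -> (center_x, center_y, width, height)."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)

def search_region(x1, y1, x2, y2, scale=2.0):
    """Expand the previous target box into the next frame's search region,
    keeping the same center (the scale factor is an assumed choice)."""
    cx, cy, w, h = corners_to_center(x1, y1, x2, y2)
    sw, sh = w * scale, h * scale
    return (cx - sw / 2, cy - sh / 2, cx + sw / 2, cy + sh / 2)

print(corners_to_center(10, 20, 50, 60))   # (30.0, 40.0, 40, 40)
print(search_region(10, 20, 50, 60))       # (-10.0, 0.0, 70.0, 80.0)
```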
The invention replaces the structure used in most other deep learning methods, in which multi-layer fully connected layers receive the appearance features extracted by the convolutional layers and an online fine-tuning (online finetune) strategy is used to improve robustness to changes in target appearance. Instead, a long short-term memory network part containing two LSTM units takes the place of the multi-layer fully connected structure and performs sequence modeling to handle the sequential relationships characteristic of video data. Against the defect noted in the background above — that tracking methods based on deep learning are generally slow and poorly suited to real time — tracking speed is greatly increased while a given level of tracking stability and accuracy is maintained, so that real-time tracking performance is substantially improved.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the method of the invention.
Specific embodiment
Referring to Fig. 1, this embodiment provides a real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network, comprising the following steps:
Step S1: for the video sequence to be tracked, take two consecutive frames as the network input at each step;
Step S2: extract features from the two input frames with the Siamese convolutional network, obtaining appearance and semantic features at different levels through the convolution operations, and concatenating the high- and low-level features into a deep feature through a fully connected layer;
Step S3: pass the deep feature to a long short-term memory network containing two LSTM units for sequence modeling, in which the LSTM forget gate selectively activates target features at different positions in the sequence and the output gate emits the state information of the current target;
Step S4: through a fully connected layer receiving the LSTM output, output the predicted position coordinates of the target in the current frame, and update the search region of the next frame's target.
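Steps S1–S4 can be sketched as the tracking loop below. Every function here is a hypothetical stub standing in for a trained network part (Siamese feature extraction, the two-unit LSTM, the output layer); this illustrates only the control flow, not the patented implementation:

```python
def siamese_features(frame_prev, frame_curr):
    # Stub: the Siamese branches would extract and concatenate
    # multi-level appearance/semantic features of both frames.
    return [float(a + b) for a, b in zip(frame_prev, frame_curr)]

def lstm_update(state, features):
    # Stub: the two stacked LSTM units would model the sequence here.
    return [s * 0.5 + f * 0.5 for s, f in zip(state, features)]

def predict_box(state):
    # Stub: a fully connected layer would map the LSTM output to
    # predicted (x1, y1, x2, y2) corner coordinates.
    return tuple(state[:4])

def track(frames, init_state):
    """S1: consume consecutive frame pairs; S2: Siamese features;
    S3: LSTM sequence modeling; S4: predict the box per frame."""
    state, boxes = init_state, []
    for prev, curr in zip(frames, frames[1:]):   # S1: consecutive pairs
        feats = siamese_features(prev, curr)     # S2
        state = lstm_update(state, feats)        # S3
        boxes.append(predict_box(state))         # S4 (search-region update omitted)
    return boxes

frames = [[1.0, 2.0, 3.0, 4.0]] * 4              # toy stand-ins for video frames
print(track(frames, [0.0, 0.0, 0.0, 0.0]))
```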
To perform real-time visual target tracking based on the Siamese convolutional network and long short-term memory network, a deep neural network for the tracking task is built first. The network mainly comprises two parts: a Siamese convolutional neural network part that resizes the input images and extracts convolutional features, and a long short-term memory network part that receives the target's appearance and semantic features at different levels and performs sequence modeling on the features from consecutive frames of the same video.
The Siamese convolutional network part is composed of two convolutional branches, arranged in parallel, that are identical in number of layers, structure, convolution kernel size, pooling mode, and padding stride, and that share their weights. The specific network structure and parameters are shown in Table 1.
Table 1: network structure and parameters
In the table, Layer denotes each network layer from the first convolutional layer Conv1, which receives the original image input, to the last pooling layer; Size denotes the convolution kernel or filter size and channel count of the current convolutional or pooling layer; Stride denotes the filter stride of the current layer; and Padding denotes the padding mode used by the current layer: Same mode or Valid mode.
When two consecutive frames from the same video sequence are input to the network, the network first resizes both images to 227*227*3. The feature dimensions and sizes output by each layer of the Siamese convolutional network part after convolution are shown in Table 2.
Table 2: feature dimensions and sizes of each network layer
In the table, Layer denotes the different layers of the Siamese convolutional network part, and Size denotes the feature dimensions and channel count of the input image after processing by the corresponding layer.
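Table 2 appears as a figure in the original and is not reproduced here, but the per-layer spatial sizes follow from the standard output-size formulas for valid and same padding. The sketch below recomputes them from the layer parameters given above; the strides are assumed AlexNet-style values (4 for Conv1, 2 for the pooling layers, 1 for the remaining convolutions), which the table image presumably specifies:

```python
import math

def out_size(n, k, s, padding):
    """Spatial output size of a conv/pool layer.
    valid: floor((n - k) / s) + 1;  same: ceil(n / s)."""
    if padding == "same":
        return math.ceil(n / s)
    return (n - k) // s + 1

# (name, kernel, stride, padding) per the layer specification above;
# the strides are assumed, not quoted from the patent.
layers = [("Conv1", 11, 4, "valid"), ("Pool1", 3, 2, "valid"),
          ("Conv2", 5, 1, "same"),   ("Pool2", 3, 2, "valid"),
          ("Conv3", 3, 1, "same"),   ("Conv4", 3, 1, "same"),
          ("Conv5", 3, 1, "same"),   ("Pool3", 3, 2, "valid")]

n = 227                       # input images are resized to 227*227*3
sizes = {}
for name, k, s, pad in layers:
    n = out_size(n, k, s, pad)
    sizes[name] = n
print(sizes)   # e.g. Conv1 -> 55, Pool1 -> 27, Pool2 -> 13, Pool3 -> 6
```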
The long short-term memory network part is composed of two LSTM units: the first LSTM receives the convolutional feature input from the fully connected layer, the second LSTM takes as input the concatenation of the first LSTM's output and the features of the Siamese convolutional network part, and, combined with continuous, independent tracking video sequences, sequence data modeling is performed.
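The cascade wiring of the two units — the second LSTM consuming the concatenation of the first LSTM's output and the Siamese features — can be sketched as follows. The dimensions and the simplified (gateless) recurrent updates standing in for the full LSTM units are assumptions for illustrating the data flow only:

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, hid1, hid2 = 16, 8, 8      # assumed sizes

# Simplified recurrent updates standing in for the full LSTM units.
W1 = rng.standard_normal((hid1, feat_dim)) * 0.1
R1 = rng.standard_normal((hid1, hid1)) * 0.1
W2 = rng.standard_normal((hid2, hid1 + feat_dim)) * 0.1  # concatenated input
R2 = rng.standard_normal((hid2, hid2)) * 0.1

y1, y2 = np.zeros(hid1), np.zeros(hid2)
for t in range(5):
    feats = rng.standard_normal(feat_dim)        # Siamese features of frame t
    y1 = np.tanh(W1 @ feats + R1 @ y1)           # first LSTM (simplified)
    cascade = np.concatenate([y1, feats])        # first output + conv features
    y2 = np.tanh(W2 @ cascade + R2 @ y2)         # second LSTM (simplified)
print(y2.shape)
```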
After the deep neural network for the tracking task has been built, training of the network weight parameters begins. End-to-end training is used, and the video sequence data sets used for training are:
1) the ILSVRC2016 video object detection data set, containing 3862 video sequences, 1122397 images, 1731913 labeled target bounding boxes, and 7911 target trajectories;
2) the Amsterdam Library of Ordinary Videos (ALOV 300+), containing 314 video sequences and 148319 images, each video sequence containing one specific target;
3) non-natural video sequences: to increase the variety of target categories in the training data, artificial synthesis and data augmentation strategies are applied to different static images selected from the ImageNet data set, constructing non-natural video sequences comprising 478807 images.
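One simple way to realize the synthesis in 3) is to sweep a crop window across a static image so that the cropped content appears to translate from frame to frame. The patent only names "artificial synthesis and data augmentation" without detailing the procedure, so the sliding-crop scheme below is purely an assumed illustration:

```python
import numpy as np

def synthesize_sequence(image, crop, n_frames, step=(2, 3)):
    """Turn one static image into a pseudo video sequence by sliding a
    crop window across it (assumed scheme, not the patent's procedure)."""
    h, w = crop
    frames = []
    for t in range(n_frames):
        top, left = t * step[0], t * step[1]     # window drifts each "frame"
        frames.append(image[top:top + h, left:left + w].copy())
    return frames

img = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
seq = synthesize_sequence(img, crop=(50, 50), n_frames=5)
print(len(seq), seq[0].shape)
```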
Compared with a single-layer LSTM, a two-layer LSTM can handle more feature detail, capture more complex variations of the target features, and process longer sequence data. It propagates using the following feed-forward mode:

z_t = h(W_z x_t + R_z y_{t-1} + b_z)
i_t = σ(W_i x_t + R_i y_{t-1} + P_i ⊙ c_{t-1} + b_i)
f_t = σ(W_f x_t + R_f y_{t-1} + P_f ⊙ c_{t-1} + b_f)
c_t = z_t ⊙ i_t + c_{t-1} ⊙ f_t
o_t = σ(W_o x_t + R_o y_{t-1} + P_o ⊙ c_t + b_o)
y_t = h(c_t) ⊙ o_t

where t denotes the frame index; x_t and y_{t-1} are the feature vectors of the current input frame and the previous output frame, respectively; W, R, and P denote the weight coefficient matrices of the input, recurrent, and peephole connections, respectively; b denotes the bias vector; h is the hyperbolic tangent function; σ is the sigmoid function; and ⊙ denotes element-wise multiplication. z denotes the overall input of the LSTM unit, i denotes the input gate transmitted between the LSTM cells, o denotes the output gate of each LSTM cell, f represents the forget gate, c denotes the cell state at each moment of the sequence, and y denotes the overall output of the LSTM. The forward pass produces the output vector y_t, which stores the target state of the current frame, and the cell state c_t of the LSTM while processing the current frame; both y_t and c_t are passed to the cell as inputs when the next frame is processed, achieving forward propagation over the sequence data. Because the long short-term memory network, unlike other deep networks used for tracking, requires no online fine-tuning during tracking, real-time performance and tracking speed are significantly improved.
Experiments follow. From methods proposed in recent years, several trackers were chosen based on traditional features and deep features and of both generative and discriminative types; on the two target tracking test sets VOT2014 and VOT2016, the traditional discriminative correlation filter tracker DSST and the present invention were compared experimentally for tracking accuracy and speed. The results are shown in Table 3.
Table 3: test results
In the table, Methods denotes the tracking methods participating in the comparison. Accuracy denotes the intersection-over-union of the overlap between the predicted position and the true position of the target and reflects tracking accuracy: the larger the Accuracy value, the higher the accuracy. Speed denotes the average tracking rate per one-second unit of time over the entire test set during tracking: the larger the Speed value, the faster the tracker. Note that fields marked with "/" indicate that the method had not yet been published when the corresponding test set was released and therefore did not take part in the experiment on that test set.
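The Accuracy measure described above is an intersection-over-union of the predicted and ground-truth boxes; a minimal sketch, with boxes given as (x1, y1, x2, y2) corner pairs:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7 ≈ 0.1429
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))   # 1.0
```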
As can be seen from the table, the present invention outperforms most of the compared methods in accuracy, i.e. precision, and its tracking speed is substantially better than that of the other methods in the experiment; the effectiveness of the invention is thereby demonstrated.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above specific embodiments; the descriptions in the specific embodiments and the specification merely illustrate the principle of the invention, and various changes and improvements may be made without departing from the spirit and scope of the invention. All such changes and improvements fall within the scope of the claimed invention, which is defined by the claims and their equivalents.
Claims (6)
1. A real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network, characterized by comprising the following steps:
Step S1: for the video sequence to be tracked, taking two consecutive frames as the network input at each step;
Step S2: extracting features from the two input frames with the Siamese convolutional network, obtaining appearance and semantic features at different levels through the convolution operations, and concatenating the high- and low-level features into a deep feature through a fully connected layer;
Step S3: passing the deep feature to a long short-term memory network containing two LSTM units for sequence modeling, in which the LSTM forget gate selectively activates target features at different positions in the sequence and the output gate emits the state information of the current target;
Step S4: through a fully connected layer receiving the LSTM output, outputting the predicted position coordinates of the target in the current frame, and updating the search region of the next frame's target.
2. The real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network according to claim 1, characterized in that in step S2 the Siamese convolutional network is composed of two convolutional branches, arranged in parallel, that are identical in number of layers, structure, convolution kernel size, pooling mode, and padding stride, and that share their weights; the layers comprise a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, and a third pooling layer; the kernel size and channel count of the first convolutional layer are 11*11*96, those of the second convolutional layer are 5*5*256, and those of the third, fourth, and fifth convolutional layers are 3*3*384; the filter size of the first, second, and third pooling layers is 3*3; the first convolutional layer and the first, second, and third pooling layers use valid padding, and the second, third, fourth, and fifth convolutional layers use same padding; and input images are resized by the Siamese convolutional network to 227*227*3.
3. The real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network according to claim 1, characterized in that the long short-term memory network part comprises two LSTM units, wherein the first LSTM receives the convolutional feature input from the fully connected layer, the second LSTM takes as input the concatenation of the first LSTM's output and the features of the Siamese convolutional network part, and, combined with continuous, independent tracking video sequences, sequence data modeling is performed so that, for the same target in the same sequence under different conditions, the corresponding output of each sequence state is computed separately.
4. The real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network according to claim 1, characterized in that the video sequence data sets include the ILSVRC2016 video object detection data set, the Amsterdam Library of Ordinary Videos, and non-natural video sequences; the ILSVRC2016 video object detection data set contains 3862 video sequences, 1122397 images, 1731913 labeled target bounding boxes, and 7911 target trajectories; the Amsterdam Library of Ordinary Videos contains 314 video sequences and 148319 images, each video sequence containing one specific target; and the non-natural video sequences are constructed by artificial synthesis from 478807 static images selected from the ImageNet data set.
5. The real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network according to claim 1, characterized in that the feed-forward mode of the two LSTM units is

z_t = h(W_z x_t + R_z y_{t-1} + b_z)
i_t = σ(W_i x_t + R_i y_{t-1} + P_i ⊙ c_{t-1} + b_i)
f_t = σ(W_f x_t + R_f y_{t-1} + P_f ⊙ c_{t-1} + b_f)
c_t = z_t ⊙ i_t + c_{t-1} ⊙ f_t
o_t = σ(W_o x_t + R_o y_{t-1} + P_o ⊙ c_t + b_o)
y_t = h(c_t) ⊙ o_t

where t is the frame index; x_t and y_{t-1} are the feature vectors of the current input frame and the previous output frame, respectively; W, R, and P are the weight coefficient matrices of the input, recurrent, and peephole connections, respectively; b is the bias vector; h is the hyperbolic tangent function; σ is the sigmoid function; and ⊙ denotes element-wise multiplication; z is the overall input of the LSTM unit, i is the input gate transmitted between the LSTM cells, o is the output gate of each LSTM cell, f is the forget gate, c is the cell state at each moment of the sequence, and y is the overall output of the LSTM; the forward pass produces the output vector y_t storing the target state of the current frame and the cell state c_t of the LSTM while processing the current frame, and both y_t and c_t are passed to the cell as inputs when the next frame is processed, achieving forward propagation over the sequence data.
6. The real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network according to claim 1, characterized in that when the video sequence to be tracked is input, the target position in the first frame of the sequence is given as a pair of top-left and bottom-right coordinates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910771090.7A | 2019-08-20 | 2019-08-20 | Real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network
Publications (1)
Publication Number | Publication Date
---|---
CN110490906A | 2019-11-22
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110992378A (en) * | 2019-12-03 | 2020-04-10 | 湖南大学 | Dynamic update visual tracking aerial photography method and system based on rotor flying robot |
CN111882580A (en) * | 2020-07-17 | 2020-11-03 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN111899283A (en) * | 2020-07-30 | 2020-11-06 | 北京科技大学 | Video target tracking method |
CN112037263A (en) * | 2020-09-14 | 2020-12-04 | 山东大学 | Operation tool tracking system based on convolutional neural network and long-short term memory network |
CN112149616A (en) * | 2020-10-13 | 2020-12-29 | 西安电子科技大学 | Figure interaction behavior recognition method based on dynamic information |
CN112336381A (en) * | 2020-11-07 | 2021-02-09 | 吉林大学 | Echocardiogram end systole/diastole frame automatic identification method based on deep learning |
CN112465028A (en) * | 2020-11-27 | 2021-03-09 | 南京邮电大学 | Perception vision security assessment method and system |
CN112530553A (en) * | 2020-12-03 | 2021-03-19 | 中国科学院深圳先进技术研究院 | Method and device for estimating interaction force between soft tissue and tool |
CN112971769A (en) * | 2021-02-04 | 2021-06-18 | 杭州慧光健康科技有限公司 | Home personnel tumble detection system and method based on biological radar |
CN113298142A (en) * | 2021-05-24 | 2021-08-24 | 南京邮电大学 | Target tracking method based on deep space-time twin network |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326837A (en) * | 2016-08-09 | 2017-01-11 | 北京旷视科技有限公司 | Object tracking method and apparatus |
US20180129934A1 (en) * | 2016-11-07 | 2018-05-10 | Qualcomm Incorporated | Enhanced siamese trackers |
CN108229292A (en) * | 2017-07-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | Target recognition method, apparatus, storage medium and electronic device |
CN108320297A (en) * | 2018-03-09 | 2018-07-24 | 湖北工业大学 | Real-time video object tracking method and system |
CN108520530A (en) * | 2018-04-12 | 2018-09-11 | 厦门大学 | Target tracking method based on long short-term memory network |
CN108665482A (en) * | 2018-04-18 | 2018-10-16 | 南京邮电大学 | Visual target tracking method based on VGG deep network |
CN109446889A (en) * | 2018-09-10 | 2019-03-08 | 北京飞搜科技有限公司 | Object tracking method and device based on Siamese matching network |
CN109711316A (en) * | 2018-12-21 | 2019-05-03 | 广东工业大学 | Pedestrian re-identification method, apparatus, device and storage medium |
2019-08-20: Application CN201910771090.7A filed; published as CN110490906A (legal status: Pending)
Non-Patent Citations (1)
Title |
---|
K. Greff et al.: "LSTM: A Search Space Odyssey", IEEE Transactions on Neural Networks and Learning Systems * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110992378B (en) * | 2019-12-03 | 2023-05-16 | 湖南大学 | Dynamically updated visual tracking aerial photography method and system based on rotor flying robot |
CN110992378A (en) * | 2019-12-03 | 2020-04-10 | 湖南大学 | Dynamically updated visual tracking aerial photography method and system based on rotor flying robot |
CN111882580A (en) * | 2020-07-17 | 2020-11-03 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN111882580B (en) * | 2020-07-17 | 2023-10-24 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN111899283A (en) * | 2020-07-30 | 2020-11-06 | 北京科技大学 | Video target tracking method |
CN111899283B (en) * | 2020-07-30 | 2023-10-17 | 北京科技大学 | Video target tracking method |
CN112037263A (en) * | 2020-09-14 | 2020-12-04 | 山东大学 | Surgical tool tracking system based on convolutional neural network and long short-term memory network |
CN112037263B (en) * | 2020-09-14 | 2024-03-19 | 山东大学 | Surgical tool tracking system based on convolutional neural network and long short-term memory network |
CN112149616A (en) * | 2020-10-13 | 2020-12-29 | 西安电子科技大学 | Human interaction behavior recognition method based on dynamic information |
CN112149616B (en) * | 2020-10-13 | 2023-10-20 | 西安电子科技大学 | Human interaction behavior recognition method based on dynamic information |
CN112336381B (en) * | 2020-11-07 | 2022-04-22 | 吉林大学 | Echocardiogram end systole/diastole frame automatic identification method based on deep learning |
CN112336381A (en) * | 2020-11-07 | 2021-02-09 | 吉林大学 | Echocardiogram end systole/diastole frame automatic identification method based on deep learning |
CN112465028A (en) * | 2020-11-27 | 2021-03-09 | 南京邮电大学 | Perceptual visual safety assessment method and system |
CN112465028B (en) * | 2020-11-27 | 2023-11-14 | 南京邮电大学 | Perceptual visual safety assessment method and system |
CN112530553A (en) * | 2020-12-03 | 2021-03-19 | 中国科学院深圳先进技术研究院 | Method and device for estimating interaction force between soft tissue and tool |
CN112971769A (en) * | 2021-02-04 | 2021-06-18 | 杭州慧光健康科技有限公司 | Home personnel fall detection system and method based on biological radar |
CN113298142A (en) * | 2021-05-24 | 2021-08-24 | 南京邮电大学 | Target tracking method based on deep spatio-temporal Siamese network |
CN113298142B (en) * | 2021-05-24 | 2023-11-17 | 南京邮电大学 | Target tracking method based on deep spatio-temporal Siamese network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110490906A (en) | Real-time visual target tracking method based on Siamese convolutional network and long short-term memory network | |
CN111553193B (en) | Visual SLAM closed-loop detection method based on lightweight deep neural network | |
CN109829436B (en) | Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network | |
CN108133188A (en) | Action recognition method based on motion history images and convolutional neural networks | |
CN104217214B (en) | RGB-D human action recognition method based on configurable convolutional neural networks | |
CN105701460B (en) | Video-based basketball goal detection method and apparatus | |
CN108520530A (en) | Target tracking method based on long short-term memory network | |
CN109146921A (en) | Pedestrian target tracking method based on deep learning | |
CN109816689A (en) | Moving target tracking method with adaptive fusion of multi-layer convolutional features | |
CN109410242A (en) | Target tracking method, system, device and medium based on two-stream convolutional neural networks | |
CN108171112A (en) | Vehicle identification and tracking based on convolutional neural networks | |
CN109871781A (en) | Dynamic gesture recognition method and system based on multi-modal 3D convolutional neural networks | |
CN106651830A (en) | Image quality evaluation method based on parallel convolutional neural network | |
CN108681774A (en) | Human target tracking method based on generative adversarial network negative sample augmentation | |
CN108932479A (en) | Human anomaly detection method | |
CN105512680A (en) | Multi-view SAR image target recognition method based on deep neural network | |
CN109522855A (en) | Low-resolution pedestrian detection method, system and storage medium combining ResNet and SENet | |
CN110991274B (en) | Pedestrian fall detection method based on Gaussian mixture model and neural network | |
CN111160294B (en) | Gait recognition method based on graph convolution network | |
CN113239801B (en) | Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment | |
CN108596327A (en) | Deep-learning-based artificial intelligence picking method for seismic velocity spectra | |
CN107729993A (en) | 3D convolutional neural network construction method using training samples and compromise measurement | |
Xiao et al. | Few-shot object detection with self-adaptive attention network for remote sensing images | |
CN107025420A (en) | Method and apparatus for human behavior recognition in video | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-11-22 |