CN112487920B - Convolution neural network-based crossing behavior identification method - Google Patents

Convolution neural network-based crossing behavior identification method

Info

Publication number
CN112487920B
CN112487920B (application CN202011338744.6A)
Authority
CN
China
Prior art keywords
frame
target
network
video
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011338744.6A
Other languages
Chinese (zh)
Other versions
CN112487920A (en)
Inventor
詹瑾瑜
周巧瑜
江维
范翥峰
周星志
孙若旭
温翔宇
宋子微
廖炘可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011338744.6A
Publication of CN112487920A
Application granted
Publication of CN112487920B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crossing behavior identification method based on a convolutional neural network, applied to the field of target identification, and aims at the problem of low detection precision when recognizing pedestrians climbing over a railing in the prior art. By drawing a bounding box of the same size as the person, the invention overcomes the drawbacks of traditional target detection methods, namely poor real-time performance and bounding boxes whose size cannot be enlarged or reduced to fit the target. A Yolo target detection network is adopted to predict the category of image features, and a GOTURN network is adopted to track the target. Finally, a priori-knowledge method quickly judges, from the relative position relation between the railing and the set of track points, whether the behavior is a crossing behavior; if so, a crossing label is output and a warning is issued.

Description

Convolution neural network-based crossing behavior identification method
Technical Field
The invention belongs to the field of target detection, and particularly relates to a behavior recognition technology.
Background
For target scenes containing multiple object categories, target detection aims to accurately determine the category and position of each target in an image, and two-stage methods can address this problem. Researchers mainly generate candidate boxes with a Region Proposal method and then perform coordinate regression prediction on those candidates. Ross Girshick et al. adopted a CNN to extract image features, moving feature representation from the experience-driven hand-crafted paradigms of HOG and SIFT to data-driven representation learning and thereby improving how well the features represent the samples; by supervised pre-training on large sample sets followed by fine-tuning on small sample sets, they mitigated the difficulty of training on small samples, including overfitting, and improved target detection accuracy to a certain extent. Ross Girshick et al. then proposed Fast R-CNN, a fast region-proposal-based convolutional network method for target detection. Fast R-CNN builds on the earlier work with deep convolutional networks and classifies objects more efficiently; compared with the previous work it introduces several innovations that improve detection precision as well as training and testing speed.
In general, two-stage methods have high network complexity and low processing speed, so their real-time performance is limited and people in surveillance video cannot be predicted in real time; one-stage methods address this shortcoming. In the one-stage approach, researchers obtain coordinate predictions by performing regression directly. Joseph Redmon et al. proposed a novel target detection method, Yolo, whose core idea is to take the whole image as the network input and directly regress the positions of bounding boxes and their categories at the output layer. Yolo is much faster than two-stage methods; the basic Yolo model processes images in real time at 45 frames per second. Wei Liu et al. proposed SSD, a method for detecting objects in an image with a single deep neural network. SSD combines the regression idea of Yolo with the anchor-box mechanism of Faster R-CNN and, compared with earlier methods, makes two main improvements: first, it extracts feature maps at different scales for detection, using large-scale feature maps (closer to the input) to detect small objects and small-scale feature maps (closer to the output) to detect large objects; second, it employs prior boxes with different scales and aspect ratios. Moreover, SSD is trained end to end and achieves higher precision. For the various one-stage convolutional neural network methods, researchers have improved the network models starting from speed, raising target detection accuracy while preserving real-time performance.
Visual target tracking refers to detecting, extracting, identifying and tracking a moving target in an image sequence to obtain its motion parameters, such as position, velocity, acceleration and motion trajectory, so that further processing and analysis can be performed, the behavior of the moving target can be understood, and higher-level detection tasks can be completed.
Researchers in the field of target tracking divide tracking algorithms into generative and discriminative methods. Generative methods describe the appearance of the target with a feature model and then confirm the target by minimizing the reconstruction error between the tracked target and candidate targets; they focus on feature extraction of the target, ignore the background information around it, and are prone to target drift or target loss when the target's appearance changes drastically or the target is occluded. Discriminative methods treat target tracking as a binary classification problem and determine the target among candidates by training classifiers on the target and the background. Most current deep-learning-based target tracking algorithms belong to the discriminative family.
Against this background, it has become a mainstream trend to first draw person bounding boxes with a target detection method and then locate the persons frame by frame with a target tracking method, thereby making a preliminary judgement of each person's motion trajectory. Crossing behavior usually occurs on roads, where pedestrians climb over a railing to cross the road, or around residential communities and school fences, where pedestrians climb over the railing to enter and exit. With deep-learning target detection and target tracking, whether a pedestrian is crossing a railing can be predicted more quickly and a warning can be issued in time, which protects pedestrian safety, standardizes the traffic system and improves community management.
Disclosure of Invention
In order to solve the technical problem, the invention discloses a crossing behavior identification method based on a convolutional neural network.
The technical scheme adopted by the invention is as follows: a crossing behavior identification method based on a convolutional neural network comprises the following steps:
S1, processing the video data: screening and cutting, namely screening out videos containing railing-crossing behavior and other behaviors near the railing, and cutting the videos into video-frame pictures;
S2, detecting the video frames obtained in step S1 with a Yolo target detection network; specifically: the Yolo target detection network comprises at least three parts: a Backbone part, a Neck part and a Head part; image features are aggregated and formed by the Backbone part, combined and transmitted to the prediction layer by the Neck part, and predicted by the Head part, which generates bounding boxes and predicts categories;
S3, transmitting the bounding box, the prediction category person and the current video frame to a GOTURN network for target tracking; the bounding-box coordinates from the current frame and the target from the previous frame are input to the GOTURN network, which learns to compare them and find the target object in the current image, drawing track points frame by frame and connecting them into a trajectory line;
S4, judging, with prior knowledge, whether the behavior is a railing-crossing behavior from the relative position relation between the set of track points and the railing position.
Further, the step S1 includes the following sub-steps:
S11, searching for and downloading a plurality of video data sets containing various person actions;
S12, screening, from the video data sets, person videos containing railing-crossing behavior and videos of other objects similar in shape to a person;
S13, cutting the screened videos, continuously cutting each video into video frames at 25 fps, obtaining a series of continuous video frames and storing them.
Further, aggregating and forming image features by the Backbone part is specifically: the input video frames are aggregated by a CSPResNeXt50 neural network to form image features, realizing image feature extraction.
Further, combining the image features and transmitting them to the prediction layer by the Neck part is specifically: the image features are combined through an SPP block and PANet and transmitted to the prediction layer.
Further, the prediction categories include at least the category person.
Further, the step S3 includes the following sub-steps:
S31, transmitting the bounding box generated in step S24 with prediction category person, together with the current video frame, to the GOTURN network;
S32, cropping the current video frame with the bounding box to obtain a central area containing the target, and cropping the previous video frame to obtain a search area containing the target;
S33, passing the previous-frame target and the current-frame search area obtained in S32 through the CNN convolutional layers at the same time, then passing the output of the convolutional layers through the fully connected layers to regress the position of the bounding box of the current-frame target, and drawing the center point of the current-frame coordinate box as a track point for subsequent trajectory analysis;
S34, repeating steps S31 to S33 until all video frames have entered the GOTURN network.
Further, the step S4 includes the following sub-steps:
S41, recording the track points generated in step S33 to generate a set of track points;
S42, manually marking the position coordinates of the railing line for subsequent trajectory analysis;
S43, judging, with prior knowledge, whether the trajectory is a crossing behavior from the relative position relation between the set of track points and the railing position; if so, outputting a cross label, otherwise outputting no cross label.
The invention has the following beneficial effects: the one-stage target detection method can detect persons in real time and accurately draw bounding boxes of the same size as the person, overcoming the drawbacks of traditional target detection methods, namely poor real-time performance and bounding boxes whose size cannot adapt to the target. Meanwhile, the discriminative target tracking method has a simple network structure and tracks persons quickly and accurately, so track points can be effectively drawn for subsequent video frames; finally, a priori-knowledge method quickly judges whether the behavior is a crossing behavior from the relative position relation between the railing and the set of track points.
Drawings
FIG. 1 is a flow diagram of the convolutional neural network-based crossing behavior identification method of the present invention;
FIG. 2 is an overall design diagram of the convolutional neural network-based crossing behavior identification method of the present invention;
FIG. 3 is a diagram of the one-stage network architecture according to the present invention;
FIG. 4 is a diagram of the architecture of the GOTURN network according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in fig. 1, the method for identifying a crossing behavior based on a convolutional neural network of the present invention includes the following steps:
S1, processing the video data: screening and cutting, namely screening out videos containing crossing behavior and some other behaviors, and cutting the videos into video-frame pictures; as shown in fig. 2, this step specifically includes the following sub-steps:
S11, searching for and downloading a plurality of video data sets containing various person actions for subsequent work. Because video data sets require a large amount of storage, the number of downloaded data sets is limited.
S12, screening, from the video data sets, person videos containing crossing behavior and videos of other objects similar in shape to a person. According to the action classification in the video data sets, videos in which persons perform crossing behavior, videos in which persons perform other behaviors, and the like can be screened out.
S13, cutting the screened videos, continuously cutting each video into video frames at 25 fps, obtaining a series of continuous video frames and storing them, as sketched below.
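Although the patent gives no code for step S13, the 25 fps frame-cutting step can be illustrated with a short OpenCV sketch; the file paths, output naming and the frame-skipping strategy are illustrative assumptions rather than part of the invention.

```python
# Hedged sketch: extracting frames from a screened video at roughly 25 fps with OpenCV.
import os
import cv2

def video_to_frames(video_path: str, out_dir: str, target_fps: float = 25.0) -> int:
    """Decode a video, save frames at about target_fps, and return the number saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps   # fall back if fps is unknown
    step = max(int(round(src_fps / target_fps)), 1)      # keep every step-th frame
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

For a source video already recorded at 25 fps, step is 1 and every decoded frame is kept, which matches the continuous cutting described in step S13.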
In general, video cropping (segmentation) methods are mainly divided into time-domain-based video object segmentation, motion-based video object segmentation and interactive video object segmentation. Time-domain segmentation mainly exploits the continuity and correlation between adjacent video images: one specific approach obtains a difference image by subtracting a background frame from the current frame, and another obtains the difference image from the difference between two or more frames. Motion-based video object segmentation mainly estimates motion parameters with methods such as the optical flow field, solves for the pixel regions that fit the motion model, and then merges these regions into a moving object to segment the video. In interactive segmentation, the user first segments a video image through a graphical user interface, and subsequent frames are then segmented with motion-based and spatial information. A frame-difference sketch of the time-domain idea follows.
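As a minimal illustration of the time-domain (frame-difference) idea mentioned above, the following sketch thresholds the absolute difference between two consecutive frames to obtain a moving-object mask; the threshold value is an arbitrary assumption.

```python
# Hedged sketch of frame-difference segmentation: pixels that change between
# consecutive frames are marked as moving.
import cv2

def motion_mask(prev_frame, curr_frame, thresh: int = 25):
    """Return a binary mask of pixels that changed between two BGR frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask
```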
S2, passing the video frames through the Yolo target detection network: image features are aggregated and formed by the Backbone, combined and transmitted to the prediction layer by the Neck, and predicted by the Head, which generates bounding boxes and predicts categories; as shown in fig. 3, this step specifically includes the following sub-steps:
S21, inputting the processed video frames;
S22, Backbone part: the input video frames are aggregated through the CSPResNeXt50 neural network to form image features, realizing image feature extraction;
S23, Neck part: the image features are combined through the SPP block and PANet and transmitted to the prediction layer;
S24, Head part: the image features of the prediction layer are predicted by the Head, bounding boxes are generated and a category is predicted; if the prediction category is person, the process proceeds to step S3, otherwise it returns to step S21. A sketch of this control flow is given below.
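The decision made in step S24 can be sketched as follows; run_yolo is a hypothetical placeholder for the Backbone/Neck/Head network described above, not an API defined by the patent, and the detection tuple layout is an assumption made purely for illustration.

```python
# Hedged sketch of step S24's control flow: keep only confident "person"
# detections and hand them to the tracker.
from typing import List, Tuple

Detection = Tuple[float, float, float, float, str, float]  # x, y, w, h, label, confidence

def select_person_boxes(detections: List[Detection],
                        conf_thresh: float = 0.5) -> List[Detection]:
    """Filter raw detections down to confident 'person' boxes (step S24)."""
    return [d for d in detections if d[4] == "person" and d[5] >= conf_thresh]

# Outline of the per-frame loop (run_yolo is hypothetical):
#   detections = run_yolo(frame)               # Backbone -> Neck -> Head
#   persons = select_person_boxes(detections)
#   if persons: pass frame and person boxes to the GOTURN tracker (step S3)
#   else:       read the next frame (back to step S21)
```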
The Yolo detection network comprises 24 convolutional layers and 2 fully connected layers and borrows the GoogLeNet classification network structure. Unlike GoogLeNet, Yolo does not use inception modules; instead it simply uses 1×1 convolutional layers (for cross-channel information integration) followed by 3×3 convolutional layers. The fully connected output layer of Yolo divides the input image into S×S grid cells, each of which is responsible for detecting objects that fall into it; S denotes the number of cells per side, e.g. when S = 7, S×S means the image is divided into 7×7 cells, 7 in the horizontal direction and 7 in the vertical direction. If the center of an object falls into a certain cell, that cell is responsible for detecting the object. Each cell outputs B bounding boxes and C probabilities that the object belongs to each class. Each bounding box carries 5 values: x, y, w, h and confidence, where x and y are the coordinates of the center of the bounding box predicted by the current cell, and w and h are its width and height. Note that in actual training, w and h are normalized to the interval [0, 1] by the image width and height, and x and y are offsets of the bounding-box center relative to the current cell position, likewise normalized to [0, 1]. The confidence reflects whether the current bounding box contains an object and how accurate its position is, and is computed as follows:
confidence=P(object)*IOU
If the bounding box contains an object, P(object) = 1; otherwise P(object) = 0. IOU (intersection over union) is the overlap ratio between the predicted bounding box and the ground-truth box.
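For clarity, the confidence term defined above can be computed as in the following sketch; boxes are assumed to be given as (x_center, y_center, w, h) in a common coordinate frame, an assumption consistent with the text.

```python
# Minimal sketch of confidence = P(object) * IOU for center-format boxes.
def iou(box_a, box_b) -> float:
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # convert center format to corner coordinates
    ax1, ay1, ax2, ay2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def confidence(p_object: float, pred_box, gt_box) -> float:
    """confidence = P(object) * IOU, as defined above."""
    return p_object * iou(pred_box, gt_box)
```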
Yolo optimizes the model parameters with a sum-of-squared-errors loss, i.e. the sum of squared errors between the S×S×(B×5+C)-dimensional vector output by the detection network and the corresponding S×S×(B×5+C)-dimensional vector of the real image:
loss = Σ_{i=0}^{S×S} (coordError + iouError + classError)
where coordError, iouError and classError represent the coordinate error, IOU error and classification error, respectively, between the predicted data and the labeled data.
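A simplified sketch of this loss decomposition is given below; the output layout (boxes grouped as (x, y, w, h, confidence) per cell, class probabilities stored separately) and the omission of the responsibility masks and weighting factors used in the original Yolo loss are simplifying assumptions.

```python
# Hedged sketch: sum-of-squared-errors loss split into coordinate, IOU
# (confidence) and classification terms over the S x S grid.
import numpy as np

def yolo_loss(pred_boxes, tgt_boxes, pred_cls, tgt_cls) -> float:
    """pred_boxes/tgt_boxes: (S, S, B, 5) arrays of (x, y, w, h, confidence);
    pred_cls/tgt_cls: (S, S, C) class-probability arrays."""
    coord_err = np.sum((pred_boxes[..., :4] - tgt_boxes[..., :4]) ** 2)  # x, y, w, h
    iou_err   = np.sum((pred_boxes[..., 4]  - tgt_boxes[..., 4])  ** 2)  # confidences
    class_err = np.sum((pred_cls - tgt_cls) ** 2)                        # class probabilities
    return float(coord_err + iou_err + class_err)
```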
S3, transmitting the bounding box with prediction category person, together with the current video frame, to the GOTURN network for target tracking; the bounding-box coordinates from the current frame and the target from the previous frame are input to the network, which learns to compare them and find the target object in the current image, drawing track points frame by frame and connecting them into a trajectory line; as shown in fig. 4, this step specifically includes the following sub-steps:
S31, transmitting the bounding box generated in step S24 with prediction category person, together with the current video frame, to the GOTURN network;
S32, cropping the current video frame with the bounding box to obtain a central area containing the target, and cropping the previous video frame to obtain a search area containing the target;
S33, passing the previous-frame target and the current-frame search area obtained in S32 through the CNN convolutional layers at the same time, then passing the output of the convolutional layers through the fully connected layers to regress the position of the bounding box of the current-frame target, and drawing the center point of the current-frame coordinate box as a track point for subsequent trajectory analysis;
the convolution layer of the GOTURN network adopts a 5-layer structure, the structure refers to a structure in CaffeNet, excitation functions of the convolution layers all adopt relu excitation functions, a pooling layer is added behind part of the convolution layers, a full connection layer is composed of 3 layers, 4096 nodes are arranged in each layer, and dropout and relu excitation functions are adopted among the layers to prevent overfitting and gradient disappearance. And simultaneously passing the target of the previous frame and the search area of the current frame through the CNN convolution layer, and then passing the output of the convolution layer through the full-connection layer for returning the position of the target of the current frame.
The loss function is L1-loss, expressed as follows:
L = Σ_{i=1}^{n} |y_i - d_i|
where n denotes the total number of predicted targets, y_i denotes the actual network output, and d_i denotes the ground-truth label.
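The reconstructed L1 loss above corresponds to the following one-line sketch; summing, rather than averaging, over the n predicted values is an assumption consistent with the formula as written.

```python
# Minimal sketch of the L1 loss between network outputs y and labels d.
import torch

def l1_loss(y: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Sum of absolute differences |y_i - d_i| over all predicted values."""
    return torch.sum(torch.abs(y - d))
```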
S34, repeating steps S31 to S33 until all video frames have entered the GOTURN network.
S4, judging, with prior knowledge, whether the behavior is a crossing behavior from the relative position relation between the set of track points and the railing position. As shown in fig. 2, this step specifically includes the following sub-steps:
S41, recording the track points generated in step S33 to generate a set of track points;
S42, manually marking the position coordinates of the railing line for subsequent trajectory analysis;
S43, judging, with prior knowledge, whether the trajectory is a crossing behavior from the relative position relation between the set of track points and the railing position; if so, outputting a cross label, otherwise outputting no cross label. A sketch of one way to realize this rule is given below.
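One simple way to realize the prior-knowledge rule of steps S41 to S43 is a side-change test against the annotated railing line, as sketched below; treating the railing as a single straight line and using a signed-area test are illustrative assumptions about how the relative-position rule can be implemented.

```python
# Hedged sketch of the crossing judgement: output the "cross" label when
# consecutive track points fall on opposite sides of the railing line.
from typing import List, Tuple

Point = Tuple[float, float]

def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed area: > 0 if p lies left of the railing line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def is_crossing(track: List[Point], rail_a: Point, rail_b: Point) -> bool:
    """Return True (i.e. output the 'cross' label) if the track changes sides."""
    sides = [side_of_line(p, rail_a, rail_b) for p in track]
    return any(s1 * s2 < 0 for s1, s2 in zip(sides, sides[1:]))
```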
The English terms in fig. 4 mean: Current frame is the current frame, Previous frame is the previous frame, Search Region is the search region, What to track is the target to be tracked, Conv Layers are the convolutional layers, Fully-Connected Layers are the fully connected layers, and Predicted location of target within search region is the predicted position of the target in the search region.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to the specifically recited embodiments and examples. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (7)

1. A crossing behavior identification method based on a convolutional neural network, characterized by comprising the following steps:
S1, processing the video data: screening and cutting, namely screening out videos containing railing-crossing behavior and other behaviors near the railing, and cutting the videos into video-frame pictures;
S2, detecting the video frames obtained in step S1 with a Yolo target detection network; specifically: the Yolo target detection network comprises at least three parts: a Backbone part, a Neck part and a Head part; image features are aggregated and formed by the Backbone part, combined and transmitted to the prediction layer by the Neck part, and predicted by the Head part, which generates bounding boxes and predicts categories;
S3, transmitting the bounding box, the prediction category person and the current video frame to a GOTURN network for target tracking; the bounding-box coordinates from the current frame and the target from the previous frame are input to the GOTURN network, which learns to compare them and find the target object in the current image, drawing track points frame by frame and connecting them into a trajectory line;
S4, judging, with prior knowledge, whether the behavior is a railing-crossing behavior from the relative position relation between the set of track points and the railing position.
2. The crossing behavior identification method based on a convolutional neural network as claimed in claim 1, wherein the step S1 comprises the following sub-steps:
S11, searching for and downloading a plurality of video data sets containing various person actions;
S12, screening, from the video data sets, person videos containing railing-crossing behavior and videos of other objects similar in shape to a person;
S13, cutting the screened videos, continuously cutting each video into video frames at 25 fps, obtaining a series of continuous video frames and storing them.
3. The crossing behavior identification method based on a convolutional neural network as claimed in claim 1, wherein aggregating and forming image features by the Backbone part is specifically: the input video frames are aggregated by a CSPResNeXt50 neural network to form image features, realizing image feature extraction.
4. The crossing behavior identification method based on a convolutional neural network as claimed in claim 3, wherein combining the image features and transmitting them to the prediction layer by the Neck part is specifically: the image features are combined through an SPP block and PANet and transmitted to the prediction layer.
5. The crossing behavior identification method based on a convolutional neural network as claimed in claim 4, wherein the prediction categories include at least the category person.
6. The crossing behavior identification method based on a convolutional neural network as claimed in claim 1, wherein the step S3 comprises the following sub-steps:
S31, transmitting the bounding box generated in step S24 with prediction category person, together with the current video frame, to the GOTURN network;
S32, cropping the current video frame with the bounding box to obtain a central area containing the target, and cropping the previous video frame to obtain a search area containing the target;
S33, passing the previous-frame target and the current-frame search area obtained in S32 through the CNN convolutional layers at the same time, then passing the output of the convolutional layers through the fully connected layers to regress the position of the bounding box of the current-frame target, and drawing the center point of the current-frame coordinate box as a track point for subsequent trajectory analysis;
S34, repeating steps S31 to S33 until all video frames have entered the GOTURN network.
7. The crossing behavior identification method based on a convolutional neural network as claimed in claim 6, wherein the step S4 comprises the following sub-steps:
S41, recording the track points generated in step S33 to generate a set of track points;
S42, manually marking the position coordinates of the railing line for subsequent trajectory analysis;
S43, judging, with prior knowledge, whether the trajectory is a crossing behavior from the relative position relation between the set of track points and the railing position; if so, outputting a cross label, otherwise outputting no cross label.
CN202011338744.6A 2020-11-25 2020-11-25 Convolution neural network-based crossing behavior identification method Active CN112487920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011338744.6A CN112487920B (en) 2020-11-25 2020-11-25 Convolution neural network-based crossing behavior identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011338744.6A CN112487920B (en) 2020-11-25 2020-11-25 Convolution neural network-based crossing behavior identification method

Publications (2)

Publication Number Publication Date
CN112487920A CN112487920A (en) 2021-03-12
CN112487920B (en) 2022-03-15

Family

ID=74934564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011338744.6A Active CN112487920B (en) 2020-11-25 2020-11-25 Convolution neural network-based crossing behavior identification method

Country Status (1)

Country Link
CN (1) CN112487920B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808162B (en) * 2021-08-26 2024-01-23 中国人民解放军军事科学院军事医学研究院 Target tracking method, device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035304A (en) * 2018-08-07 2018-12-18 北京清瑞维航技术发展有限公司 Method for tracking target, calculates equipment and device at medium
CN109887281A (en) * 2019-03-01 2019-06-14 北京云星宇交通科技股份有限公司 A kind of method and system monitoring traffic events
CN110781806A (en) * 2019-10-23 2020-02-11 浙江工业大学 Pedestrian detection tracking method based on YOLO

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11030476B2 (en) * 2018-11-29 2021-06-08 Element Ai Inc. System and method for detecting and tracking objects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035304A (en) * 2018-08-07 2018-12-18 北京清瑞维航技术发展有限公司 Method for tracking target, calculates equipment and device at medium
CN109887281A (en) * 2019-03-01 2019-06-14 北京云星宇交通科技股份有限公司 A kind of method and system monitoring traffic events
CN110781806A (en) * 2019-10-23 2020-02-11 浙江工业大学 Pedestrian detection tracking method based on YOLO

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A novel yolo-based real-time people counting approach;Peiming Ren;《IEEE》;20171102;第1-6页 *
Recurrent yolo and LSTM-based IR single pedestrian tracking;Sungmin Yun;《IEEE》;20200130;第1-7页 *
Detection algorithm for person climbing-over behavior in perimeter video surveillance; 张泰; China Master's Theses Full-text Database; 2018-06-30; I136-781 *
Research and implementation of an abnormal behavior recognition system for tourist attractions; 周巧瑜; China Master's Theses Full-text Database; 2022-01-31; I138-2756 *

Also Published As

Publication number Publication date
CN112487920A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
Wang et al. Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes
CN109389055B (en) Video classification method based on mixed convolution and attention mechanism
Sirohi et al. Efficientlps: Efficient lidar panoptic segmentation
CN109766830A (en) A kind of ship seakeeping system and method based on artificial intelligence image procossing
Andrews Sobral et al. Highway traffic congestion classification using holistic properties
Li et al. A method of cross-layer fusion multi-object detection and recognition based on improved faster R-CNN model in complex traffic environment
CN109902806A (en) Method is determined based on the noise image object boundary frame of convolutional neural networks
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
Yao et al. When, where, and what? A new dataset for anomaly detection in driving videos
CN108304798A (en) The event video detecting method of order in the street based on deep learning and Movement consistency
Rasouli et al. Multi-modal hybrid architecture for pedestrian action prediction
CN110569843B (en) Intelligent detection and identification method for mine target
CN105809718B (en) A kind of method for tracing object of track entropy minimization
CN114155527A (en) Scene text recognition method and device
CN108108688B (en) Limb conflict behavior detection method based on low-dimensional space-time feature extraction and topic modeling
Varior et al. Multi-scale attention network for crowd counting
Dewangan et al. Towards the design of vision-based intelligent vehicle system: methodologies and challenges
CN109242019A (en) A kind of water surface optics Small object quickly detects and tracking
Farahnakian et al. Object detection based on multi-sensor proposal fusion in maritime environment
Al-Heety Moving vehicle detection from video sequences for traffic surveillance system
Azimjonov et al. Vision-based vehicle tracking on highway traffic using bounding-box features to extract statistical information
CN112487920B (en) Convolution neural network-based crossing behavior identification method
Bourja et al. Real time vehicle detection, tracking, and inter-vehicle distance estimation based on stereovision and deep learning using YOLOv3
CN113657414A (en) Object identification method
CN111275733A (en) Method for realizing rapid tracking processing of multiple ships based on deep learning target detection technology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant