CN112784756B - Human body identification tracking method - Google Patents

Human body identification tracking method

Info

Publication number
CN112784756B
CN112784756B (application CN202110095729.1A)
Authority
CN
China
Prior art keywords
human body
training
network
centernet
tracking method
Prior art date
Legal status
Active
Application number
CN202110095729.1A
Other languages
Chinese (zh)
Other versions
CN112784756A (en)
Inventor
王堃
刘耀辉
戴旺
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110095729.1A
Publication of CN112784756A
Application granted
Publication of CN112784756B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; Learning methods
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body identification tracking method, which comprises the following steps: step 100: collecting original video stream data and converting it into pictures to establish an initial data set; step 200: performing enhancement processing and screening on the initial data set to obtain a training set, a verification set and a test set; step 300: constructing a Centernet network structure consisting of a backbone network, an upsampling path and a top convolution, wherein the top convolution adopts a depthwise separable convolution; step 400: designing a BOX matching mechanism and a loss function to construct a complete Centernet network structure; step 500: training, verifying and testing the complete Centernet network structure with the training, verification and test sets to obtain a Centernet network model; step 600: identifying and tracking human bodies in real-time video stream data using the Centernet network model. The human body identification tracking method optimizes the structure of the Centernet network, improving detection speed without reducing detection accuracy and achieving a better balance between accuracy and speed.

Description

Human body identification tracking method
Technical Field
The invention relates to the field of machine vision, in particular to a human body identification tracking method.
Background
Multi-object tracking (MOT) is a research hotspot in the field of computer vision. Its task is to determine, in a specific or real-time video sequence, information such as the position, size and complete motion trajectory of each independent target that meets given requirements or exhibits a certain visual characteristic. In recent years, with the rapid growth of data processing capacity and the development of image analysis technology, target monitoring and real-time tracking technology has come to the fore; it has great practical value in video surveillance, positioning and navigation, intelligent human-computer interaction, virtual reality and other fields, and multi-target tracking based on video streams has become a popular research direction among experts and scholars.
As a target detection and tracking algorithm, the Centernet network does not need to generate regions of interest in advance, which greatly improves speed; however, there is still room for optimization in the balance between detection accuracy and detection speed.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to provide a human body identification tracking method, which can further improve the detection speed and enlarge the receptive field while ensuring the detection accuracy.
The technical scheme is as follows: the human body identification tracking method specifically comprises the following steps:
step 100: collecting original video stream data, and converting the original video stream data into pictures to establish an initial data set;
step 200: performing enhancement processing and screening on the initial data set to obtain a training set, a verification set and a test set;
step 300: constructing a Centernet network structure consisting of a backbone network, an upsampling path and a top convolution, wherein the top convolution adopts a depthwise separable convolution;
step 400: designing a BOX matching mechanism and a loss function to construct a complete Centernet network structure;
step 500: training, verifying and testing the complete Centernet network structure by using the training set, verification set and test set to obtain a Centernet network model;
step 600: identifying and tracking human bodies in real-time video stream data using the Centernet network model.
Further, the BOX matching mechanism in step 400 is: if the Bbox containing the central point of the object predicted by the characteristic point is occupied, the Bbox closest to the central point of the object is selected as the Anchor.
Further, the loss function in step 400 is expressed as:
L_det = L_k + L_size + L_off
[The expressions for L_size and L_off appear in the original only as equation images and are not reproduced here.]
where L_det is the total loss, L_k the confidence loss, L_size the target-box size loss, and L_off the center-offset loss. The predicted Bbox parameters are set to (b_x, b_y, b_w, b_h), where b_x and b_y give the position of the Box center and b_w and b_h the width and height of the Box. Three influence factors ξ, δ and ζ are added to the confidence loss, namely:
L_k = ξ_1 · L_nt + ξ_2 · L_pt
L_nt = -(1 - b_y^)^(δ_1) · log(b_y^ + ζ)
L_pt = -(1 - b_y^)^(δ_2) · log(b_y^)
where L_nt is the negative-sample loss, L_pt the positive-sample loss, and the optimal values of ξ_1, ξ_2, δ_1, δ_2 and ζ are obtained by grid search.
Further, the original video stream data in step 100 is obtained by real-time video recording through a camera and by means of an internet crawler.
Further, the enhancement processing in step 200 includes geometric transformation and color transformation.
Further, the backbone network in step 300 is one of ResNet-18, MobileNet, Xception, ShuffleNet, ResNet101, and DenseNet.
Further, the upsampling path in step 300 includes a CBAM module and a feature fusion module, where the CBAM module is configured to optimize the extracted image features, and the feature fusion module is configured to fuse shallow features and deep features.
Further, the activation functions of the Centernet network in the step 300 are h-swish and h-sigmoid.
The step 500 comprises:
step 510: giving a model training mode and parameters, and sending a training set into a complete Centernet network structure for training to obtain a first characteristic data set;
step 520: continuing the training to obtain the Centernet network model.
Beneficial effects: compared with the prior art, the invention has the following advantages:
1. The backbone network of the Centernet network is replaced with a lightweight network, making the method suitable for embedded devices and improving detection speed.
2. A feature fusion module is introduced in the upsampling path, fusing low-level spatial information with high-level semantic information and reducing the missed and false detections caused by pedestrians occluding each other and by changes in illumination and viewing angle.
3. An attention module is introduced and the activation functions are replaced with ones of lower computational cost, keeping computation fast while preserving the practicality of the algorithm.
4. The convolution operations in the Centernet network are replaced with depthwise separable convolutions, enlarging the receptive field without reducing resolution or increasing computation, so that large targets are better detected, located and segmented.
Drawings
FIG. 1 is a flow chart of a human identification tracking method of the present invention;
FIG. 2 is a diagram comparing the architecture of the Centernet of the present invention with that of a conventional Centernet.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
Referring to fig. 1, the human body recognition and tracking method according to the embodiment of the invention comprises the following steps:
step 100: collecting original video stream data, and converting the original video stream data into pictures to establish an initial data set;
step 200: performing enhancement processing and screening on the initial data set to obtain a training set, a verification set and a test set;
step 300: building a Centernet network structure consisting of a backbone network, an upsampling path and a top convolution, wherein the top convolution adopts a depthwise separable convolution;
step 400: designing a BOX matching mechanism and a loss function to construct a complete Centernet network structure;
step 500: training, verifying and testing the complete Centernet network structure by using the training set, verification set and test set to obtain a Centernet network model;
step 600: and identifying and tracking human bodies in the real-time video stream data by using the Centernet network model.
According to the human body identification tracking method of the above technical scheme, depthwise separable convolution is adopted in the Centernet network structure. This significantly compresses the number of parameters and the amount of computation and improves the running performance of the model, while also enlarging the receptive field without lowering the image resolution or adding extra computation, so that large targets can be detected, segmented and accurately located. At the same time, using convolutions with different dilation rates yields features with different receptive fields, capturing pedestrians at multiple scales. The designed Box matching mechanism and loss function address, respectively, the coinciding-center-point problem and the positive/negative sample imbalance problem that often arise in pedestrian detection.
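The depthwise separable convolution mentioned above can be sketched as follows in PyTorch; the channel counts, kernel size and the dilation used to enlarge the receptive field are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding,
                                   dilation=dilation, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution to mix channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Usage: a 64-channel feature map processed with dilation 2 to enlarge the receptive field.
feat = torch.randn(1, 64, 128, 128)
head = DepthwiseSeparableConv(64, 64, dilation=2)
out = head(feat)  # same spatial size, larger receptive field
```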
In the Centernet network, a series of fixed BBoxes in the feature map are judged to be positive or negative samples by computing the intersection-over-union: samples with an overlap greater than 0.7 are marked positive and those below 0.3 are marked negative. The BBox of a positive sample contains the center point of an object; at low resolution each cell can detect only one object, and the network predicts the BBox by predicting an offset within that cell. Under this design one feature point can predict only one object, so if the center points of more than one object in an image coincide, detections are missed, which is common in pedestrian detection. Therefore, in some embodiments, the Box matching mechanism of step 400 is: when selecting the Anchor, if the BBox corresponding to a feature point's center is already occupied, the BBox closest to the center point is selected as the Anchor to predict the object, avoiding the problem of repeated center points.
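The matching rule can be illustrated with a short sketch. This is not the patent's implementation; the grid size and the brute-force search for the nearest free cell are assumptions made for clarity.

```python
# Each object is assigned to the output-grid cell containing its center; if that
# cell is already occupied, the nearest free cell is used instead.
import math

def assign_centers(centers, grid_w, grid_h):
    """centers: list of (cx, cy) in grid coordinates. Returns {cell: object index}."""
    occupied = {}
    for idx, (cx, cy) in enumerate(centers):
        cell = (int(cx), int(cy))
        if cell not in occupied:
            occupied[cell] = idx
            continue
        # Center cell taken: fall back to the nearest unoccupied cell.
        best, best_d = None, float("inf")
        for gx in range(grid_w):
            for gy in range(grid_h):
                if (gx, gy) in occupied:
                    continue
                d = math.hypot(gx + 0.5 - cx, gy + 0.5 - cy)
                if d < best_d:
                    best, best_d = (gx, gy), d
        if best is not None:
            occupied[best] = idx
    return occupied
```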
In some embodiments, the loss function consists of three parts and can be written overall as:
L_det = L_k + L_size + L_off
[The expressions for L_size and L_off appear in the original only as equation images and are not reproduced here.]
where L_det is the total loss, L_k the confidence loss, L_size the target-box size loss, and L_off the center-offset loss. The predicted Bbox parameters are set to (b_x, b_y, b_w, b_h), where b_x and b_y give the position of the Box center and b_w and b_h the width and height of the Box. When the input to the model is 512 × 512 and the output is a 28 × 28 feature map, the fact that one feature point predicts only one object can, in extreme cases, lead to a severe imbalance between positive and negative samples. To solve this problem, three influence factors ξ, δ and ζ are added to the confidence loss, raising the loss on positive samples and lowering the loss on negative samples, namely:
L_k = ξ_1 · L_nt + ξ_2 · L_pt
L_nt = -(1 - b_y^)^(δ_1) · log(b_y^ + ζ)
L_pt = -(1 - b_y^)^(δ_2) · log(b_y^)
In the negative-sample loss L_nt, the two factors ζ and δ_1 reduce the loss of negative samples; in the positive-sample loss L_pt, the adjustment is made through δ_2; finally, the ξ factors control the relative contribution of the positive- and negative-sample losses. The optimal set of parameters ξ_1, ξ_2, δ_1, δ_2, ζ in the loss function is found by grid search. In this embodiment, ξ_1 is 0.25, ξ_2 is 1, δ_1 is 3, δ_2 is 1.5 and ζ is 0.2.
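As an illustration only, the confidence loss above can be transcribed directly into PyTorch with the parameter values reported for this embodiment. The interpretation of b_y^ as the predicted confidence heatmap value, and the normalization by the number of positive samples, are assumptions.

```python
import torch

def confidence_loss(pred, target, xi1=0.25, xi2=1.0, delta1=3.0, delta2=1.5, zeta=0.2):
    """pred: predicted confidence heatmap in (0, 1); target: 1 at object centers, 0 elsewhere."""
    pos = target.eq(1).float()
    neg = 1.0 - pos
    eps = 1e-6
    # Negative-sample loss: damped by the zeta term inside the log and the delta1 exponent.
    l_nt = -((1.0 - pred) ** delta1) * torch.log(pred + zeta + eps)
    # Positive-sample loss: adjusted through the delta2 exponent.
    l_pt = -((1.0 - pred) ** delta2) * torch.log(pred + eps)
    n_pos = pos.sum().clamp(min=1.0)
    return (xi1 * (l_nt * neg).sum() + xi2 * (l_pt * pos).sum()) / n_pos
```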
In some embodiments, the raw video stream data in step 100 may be obtained by real-time recording of ground pedestrian scenes, with the database augmented by an internet crawler. At present, most public pedestrian-detection data sets such as MIT and ImageNet are shot at eye level and are not suitable for surveillance cameras installed at a top-down angle, so top-down pedestrian data needs to be shot and collected in the field, with an internet crawler used to supplement the amount of data.
In some embodiments, the original video stream data is converted into pictures by a script: by calling the imencode function in cv2, cyclically reading the video and performing a save operation every several frames, the video stream can be converted into a group of pictures.
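A minimal sketch of such a script with OpenCV follows; the file paths, the sampling interval, and the use of cv2.imwrite for the storage step are placeholders and assumptions.

```python
import cv2

def video_to_frames(video_path, out_dir, every_n=10):
    """Read a video cyclically and save one frame every `every_n` frames."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            # cv2.imencode plus tofile also works for non-ASCII paths; imwrite covers the simple case.
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```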
In some embodiments, the data enhancement in step 200 mainly uses two kinds of means, geometric transformation and color transformation. Geometric transformations include operations such as random flipping, rotation, cropping, deformation and scaling; color transformations include noise, Gaussian blur, color shifts, erasing, padding and the like. In this embodiment, random rotation and scaling from the geometric transformations and Gaussian blur from the color transformations are mainly employed.
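A sketch of these augmentations with OpenCV; the rotation and scaling ranges and the blur kernel are illustrative assumptions. When geometric transforms are applied, the bounding-box annotations must be transformed accordingly.

```python
import random
import cv2

def augment(img):
    h, w = img.shape[:2]
    angle = random.uniform(-15, 15)        # random rotation
    scale = random.uniform(0.8, 1.2)       # random scaling
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    img = cv2.warpAffine(img, m, (w, h))
    if random.random() < 0.5:              # Gaussian blur
        img = cv2.GaussianBlur(img, (5, 5), 0)
    return img
```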
In some embodiments, the enhanced pictures need to be screened manually; the scene types and the number of pedestrians are controlled through manual screening so that different kinds of data are distributed as evenly as possible, which improves the generalization of the model and prevents overfitting. In this example, samples are annotated manually in the PASCAL VOC format. The PASCAL VOC format is used because most databases currently use it, which makes it convenient to train on other kinds of data. The labeling tool is LabelImg, a cross-platform image annotation tool written in Python; sample information is labeled through an interactive visual interface, producing one xml annotation file per sample. The labeled object information is the pedestrian category attribute (Person) and the coordinates of the target pedestrian's bounding box. The result is a complete data set comprising a training set, a verification set and a test set.
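For illustration, a minimal sketch of reading one of the LabelImg-produced PASCAL VOC annotation files with the Python standard library; the tag names follow the standard VOC layout, the category value "Person" matches the labeling described above, and the file path is an assumption.

```python
import xml.etree.ElementTree as ET

def read_voc(xml_path):
    """Return (class name, xmin, ymin, xmax, ymax) for every object in one VOC xml file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")                 # e.g. "Person"
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes
```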
The left diagram in FIG. 2 shows a conventional Centernet network structure using an Hourglass backbone, and the right diagram in FIG. 2 shows the Centernet network structure according to an embodiment of the present invention. In some embodiments, the Centernet network structure in step 300 adopts a lightweight backbone better suited to embedded devices, such as ResNet-18, MobileNet, Xception or ShuffleNet; it is understood that the backbone network may be switched to a larger network such as ResNet101 or DenseNet to obtain higher accuracy.
In this embodiment, the backbone network of the Centernet network adopts the lightweight residual network ResNet-18 to increase the detection speed; the network structure is shown in Table 1.
TABLE 1 ResNet-18 network architecture Table
[Table 1 is provided in the original as an image and is not reproduced here.]
In this embodiment, the upsampling path uses deformable convolution to change the number of convolution kernels and transposed convolution to upsample the feature maps. The outputs of "layer2", "layer3" and "layer4" of the ResNet are taken as the "8x", "16x" and "32x" feature maps; the three feature maps are fused by the feature fusion module, the fused "8x" feature map is deconvolved to obtain a "4x" map, and finally category confidence and BBox prediction are performed by two convolutions at the top of the network.
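A rough PyTorch sketch of this path, assuming a ResNet-18 backbone from torchvision: the layer2/layer3/layer4 outputs serve as the 8x/16x/32x maps, a simple additive fusion stands in for the feature fusion module, and a transposed convolution produces the 4x map. The channel widths and the fusion scheme are assumptions, not the patent's exact modules.

```python
import torch
import torch.nn as nn
import torchvision

class UpsamplePath(nn.Module):
    def __init__(self, mid=64):
        super().__init__()
        backbone = torchvision.models.resnet18()
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.layer2, self.layer3, self.layer4 = backbone.layer2, backbone.layer3, backbone.layer4
        # 1x1 convolutions to align channel counts before fusion.
        self.l2_proj = nn.Conv2d(128, mid, 1)
        self.l3_proj = nn.Conv2d(256, mid, 1)
        self.l4_proj = nn.Conv2d(512, mid, 1)
        # Transposed convolution from the fused 8x map to the 4x output map.
        self.up = nn.ConvTranspose2d(mid, mid, 4, stride=2, padding=1)

    def forward(self, x):
        c2 = self.layer2(self.stem(x))   # 8x feature map
        c3 = self.layer3(c2)             # 16x feature map
        c4 = self.layer4(c3)             # 32x feature map
        size = c2.shape[-2:]
        fused = (self.l2_proj(c2)
                 + nn.functional.interpolate(self.l3_proj(c3), size=size, mode="bilinear", align_corners=False)
                 + nn.functional.interpolate(self.l4_proj(c4), size=size, mode="bilinear", align_corners=False))
        return self.up(fused)            # 4x feature map for the prediction heads

feat = UpsamplePath()(torch.randn(1, 3, 512, 512))  # -> (1, 64, 128, 128)
```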
Many convolution and pooling operations lose a large amount of feature information, which lowers detection accuracy. At the same time, shallow feature maps are usually large, so introducing shallow features in bulk reduces the real-time performance of the network; moreover, at the level of feature representation, low-level features differ from high-level ones, and merely concatenating them along the channel dimension introduces a lot of noise. To solve these problems, in some embodiments a feature fusion module is added to the upsampling path. The feature fusion module fuses shallow and deep features, combining rich low-level spatial information with high-level semantic information, which increases detection accuracy for small and occluded targets and is a great advantage when detecting and tracking dense crowds.
In some embodiments, an attention module (CBAM) is added to the upsampling path in order to optimize the extracted image features, avoid a large number of redundant features, further increase detection speed and obtain better feature expression.
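A compact sketch of a CBAM-style block (channel attention followed by spatial attention) that could be inserted in the upsampling path; the reduction ratio and spatial kernel size are the commonly used defaults and are assumed here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors.
        avg = torch.mean(x, dim=(2, 3), keepdim=True)
        mx = torch.amax(x, dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```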
In some embodiments, on top of the added attention module, the Centernet network structure also uses h-swish and h-sigmoid activation functions in place of the traditional ReLU and Sigmoid, which further reduces computation while effectively avoiding precision loss in model calculation.
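These activations have standard hard-approximation forms, h-sigmoid(x) = ReLU6(x + 3)/6 and h-swish(x) = x · h-sigmoid(x), which can be written as:

```python
import torch.nn as nn
import torch.nn.functional as F

class HSigmoid(nn.Module):
    def forward(self, x):
        return F.relu6(x + 3.0) / 6.0

class HSwish(nn.Module):
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0
```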
In some embodiments, step 500 comprises:
step 510: giving a model training mode and parameters, and sending a training set into a complete Centernet network structure for training to obtain a first characteristic data set;
step 520: continuing the training to obtain the Centernet network model.
In this embodiment, the training proceeds in the order full network structure, partial structure, head structure, then full network structure again. The specific training modes and parameters in step 510 are as follows: because the loss is large in the early stage of training, a step learning-rate strategy with a large learning rate is used to accelerate convergence of the model; in the later stage, cosine learning-rate decay provides a smaller learning rate to keep convergence stable. Throughout training, the sparsity rate is 0.01, the gamma of the learning-rate schedule is 0.1, the learning rate is 0.0001 with a step size of 100 (the learning rate drops to one tenth of its previous value every 100 iteration steps), the number of training epochs is 140, and the batch size is 16.
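A sketch of this schedule in PyTorch, using the stated base learning rate 0.0001, gamma 0.1, step size 100, 140 epochs and batch size 16; the choice of Adam, the switch-over point to cosine decay, and stepping the scheduler once per epoch are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, 3, padding=1)  # placeholder; in practice the complete Centernet structure

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)          # base learning rate 0.0001
# Early stage: step decay, learning rate drops to one tenth every 100 steps (gamma = 0.1).
step_lr = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
# Later stage: cosine decay over the remaining epochs.
cosine_lr = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=40)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[step_lr, cosine_lr], milestones=[100])

for epoch in range(140):                # 140 training epochs
    # ... one epoch of training with batch size 16 ...
    scheduler.step()
```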
In step 520, the weight file of the model is saved once per training epoch; training is then continued by selecting the resume (continuous training) mode and inheriting the weight file of the chosen epoch.
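A minimal sketch of this per-epoch checkpointing and resuming; the file naming and checkpoint contents are assumptions.

```python
import torch

def save_checkpoint(model, optimizer, epoch, path_fmt="centernet_epoch_{:03d}.pth"):
    # Save one weight file per training epoch.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()},
               path_fmt.format(epoch))

def resume_checkpoint(model, optimizer, path):
    # Inherit the weights of a chosen epoch and continue training from the next one.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1
```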

Claims (9)

1. A human body identification tracking method is characterized by comprising the following steps:
step 100: collecting original video stream data, and converting the original video stream data into pictures to establish an initial data set;
step 200: performing enhancement processing and screening on the initial data set to obtain a training set, a verification set and a test set;
step 300: constructing a Centernet network structure consisting of a backbone network, an upsampling path and a top convolution, wherein the top convolution adopts a depthwise separable convolution;
step 400: designing a BOX matching mechanism and a loss function to construct a complete Centernet network structure;
step 500: training, verifying and testing the complete Centernet network structure by using the training set, verification set and test set to obtain a Centernet network model;
step 600: and identifying and tracking human bodies in the real-time video stream data by using the Centernet network model.
2. The human body identification tracking method according to claim 1, wherein the BOX matching mechanism in the step 400 is: if the Bbox containing the center point of the object predicted by the characteristic point is occupied, selecting the Bbox closest to the center point of the object as Anchor.
3. The method for recognizing and tracking the human body as claimed in claim 1, wherein the loss function in the step 400 is expressed as:
L_det = L_k + L_size + L_off
[The expressions for L_size and L_off appear in the original only as equation images and are not reproduced here.]
where L_det is the total loss, L_k the confidence loss, L_size the target-box size loss, and L_off the center-offset loss; the predicted Bbox parameters are set to (b_x, b_y, b_w, b_h), where b_x and b_y are respectively the position of the Box center and b_w and b_h the width and height of the Box; three influence factors ξ, δ and ζ are added to the confidence loss, namely:
L_k = ξ_1 · L_nt + ξ_2 · L_pt
L_nt = -(1 - b_y^)^(δ_1) · log(b_y^ + ζ)
L_pt = -(1 - b_y^)^(δ_2) · log(b_y^)
where L_nt is the negative-sample loss, L_pt the positive-sample loss, and the optimal values of ξ_1, ξ_2, δ_1, δ_2 and ζ are obtained by grid search.
4. The human body identification tracking method according to claim 1, wherein the original video stream data in the step 100 is obtained by a camera real-time video recording assisted by an internet crawler.
5. The method for recognizing and tracking human body according to claim 1, wherein the enhancement processing in step 200 includes geometric transformation and color transformation.
6. The human body identification tracking method according to claim 1, wherein the backbone network in step 300 is one of ResNet-18, MobileNet, Xception, ShuffleNet, ResNet101 and DenseNet.
7. The human body recognition and tracking method according to claim 1, wherein the up-sampling path in step 300 comprises a CBAM module and a feature fusion module, the CBAM module is used for optimizing the extracted image features, and the feature fusion module is used for fusing shallow features and deep features.
8. The human body identification tracking method according to claim 7, wherein the activation functions of the Centernet network in the step 300 are h-swish and h-sigmoid.
9. The human body recognition tracking method of claim 1, wherein the step 500 comprises:
step 510: giving a model training mode and parameters, and sending a training set into a complete Centernet network structure for training to obtain a first characteristic data set;
step 520: continuing the training to obtain the Centernet network model.
CN202110095729.1A 2021-01-25 2021-01-25 Human body identification tracking method Active CN112784756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110095729.1A CN112784756B (en) 2021-01-25 2021-01-25 Human body identification tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110095729.1A CN112784756B (en) 2021-01-25 2021-01-25 Human body identification tracking method

Publications (2)

Publication Number Publication Date
CN112784756A (en) 2021-05-11
CN112784756B (en) 2022-08-26

Family

ID=75758905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110095729.1A Active CN112784756B (en) 2021-01-25 2021-01-25 Human body identification tracking method

Country Status (1)

Country Link
CN (1) CN112784756B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191334B (en) * 2021-05-31 2022-07-01 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet
CN113313736B (en) * 2021-06-10 2022-05-17 厦门大学 Online multi-target tracking method for unified target motion perception and re-identification network
CN113569727B (en) * 2021-07-27 2022-10-21 广东电网有限责任公司 Method, system, terminal and medium for identifying construction site in remote sensing image
CN113808170B (en) * 2021-09-24 2023-06-27 电子科技大学长三角研究院(湖州) Anti-unmanned aerial vehicle tracking method based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321874A (en) * 2019-07-12 2019-10-11 南京航空航天大学 A kind of light-weighted convolutional neural networks pedestrian recognition method
CN111582213A (en) * 2020-05-15 2020-08-25 北京铁科时代科技有限公司 Automobile identification method based on Centernet

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321874A (en) * 2019-07-12 2019-10-11 南京航空航天大学 A kind of light-weighted convolutional neural networks pedestrian recognition method
CN111582213A (en) * 2020-05-15 2020-08-25 北京铁科时代科技有限公司 Automobile identification method based on Centernet

Also Published As

Publication number Publication date
CN112784756A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN112784756B (en) Human body identification tracking method
CN114202672A (en) Small target detection method based on attention mechanism
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110533041B (en) Regression-based multi-scale scene text detection method
CN111353544B (en) Improved Mixed Pooling-YOLOV 3-based target detection method
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN112036447A (en) Zero-sample target detection system and learnable semantic and fixed semantic fusion method
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN112070040A (en) Text line detection method for video subtitles
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN114782798A (en) Underwater target detection method based on attention fusion
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN112507904A (en) Real-time classroom human body posture detection method based on multi-scale features
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN110633706B (en) Semantic segmentation method based on pyramid network
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115908793A (en) Coding and decoding structure semantic segmentation model based on position attention mechanism
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN116403133A (en) Improved vehicle detection algorithm based on YOLO v7
CN116524596A (en) Sports video action recognition method based on action granularity grouping structure
CN111339950A (en) Remote sensing image target detection method
Rao et al. Roads detection of aerial image with FCN-CRF model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant