CN111291745A - Target position estimation method and device, storage medium and terminal - Google Patents

Target position estimation method and device, storage medium and terminal Download PDF

Info

Publication number
CN111291745A
CN111291745A CN201910038152.3A CN201910038152A
Authority
CN
China
Prior art keywords
map
response
frame image
feature
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910038152.3A
Other languages
Chinese (zh)
Other versions
CN111291745B (en)
Inventor
潘博阳
罗小伟
王森
刘阳
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN201910038152.3A priority Critical patent/CN111291745B/en
Publication of CN111291745A publication Critical patent/CN111291745A/en
Application granted granted Critical
Publication of CN111291745B publication Critical patent/CN111291745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20056Discrete and fast Fourier transform, [DFT, FFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A target position estimation method and device, a storage medium and a terminal are provided. The method comprises the following steps: acquiring N feature maps obtained by calculating a previous frame image through N convolutional layers in a convolutional neural network, and acquiring N feature maps obtained by calculating the current frame image through the same N convolutional layers; sequentially obtaining a first response mapping map, a first candidate response mapping map, a second fused response mapping map, an nth candidate response mapping map, an (n+1)th response mapping map and an (n+1)th fused response mapping map; determining the coordinates of the response mapping with the maximum response value in the (n+1)th fused response mapping map as the (n+1)th target center; and when n+1 equals N, determining the (n+1)th target center as the target position in the current frame image. The scheme of the invention helps to optimize the accuracy of target tracking and improve the accuracy of the estimation result.

Description

Target position estimation method and device, storage medium and terminal
Technical Field
The present invention relates to the field of target tracking technologies, and in particular, to a target position estimation method and apparatus, a storage medium, and a terminal.
Background
Computer vision is a development direction of future mobile phone multimedia applications, and object tracking (Object Tracking) is an important research topic in computer vision. At present, such algorithms have been widely used in video surveillance, robot vision, Virtual Reality (VR), and Augmented Reality (AR).
Existing methods for estimating the position of a target generally use an algorithm to automatically locate the target in a subsequent video sequence according to an initial target position. A common target tracking algorithm mainly comprises two steps: Training and Detection. Training refers to extracting samples according to the target position of the previous frame, performing Feature Extraction, and then training a model through a Classifier. Detection means that the current frame is predicted according to the model of the previous frame, the sample with the highest confidence is selected as the target position of the current frame, and the model parameters are then updated to predict the target position of the next frame.
However, in the prior art, feature extraction mainly relies on hand-crafted features, such as Haar features, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), etc., and the classification operation is performed on a single extracted feature map, which results in low accuracy.
Disclosure of Invention
The invention aims to provide a target position estimation method and device, a storage medium and a terminal, which are beneficial to optimizing the accuracy of target tracking and improving the accuracy of an estimation result.
To solve the above technical problem, an embodiment of the present invention provides a target position estimation method, including the following steps: acquiring N feature maps obtained by calculating a previous frame of image through N convolutional layers in a convolutional neural network, and acquiring N feature maps obtained by calculating a current frame of image through the N convolutional layers in the convolutional neural network, wherein the feature maps have a consistent size and each comprises a plurality of response maps, each response map comprises a response value and its coordinates, N is a positive integer less than or equal to the number of convolutional layers in the convolutional neural network, and the N feature maps of each frame of image are arranged in reverse order of the layer numbers of the convolutional layers that generate them; in the first feature map of the previous frame image and the first feature map of the current frame image, respectively cutting feature maps of a preset size centered on the coordinates of the target center point of the previous frame image, and obtaining a first response mapping map by a classification operation, wherein the first response mapping map comprises a plurality of response mappings, and the first feature map is the feature map generated by the last convolutional layer; determining the coordinates of the response mapping with the maximum response value in the first response mapping map as the first target center; in the first feature map of the previous frame image and the first feature map of the current frame image, respectively cutting feature maps of a preset size centered on the first target center, and obtaining a first candidate response mapping map by a classification algorithm, wherein the first candidate response mapping map comprises a plurality of response mappings; in the second feature map of the previous frame image and the second feature map of the current frame image, respectively cutting feature maps of a preset size centered on the first target center, and obtaining a second response mapping map by a classification algorithm, wherein the second response mapping map comprises a plurality of response mappings; weighting and summing the response mappings of the first candidate response mapping map and the second response mapping map with a first preset weight value to obtain a second fusion response mapping map; determining the coordinates of the response mapping with the maximum response value in the second fusion response mapping map as the second target center; for the second to Nth feature maps of the previous frame image, respectively calculating the candidate response mapping map corresponding to each feature map, wherein for the nth feature map of the previous frame image and the nth feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an nth candidate response mapping map is obtained by a classification algorithm, the nth candidate response mapping map comprising a plurality of response mappings, where n is a positive integer and 1 < n < N; for the second to Nth feature maps of the previous frame image, respectively calculating the response mapping map corresponding to each feature map, wherein for the (n+1)th feature map of the previous frame image and the (n+1)th feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an (n+1)th response mapping map is obtained by a classification algorithm, the (n+1)th response mapping map comprising a plurality of response mappings; for the second to Nth feature maps of the previous frame image, respectively calculating the fusion response mapping map corresponding to each feature map, wherein the response mappings of the nth candidate response mapping map and the (n+1)th response mapping map are weighted and summed with an nth preset weight value to obtain an (n+1)th fusion response mapping map; determining the coordinates of the response mapping with the maximum response value in the (n+1)th fusion response mapping map as the (n+1)th target center; and when n+1 equals N, determining the (n+1)th target center as the target position in the current frame image.
Optionally, the obtaining N feature maps obtained by calculating the previous frame of image by using the N convolutional layers in the convolutional neural network, and the obtaining N feature maps obtained by calculating the current frame of image by using the N convolutional layers in the convolutional neural network includes: respectively obtaining a characteristic diagram obtained after a previous frame of image passes through N convolutional layers in a convolutional neural network; respectively obtaining a characteristic diagram obtained after the current frame image passes through N convolutional layers in a convolutional neural network; and respectively scaling the N characteristic graphs of the previous frame image and the N characteristic graphs of the current frame image to preset characteristic graph sizes.
Optionally, the N feature maps of the previous frame image and the N feature maps of the current frame image are respectively scaled to a preset feature map size by using a bilinear interpolation method or a trilinear interpolation method.
Optionally, in the first feature map of the previous frame image and the first feature map of the current frame image, respectively cutting feature maps of preset sizes with a coordinate of a target center point of the previous frame image as a center, and obtaining the first response map by using a classification operation includes: in the first feature map of the previous frame image, a first target window with a preset size is adopted, and a first target feature map with a preset size is cut by taking a coordinate where a target center point in the previous frame image is located as a center; in the first feature map of the current frame image, cutting a first search feature map with a preset size by adopting a first search window with a preset size and taking a coordinate where a target center point in a previous frame image is located as a center; and respectively inputting the first target feature map and the first search feature map into a classifier for classification operation to obtain a first response mapping map.
Optionally, the algorithm of the classification operation includes: KCF algorithm, ADABOOST algorithm, and SVM algorithm.
Optionally, the convolutional neural network includes: AlexNet, VGGNet, and GoogleNet.
To solve the above technical problem, an embodiment of the present invention provides a target position estimation device, including: an acquisition module, adapted to acquire N feature maps obtained by calculating a previous frame of image through N convolutional layers in a convolutional neural network, and acquire N feature maps obtained by calculating a current frame of image through the N convolutional layers in the convolutional neural network, wherein the feature maps have a consistent size and each comprises a plurality of response maps, each response map comprises a response value and its coordinates, N is a positive integer less than or equal to the number of convolutional layers in the convolutional neural network, and the N feature maps of each frame of image are arranged in reverse order of the layer numbers of the convolutional layers that generate them; a first map determining module, adapted to respectively cut, in the first feature map of the previous frame image and the first feature map of the current frame image, feature maps of a preset size centered on the coordinates of the target center point of the previous frame image, and obtain a first response mapping map by a classification operation, wherein the first response mapping map comprises a plurality of response mappings, and the first feature map is the feature map generated by the last convolutional layer; a first center determining module, adapted to determine the coordinates of the response mapping with the largest response value in the first response mapping map as the first target center; a first candidate map determining module, adapted to respectively cut, in the first feature map of the previous frame image and the first feature map of the current frame image, feature maps of a preset size centered on the first target center, and obtain a first candidate response mapping map by a classification algorithm, wherein the first candidate response mapping map comprises a plurality of response mappings; a second map determining module, adapted to respectively cut, in the second feature map of the previous frame image and the second feature map of the current frame image, feature maps of a preset size centered on the first target center, and obtain a second response mapping map by a classification algorithm, wherein the second response mapping map comprises a plurality of response mappings; a first fusion map determining module, adapted to perform weighted summation on the response mappings of the first candidate response mapping map and the second response mapping map with a first preset weight value to obtain a second fusion response mapping map; a second center determining module, adapted to determine the coordinates of the response mapping with the largest response value in the second fusion response mapping map as the second target center; an nth candidate map determining module, adapted to calculate, for the second to Nth feature maps of the previous frame image, the candidate response mapping map corresponding to each feature map, wherein for the nth feature map of the previous frame image and the nth feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an nth candidate response mapping map is obtained by a classification algorithm, the nth candidate response mapping map comprising a plurality of response mappings, where n is a positive integer and 1 < n < N; an (n+1)th response map determining module, adapted to calculate, for the second to Nth feature maps of the previous frame image, the response mapping map corresponding to each feature map, wherein for the (n+1)th feature map of the previous frame image and the (n+1)th feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an (n+1)th response mapping map is obtained by a classification algorithm, the (n+1)th response mapping map comprising a plurality of response mappings; an nth fusion map determining module, adapted to calculate, for the second to Nth feature maps of the previous frame image, the fusion response mapping map corresponding to each feature map, wherein the response mappings of the nth candidate response mapping map and the (n+1)th response mapping map are weighted and summed with an nth preset weight value to obtain an (n+1)th fusion response mapping map; an (n+1)th center determining module, adapted to determine the coordinates of the response mapping with the largest response value in the (n+1)th fusion response mapping map as the (n+1)th target center; and a target position determining module, adapted to determine the (n+1)th target center as the target position in the current frame image when n+1 equals N.
Optionally, the first map determining module includes: the first target image cutting sub-module is suitable for cutting a first target feature image with a preset size by adopting a first target window with a preset size in the first feature image of the previous frame image and taking the coordinate where the target center point in the previous frame image is located as the center; the first search map cutting sub-module is suitable for cutting a first search feature map with a preset size by adopting a first search window with a preset size in the first feature map of the current frame image and taking the coordinate where the target center point in the previous frame image is located as the center; and the classification operation submodule is suitable for respectively inputting the first target characteristic diagram and the first search characteristic diagram into a classifier to carry out classification operation so as to obtain a first response mapping diagram.
To solve the above technical problem, an embodiment of the present invention provides a storage medium having stored thereon computer instructions, which when executed, perform the steps of the above target position estimation method.
In order to solve the above technical problem, an embodiment of the present invention provides a terminal, including a memory and a processor, where the memory stores computer instructions capable of being executed on the processor, and the processor executes the steps of the target position estimation method when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, N characteristic maps of the previous frame image and the current frame image are respectively obtained, the classification operation is carried out on every two characteristic maps in sequence, and the obtained response maps are subjected to weighted summation to obtain the fused response map, so that each characteristic map can be subjected to multiple iterative operations and the result is applied to the estimation of the target position, the target tracking accuracy is favorably optimized, and the estimation result accuracy is improved.
Further, in the embodiment of the present invention, by setting the first target window with a preset size and the first search window with a preset size, a proper feature map may be obtained by clipping to perform a classification operation, which is helpful for obtaining a response map.
Drawings
FIG. 1 is a flow chart of a method for estimating a target location in an embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of step S101 of FIG. 1;
FIG. 3 is a schematic diagram of an application scenario of a feature map extraction method according to an embodiment of the present invention;
FIG. 4 is a flowchart of one embodiment of step S104 of FIG. 1;
FIG. 5 is a schematic diagram of an application scenario of a target location estimation method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a target position estimation apparatus according to an embodiment of the present invention.
Detailed Description
In the prior art, Convolutional Neural Networks (CNNs) model the real world using a plurality of filters and nonlinear activation functions. The conventional algorithm based on the convolutional neural network is to regard the convolutional neural network as a black box, and output results of the black box are sent to a classifier as features for classification.
Conventional target tracking algorithms use manual features and classifiers to track the target. However, in the real world, manual features have inherent limitations and cannot model targets accurately.
The inventor of the present invention has found through research that, in the prior art, feature extraction mainly depends on manual features such as Haar, HOG, and SIFT. Specifically, on the one hand, manual features have relatively high computational complexity; on the other hand, manual features cannot accurately model objects in the real world, for example when the appearance or shape of the object changes. Furthermore, performing the classification operation on a single extracted feature map easily causes the problems of insufficient complexity and low accuracy.
In the embodiment of the invention, N characteristic maps of the previous frame image and the current frame image are respectively obtained, the classification operation is carried out on every two characteristic maps in sequence, and the obtained response maps are subjected to weighted summation to obtain the fused response map, so that each characteristic map can be subjected to multiple iterative operations and the result is applied to the estimation of the target position, the target tracking accuracy is favorably optimized, and the estimation result accuracy is improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of a target position estimation method according to an embodiment of the present invention. The method may include steps S101 to S112:
step S101: acquiring N feature maps obtained by calculating a previous frame of image through N convolutional layers in a convolutional neural network, acquiring N feature maps obtained by calculating a current frame of image through the N convolutional layers in the convolutional neural network, wherein the size of each feature map is consistent and comprises a plurality of response maps, each response map comprises a response value and a coordinate thereof, N is less than or equal to the number of the convolutional layers in the convolutional neural network and is a positive integer, and the N feature maps of each frame of image are arranged in a reverse order according to the layer number of the convolutional layer for generating each feature map;
step S102: respectively cutting feature maps with preset sizes by taking coordinates where a target center point of a previous frame image is located as a center in the first feature map of the previous frame image and the first feature map of the current frame image, and obtaining a first response mapping map by adopting classification operation, wherein the first response mapping map comprises a plurality of response mappings, and the first feature map is a feature map generated by calculating the last convolution layer;
step S103: determining coordinates of a response map with a maximum response value in the first response map as a first target center;
step S104: respectively cutting feature maps with preset sizes in the first feature map of the previous frame image and the first feature map of the current frame image by taking the first target center as the center, and obtaining a first candidate response map by adopting a classification algorithm, wherein the first candidate response map comprises a plurality of response maps;
step S105: respectively cutting feature maps with preset sizes in the second feature map of the previous frame image and the second feature map of the current frame image by taking the first target center as the center, and obtaining a second response map by adopting a classification algorithm, wherein the second response map comprises a plurality of response maps;
step S106: weighting and summing the response maps of the first candidate response map and the second response map by adopting a first preset weight value to obtain a second fusion response map;
step S107: determining the coordinate of the response mapping with the maximum response value in the second fusion response mapping map as a second target center;
step S108: for the second to Nth feature maps of the previous frame image, respectively calculating the candidate response mapping map corresponding to each feature map, wherein for the nth feature map of the previous frame image and the nth feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an nth candidate response mapping map is obtained by a classification algorithm, the nth candidate response mapping map comprising a plurality of response mappings, where n is a positive integer and 1 < n < N;
step S109: for the second to Nth feature maps of the previous frame image, respectively calculating the response mapping map corresponding to each feature map, wherein for the (n+1)th feature map of the previous frame image and the (n+1)th feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an (n+1)th response mapping map is obtained by a classification algorithm, the (n+1)th response mapping map comprising a plurality of response mappings;
step S110: for the second to Nth feature maps of the previous frame image, respectively calculating the fusion response mapping map corresponding to each feature map, wherein the response mappings of the nth candidate response mapping map and the (n+1)th response mapping map are weighted and summed with an nth preset weight value to obtain an (n+1)th fusion response mapping map;
step S111: determining the coordinates of the response mapping with the maximum response value in the (n+1)th fusion response mapping map as the (n+1)th target center;
step S112: when n+1 equals N, determining the (n+1)th target center as the target position in the current frame image.
In the specific implementation of step S101, the feature map is obtained through calculation, so that the feature map is operated in the subsequent step.
Referring to fig. 2, fig. 2 is a flowchart of an embodiment of step S101 in fig. 1. The step of acquiring N feature maps obtained by calculating the previous frame image by using the N convolutional layers in the convolutional neural network, and the step of acquiring N feature maps obtained by calculating the current frame image by using the N convolutional layers in the convolutional neural network may include steps S21 to S23, which are described below.
In step S21, feature maps obtained by passing the previous frame of image through N convolutional layers in the convolutional neural network are obtained.
In step S22, feature maps obtained by passing the current frame image through N convolutional layers in the convolutional neural network are obtained.
In step S23, the N feature maps of the previous frame image and the N feature maps of the current frame image are scaled to a preset feature map size, respectively.
Further, Bi-linear Interpolation or tri-linear interpolation may be used to respectively scale the N feature maps of the previous frame image and the N feature maps of the current frame image to a preset feature map size, which helps to implement the subsequent estimation of the target position more effectively.
Further, the convolutional neural network may include: the Alex network (AlexNet), the Visual Geometry Group network of the University of Oxford (VGG Net), and the Google network (GoogleNet).
It should be noted that, in the embodiment of the present invention, no limitation is made to a specific convolutional neural network.
In a specific implementation, taking VGG Net as an example, the CNN model used for the feature extraction process is introduced. The image is fed into VGG Net for forward propagation, and a Feature Map is generated at each convolutional layer. Shallow feature maps have a higher resolution and deep feature maps have a lower resolution.
Referring to fig. 3, fig. 3 is a schematic view of an application scenario of a feature map extraction method in the embodiment of the present invention.
As depicted in fig. 3, the image 101 is transferred to the CNN model 102, wherein the CNN model 102 includes a plurality of convolutional layers.
The image 101 may be a previous frame image or a current frame image.
Specifically, the plurality of convolutional layers may include a first convolutional layer 103, a second convolutional layer 105, ..., an (N-1)th convolutional layer 107, and an Nth convolutional layer 109.
The N feature maps obtained after calculation are arranged in reverse order of the layer numbers of the convolutional layers that generate them. Specifically, the image 101 passes through the first convolutional layer 103 to obtain the Nth feature map 104, through the second convolutional layer 105 to obtain the (N-1)th feature map 106, ..., through the (N-1)th convolutional layer 107 to obtain the second feature map 108, and through the Nth convolutional layer 109 to obtain the first feature map 110.
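As a non-authoritative illustration of this feature extraction step, the sketch below assumes a PyTorch/torchvision VGG-16 backbone; the chosen layer indices, the preset feature map size, and the function name extract_feature_maps are illustrative assumptions, not taken from the patent. It collects the outputs of several convolutional stages, scales each one to the same preset size with bilinear interpolation, and reverses the order so that the map from the deepest convolutional layer becomes the first feature map.

```python
# Minimal sketch (assumptions: VGG-16 backbone, layer indices, preset size
# and function name are illustrative; pretrained weights would be loaded
# in practice before using the features for tracking).
import torch
import torch.nn.functional as F
import torchvision

vgg_features = torchvision.models.vgg16().features.eval()
conv_layer_indices = [15, 22, 29]   # outputs of conv3_3, conv4_3, conv5_3 (N = 3)
preset_size = (56, 56)              # hypothetical preset feature map size

def extract_feature_maps(image: torch.Tensor):
    """image: (1, 3, H, W) tensor; returns N feature maps, deepest layer first."""
    feature_maps = []
    x = image
    with torch.no_grad():
        for i, layer in enumerate(vgg_features):
            x = layer(x)
            if i in conv_layer_indices:
                # scale every feature map to the same preset size (bilinear)
                fm = F.interpolate(x, size=preset_size,
                                   mode='bilinear', align_corners=False)
                feature_maps.append(fm)
    # reverse order: the map of the last (deepest) convolutional layer
    # becomes the "first feature map" used by the method
    return feature_maps[::-1]
```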
With reference to fig. 1, in the specific implementation of step S102, a first response map is obtained by using a classification operation in the first feature map of the previous frame image and the first feature map of the current frame image.
Referring to fig. 4, fig. 4 is a flowchart of an embodiment of step S104 in fig. 1. The step of respectively cutting out feature maps of a preset size with the coordinate of the target center point of the previous frame image as the center in the first feature map of the previous frame image and the first feature map of the current frame image, and obtaining the first response map by using a classification operation may include steps S41 to S43, and each step is described below.
In step S41, a first target window with a preset size is used in the first feature map of the previous frame image, and the first target feature map with a preset size is cut out with the coordinates of the center point of the target in the previous frame image as the center.
In step S42, in the first feature map of the current frame image, a first search window with a preset size is adopted, and the first search feature map with a preset size is cut out with the coordinate of the target center point in the previous frame image as the center.
In step S43, the first target feature map and the first search feature map are respectively input to a classifier for classification operation, so as to obtain a first response map.
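A minimal sketch of the cropping operation used in steps S41 and S42 is given below; the (C, H, W) layout and the clamping of the window to the feature map boundary are illustrative assumptions, not details specified by the patent.

```python
import numpy as np

def crop_centered(feature_map, center, size):
    """Cut a window of a preset size from a (C, H, W) feature map, centered
    on the (row, col) coordinate. The window is clamped so that it stays
    inside the feature map (an assumed boundary policy); the window size is
    assumed not to exceed the feature map size."""
    _, H, W = feature_map.shape
    h, w = size
    row, col = center
    top = min(max(row - h // 2, 0), H - h)
    left = min(max(col - w // 2, 0), W - w)
    return feature_map[:, top:top + h, left:left + w]
```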
Further, the algorithm of the classification operation may include: a Kernelized Correlation Filter (KCF) algorithm, an Adaptive Boosting (ADABOOST) algorithm, and a Support Vector Machine (SVM) algorithm.
It should be noted that, in the embodiment of the present invention, the specific algorithm of the classification operation is not limited.
In one non-limiting example of an embodiment of the present invention, a KCF classifier is used to perform a KCF classification operation, thereby obtaining a first response map.
In a specific implementation, the coordinates of the target center point in the image are first taken as the center, and an image sample x of size W × H is acquired in the vicinity of that center to train the classifier. Using the properties of the Cyclic Shift Matrix and appropriate padding of the image, all cyclically shifted samples x_{w,h}, (w,h) ∈ {0,1,…,W−1} × {0,1,…,H−1}, are then used as training samples of the classifier. Meanwhile, the Regression Target y follows a Gaussian distribution: its value is 1 at the target center point, decays with increasing distance from the center point, and decays to 0 at the target edge, where y(w,h) denotes the Label of x_{w,h}.
The purpose of the training is to find the following function:
f(z) = \mathbf{w}^{T} z
such that the mean square error between the samples x_{w,h} and their regression targets y(w,h) is minimized, i.e.
\min_{\mathbf{w}} \sum_{w,h} \left| \left\langle \phi(x_{w,h}), \mathbf{w} \right\rangle - y(w,h) \right|^{2} + \lambda \left\| \mathbf{w} \right\|^{2}
where φ denotes the mapping of samples into a Hilbert space induced by a kernel function κ, the Inner Product of x and x' is expressed as
\left\langle \phi(x), \phi(x') \right\rangle = \kappa(x, x')
and λ represents the regularization term coefficient.
After mapping the input into the nonlinear feature space φ(x), the solution w of the linear problem is expressed as
\mathbf{w} = \sum_{w,h} \alpha(w,h)\, \phi(x_{w,h})
and the solution of the vector α is
\boldsymbol{\alpha} = F^{-1}\!\left( \frac{F(\mathbf{y})}{F(\mathbf{k}^{x}) + \lambda} \right)
where F and F^{-1} denote the forward and inverse Fourier transforms respectively, k^{x}(w,h) = κ(x_{w,h}, x), and the vector α contains all of the α(w,h) coefficients. The Appearance Model needs to be updated to process each frame, for example by linear interpolation with a learning rate η:
\hat{\mathbf{x}} \leftarrow (1 - \eta)\, \hat{\mathbf{x}} + \eta\, \mathbf{x}
The KCF tracking algorithm model thus comprises the learned Target Appearance Model \hat{\mathbf{x}} and the classifier coefficients F(α).
Further, the response map values and their coordinates may be calculated in the current frame:
\hat{\mathbf{y}} = F^{-1}\!\left( F(\mathbf{k}^{\hat{x}z}) \odot F(\boldsymbol{\alpha}) \right)
where ⊙ represents point-by-point multiplication (Element-wise Product), z is the sample cut out of the current frame, k^{\hat{x}z}(w,h) = κ(z_{w,h}, \hat{x}), and \hat{\mathbf{x}} represents the learned target appearance model.
According to the above steps, a first response map can be obtained. It should be noted that, in the embodiment of the present invention, the step of inputting the feature map into the classifier to perform the classification operation to obtain the response map may be implemented by using the above steps.
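To make the classification operation concrete, the following is a minimal single-channel sketch of the KCF training and detection equations above, using NumPy; the Gaussian kernel bandwidth, the regularization value and the helper names are illustrative assumptions (a practical tracker would additionally apply a cosine window and operate on multi-channel CNN features).

```python
import numpy as np

def gaussian_labels(H, W, sigma=2.0):
    # Regression target y: 1 at the target centre, decaying with distance,
    # rolled so the peak sits at index (0, 0) as the cyclic-shift model expects.
    ys, xs = np.meshgrid(np.arange(H) - H // 2, np.arange(W) - W // 2, indexing='ij')
    y = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return np.roll(y, (-(H // 2), -(W // 2)), axis=(0, 1))

def gaussian_kernel_correlation(x, z, sigma=0.5):
    # k^{xz}(w, h) = kappa(z_{w,h}, x) for a Gaussian kernel, evaluated for
    # all cyclic shifts at once in the Fourier domain.
    c = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)).real
    d = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c) / x.size
    return np.exp(-np.clip(d, 0.0, None) / (sigma ** 2))

def kcf_train(x, y, lam=1e-4):
    # F(alpha) = F(y) / (F(k^x) + lambda)
    k = gaussian_kernel_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def kcf_detect(alphaf, x_model, z):
    # Response map: y_hat = F^{-1}( F(k^{xz}) * F(alpha) ), element-wise product
    k = gaussian_kernel_correlation(x_model, z)
    return np.fft.ifft2(np.fft.fft2(k) * alphaf).real
```

In this sketch, the first target feature map cut from the previous frame would play the role of x (with y produced by gaussian_labels), the first search feature map cut from the current frame would play the role of z, and the coordinates of the maximum of the returned response map would then give the first target center.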
In the embodiment of the invention, by setting the first target window with the preset size and the first search window with the preset size, a proper feature map can be obtained through cutting to perform classification operation, which is beneficial to realizing the acquisition of the response map.
With continuing reference to fig. 1, the specific implementation of steps S103 through S112 can be described in detail with reference to fig. 5.
Referring to fig. 5, fig. 5 is a schematic view of an application scenario of a target position estimation method in the embodiment of the present invention. In the application scenario diagram, the number N of convolutional layers is 3.
In the first feature map of the previous frame image, a first target window 111 with a preset size is adopted, and a first target feature map 112 with a preset size is cut out by taking the coordinate where the target center point of the previous frame image is located as the center; in the first feature map of the current frame image, a first search window 113 with a preset size is adopted, and a first search feature map 114 with a preset size is cut by taking the coordinate where the target center point of the previous frame image is located as the center; the first target feature map 112 and the first search feature map 114 are respectively input to a KCF classifier 115 for classification operation, so as to obtain a first response map 116.
The coordinates of the response map having the largest response value in the first response map 116 are determined as the first target center.
In the first feature map of the previous frame image, a first candidate target window 211 with a preset size is adopted, and a first candidate target feature map 212 with a preset size is cut by taking the first target center as the center; in the first feature map of the current frame image, a first candidate search window 213 with a preset size is adopted, and a first candidate search feature map 214 with a preset size is cut by taking the first target center as a center; the first candidate target feature map 212 and the first candidate search feature map 214 are respectively input to a KCF classifier 215 for classification operation, so as to obtain a first candidate response map 216.
The preset size may be the same as the preset size in the above step, and the KCF classifier 215 may be the same as the KCF classifier 115; this will not be repeated in the following description.
In the second feature map of the previous frame image, a second target feature map 312 with a preset size is cut by adopting a second target window 311 with a preset size and taking the first target center as the center; in the second feature map of the current frame image, a second search window 313 with a preset size is adopted, and a second search feature map 314 with a preset size is cut out with the first target center as the center; the second target feature map 312 and the second search feature map 314 are respectively input to a KCF classifier 315 for classification operation, so as to obtain a second response map 316.
The response maps of the first candidate response map 216 and the second response map 316 are weighted and summed with a first preset weight value to obtain a second fused response map 400.
The coordinates of the response map having the largest response value in the second fused response map 400 are determined as the second target center.
In the second feature map of the previous frame image, a second candidate target window 411 with a preset size is adopted, and a second candidate target feature map 412 with a preset size is cut out by taking the center of the second target as the center; in the second feature map of the current frame image, a second candidate search window 413 with a preset size is adopted, and a second candidate search feature map 414 with a preset size is cut out with the second target center as the center; the second candidate target feature map 412 and the second candidate search feature map 414 are respectively input to a KCF classifier 415 for classification operation, so as to obtain a second candidate response map 416.
In the second feature map of the previous frame image, a third target feature map 512 with a preset size is cut by adopting a third target window 511 with a preset size and taking the second target center as the center; in the second feature map of the current frame image, a third search window 513 with a preset size is adopted, and a third search feature map 514 with a preset size is cut out with the second target center as the center; and inputting the third target feature map 512 and the third search feature map 514 into a KCF classifier 515 respectively for classification operation to obtain a third response map 516.
And performing weighted summation on the response maps of the second candidate response map 416 and the third response map 516 by using a second preset weight value to obtain a third fused response map 517.
The coordinates of the response map having the largest response value in the third fused response map 517 are determined as the third target center.
Further, the third target center may be determined as the target position in the current frame image, and the final target window 518 may also be determined with the third target center as the center, for example, by performing clipping with a preset size.
It should be noted that fig. 5 is described with 3 feature maps; when N > 3, the weighted fusion may be continued iteratively in the same way to determine the Nth target center.
Specifically, for the second to Nth feature maps of the previous frame image, the candidate response map corresponding to each feature map may be calculated respectively, where for the nth feature map of the previous frame image and the nth feature map of the current frame image, feature maps of a preset size are respectively clipped with the nth target center as the center, and an nth candidate response map is obtained by using a classification algorithm, the nth candidate response map including a plurality of response maps, where n is a positive integer and 1 < n < N.
For the second to Nth feature maps of the previous frame image, the response mapping map corresponding to each feature map is calculated respectively, wherein for the (n+1)th feature map of the previous frame image and the (n+1)th feature map of the current frame image, feature maps of a preset size are respectively cut out with the nth target center as the center, and an (n+1)th response mapping map is obtained by adopting a classification algorithm, the (n+1)th response mapping map comprising a plurality of response mappings.
For the second to Nth feature maps of the previous frame image, the fusion response mapping map corresponding to each feature map is calculated respectively, wherein the response mappings of the nth candidate response mapping map and the (n+1)th response mapping map are weighted and summed with an nth preset weight value to obtain an (n+1)th fusion response mapping map.
The coordinates of the response mapping with the maximum response value in the (n+1)th fusion response mapping map are determined as the (n+1)th target center.
When n+1 equals N, the (n+1)th target center is determined as the target position in the current frame image, and a final target window may be determined with the (n+1)th target center as the center.
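Putting the above steps together, the following sketch outlines the coarse-to-fine iteration. It is only illustrative: kcf_response is a hypothetical helper that combines the cropping and KCF classification sketched earlier, and interpreting each "preset weight value" as a convex combination w*candidate + (1-w)*next is an assumption, as the patent does not spell out the exact weighting.

```python
import numpy as np

def argmax_coord(response_map):
    # Coordinates of the response mapping with the maximum response value.
    return np.unravel_index(np.argmax(response_map), response_map.shape)

def estimate_target_position(prev_maps, curr_maps, prev_center, weights, crop_size):
    """prev_maps / curr_maps: the N feature maps of the previous and current
    frames, deepest convolutional layer first; prev_center: target centre in
    the previous frame; weights: the N-1 preset weight values.
    kcf_response(prev_fm, curr_fm, center, crop_size) is a hypothetical helper
    that crops both feature maps around `center` and runs the KCF classifier,
    returning a response mapping map."""
    # steps S102-S103: first response mapping map and first target centre
    resp = kcf_response(prev_maps[0], curr_maps[0], prev_center, crop_size)
    center = argmax_coord(resp)
    for n in range(1, len(prev_maps)):          # n = 1 .. N-1
        # nth candidate response mapping map (same feature map, new centre)
        cand = kcf_response(prev_maps[n - 1], curr_maps[n - 1], center, crop_size)
        # (n+1)th response mapping map (next, shallower feature map)
        nxt = kcf_response(prev_maps[n], curr_maps[n], center, crop_size)
        # (n+1)th fusion response mapping map via the nth preset weight value
        fused = weights[n - 1] * cand + (1.0 - weights[n - 1]) * nxt
        # (n+1)th target centre
        center = argmax_coord(fused)
    # when n+1 == N, the last centre is the target position in the current frame
    return center
```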
In the embodiment of the invention, N characteristic maps of the previous frame image and the current frame image are respectively obtained, the classification operation is carried out on every two characteristic maps in sequence, and the obtained response maps are subjected to weighted summation to obtain the fused response map, so that each characteristic map can be subjected to multiple iterative operations and the result is applied to the estimation of the target position, the target tracking accuracy is favorably optimized, and the estimation result accuracy is improved.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a target position estimation apparatus according to an embodiment of the present invention. The target position estimating apparatus may include:
an obtaining module 601, adapted to obtain N feature maps obtained by calculating a previous frame of image by N convolutional layers in a convolutional neural network, obtain N feature maps obtained by calculating a current frame of image by N convolutional layers in the convolutional neural network, where the size of each feature map is consistent and includes a plurality of response maps, each response map includes a response value and its coordinates, where N is less than or equal to the number of convolutional layers in the convolutional neural network and is a positive integer, and the N feature maps of each frame of image are arranged in a reverse order according to the layer numbers of the convolutional layers generating the feature maps;
a first map determining module 602, adapted to respectively crop feature maps of a preset size with a coordinate of a target center point of a previous frame image as a center in the first feature map of the previous frame image and the first feature map of the current frame image, and obtain a first response map by using a classification operation, where the first response map includes multiple response maps, and the first feature map is a feature map generated by calculating a last convolution layer;
a first center determining module 603 adapted to determine coordinates of a response map having a largest response value in the first response map as a first target center;
a first candidate map determining module 604, adapted to respectively cut feature maps of a preset size from the first feature map of the previous frame image and the first feature map of the current frame image with the first target center as a center, and obtain a first candidate response map by using a classification algorithm, where the first candidate response map includes multiple response maps;
a second map determining module 605, adapted to respectively cut feature maps of a preset size from the second feature map of the previous frame image and the second feature map of the current frame image with the first target center as a center, and obtain a second response map by using a classification algorithm, where the second response map includes multiple response maps;
a first fused map determining module 606, adapted to perform weighted summation on the response maps of the first candidate response map and the second response map by using a first preset weight value to obtain a second fused response map;
a second center determining module 607 adapted to determine coordinates of a response map having a largest response value in the second fused response map as a second target center;
an nth candidate map determining module 608, adapted to calculate, for the second to Nth feature maps of the previous frame image, the candidate response map corresponding to each feature map, wherein for the nth feature map of the previous frame image and the nth feature map of the current frame image, feature maps of a preset size are respectively clipped with the nth target center as the center, and an nth candidate response map is obtained by using a classification algorithm, the nth candidate response map including a plurality of response maps, where n is a positive integer and 1 < n < N;
an (n+1)th response map determining module 609, adapted to calculate, for the second to Nth feature maps of the previous frame image, the response map corresponding to each feature map, wherein for the (n+1)th feature map of the previous frame image and the (n+1)th feature map of the current frame image, feature maps of a preset size are respectively clipped with the nth target center as the center, and an (n+1)th response map is obtained by using a classification algorithm, the (n+1)th response map including a plurality of response maps;
an nth fusion map determining module 610, adapted to calculate, for the second to Nth feature maps of the previous frame image, the fusion response map corresponding to each feature map, wherein an nth preset weight value is adopted to perform weighted summation on the response maps of the nth candidate response map and the (n+1)th response map to obtain an (n+1)th fusion response map;
an (n+1)th center determining module 611, adapted to determine the coordinates of the response map with the largest response value in the (n+1)th fusion response map as the (n+1)th target center;
a target position determining module 612, adapted to determine the (n+1)th target center as the target position in the current frame image when n+1 equals N.
Further, the first map determining module 602 may include: a first target image clipping sub-module (not shown) adapted to clip a first target feature image of a preset size using a first target window of a preset size in the first feature image of the previous frame image, with the coordinate of the target center point in the previous frame image as the center; a first search map clipping sub-module (not shown) adapted to clip a first search feature map of a preset size using a first search window of a preset size in the first feature map of the current frame image and centering on a coordinate where a target center point in a previous frame image is located; and a classification operation sub-module (not shown) adapted to input the first target feature map and the first search feature map into a classifier respectively for performing the classification operation, so as to obtain a first response map.
For the principle, specific implementation and beneficial effects of the target position estimation apparatus, please refer to the related description of the target position estimation method shown in fig. 1 to 5 and the foregoing, which is not repeated herein.
Embodiments of the present invention also provide a storage medium having stored thereon computer instructions which, when executed, perform the steps of the target position estimation method shown in fig. 1 to 5. The storage medium may be a computer-readable storage medium, and may include, for example, a non-volatile or non-transitory memory, and may further include an optical disc, a mechanical hard disk, a solid-state drive, and the like.
An embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores computer instructions capable of being executed on the processor, and the processor executes the computer instructions to perform the steps of the target position estimation method shown in fig. 1 to 5. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method of estimating a target position, comprising the steps of:
acquiring N feature maps obtained by calculating a previous frame of image through N convolutional layers in a convolutional neural network, acquiring N feature maps obtained by calculating a current frame of image through the N convolutional layers in the convolutional neural network, wherein the size of each feature map is consistent and comprises a plurality of response maps, each response map comprises a response value and a coordinate thereof, N is less than or equal to the number of the convolutional layers in the convolutional neural network and is a positive integer, and the N feature maps of each frame of image are arranged in a reverse order according to the layer number of the convolutional layer for generating each feature map;
respectively cutting feature maps with preset sizes by taking coordinates where a target center point of a previous frame image is located as a center in the first feature map of the previous frame image and the first feature map of the current frame image, and obtaining a first response mapping map by adopting classification operation, wherein the first response mapping map comprises a plurality of response mappings, and the first feature map is a feature map generated by calculating the last convolution layer;
determining coordinates of a response map with a maximum response value in the first response map as a first target center;
respectively cutting feature maps with preset sizes in the first feature map of the previous frame image and the first feature map of the current frame image by taking the first target center as the center, and obtaining a first candidate response map by adopting a classification algorithm, wherein the first candidate response map comprises a plurality of response maps;
respectively cutting feature maps with preset sizes in the second feature map of the previous frame image and the second feature map of the current frame image by taking the first target center as the center, and obtaining a second response map by adopting a classification algorithm, wherein the second response map comprises a plurality of response maps;
weighting and summing the response maps of the first candidate response map and the second response map by adopting a first preset weight value to obtain a second fusion response map;
determining the coordinate of the response mapping with the maximum response value in the second fusion response mapping map as a second target center;
for the second to the N-th feature maps of the previous frame image, respectively calculating the candidate response map corresponding to each feature map, wherein feature maps of a preset size are cropped from the n-th feature map of the previous frame image and the n-th feature map of the current frame image, each centered on the n-th target center, and an n-th candidate response map is obtained by a classification algorithm, the n-th candidate response map comprising a plurality of responses, where n is a positive integer and 1 < n < N;
for the second to the N-th feature maps of the previous frame image, respectively calculating the response map corresponding to each feature map, wherein feature maps of a preset size are cropped from the (n+1)-th feature map of the previous frame image and the (n+1)-th feature map of the current frame image, each centered on the n-th target center, and an (n+1)-th response map is obtained by a classification algorithm, the (n+1)-th response map comprising a plurality of responses;
for the second to the N-th feature maps of the previous frame image, respectively calculating the fused response map corresponding to each feature map, wherein the responses of the n-th candidate response map and the (n+1)-th response map are weighted and summed with an n-th preset weight to obtain an (n+1)-th fused response map;
determining the coordinates of the response with the maximum response value in the (n+1)-th fused response map as the (n+1)-th target center;
and when n+1 is equal to N, determining the (n+1)-th target center as the target position in the current frame image.
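For readers who want to see the control flow of claim 1 end to end, below is a minimal NumPy sketch of one possible reading of those steps. The FFT cross-correlation used as the classification operation, the equal preset weights, and the helper names (crop_patch, response_map, peak_to_center, estimate_target_position) are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

def crop_patch(feat, center, size):
    """Crop a size x size window from a 2-D feature map around `center`,
    padding with edge values so windows near the border stay valid."""
    cy = int(np.clip(center[0], 0, feat.shape[0] - 1))
    cx = int(np.clip(center[1], 0, feat.shape[1] - 1))
    half = size // 2
    padded = np.pad(feat, half, mode="edge")
    return padded[cy:cy + size, cx:cx + size]

def response_map(target_patch, search_patch):
    """Stand-in for the classification operation: circular cross-correlation
    of the search patch with the target patch, computed via the FFT."""
    corr = np.real(np.fft.ifft2(np.conj(np.fft.fft2(target_patch)) *
                                np.fft.fft2(search_patch)))
    return np.fft.fftshift(corr)          # zero displacement at the map center

def peak_to_center(resp, ref_center, size):
    """Translate the peak of a response map back into feature-map coordinates."""
    peak = np.unravel_index(np.argmax(resp), resp.shape)
    return (ref_center[0] + peak[0] - size // 2,
            ref_center[1] + peak[1] - size // 2)

def estimate_target_position(prev_feats, curr_feats, prev_center,
                             patch_size=32, weights=None):
    """prev_feats / curr_feats: lists of N equally sized 2-D feature maps,
    ordered from the last convolutional layer to the earlier ones.
    prev_center: (row, col) of the target center in the previous frame."""
    N = len(prev_feats)
    weights = weights if weights is not None else [0.5] * (N - 1)

    # First response map: deepest-layer features cropped around the previous
    # frame's target center; its peak gives the first target center.
    r1 = response_map(crop_patch(prev_feats[0], prev_center, patch_size),
                      crop_patch(curr_feats[0], prev_center, patch_size))
    center = peak_to_center(r1, prev_center, patch_size)

    # Levels 2..N: fuse the current level's candidate map with the next
    # level's map, then move the center to the peak of the fused map.
    for n in range(1, N):
        cand = response_map(crop_patch(prev_feats[n - 1], center, patch_size),
                            crop_patch(curr_feats[n - 1], center, patch_size))
        nxt = response_map(crop_patch(prev_feats[n], center, patch_size),
                           crop_patch(curr_feats[n], center, patch_size))
        fused = weights[n - 1] * cand + (1.0 - weights[n - 1]) * nxt
        center = peak_to_center(fused, center, patch_size)

    return center    # estimated target position in the current frame
```

The point the sketch tries to mirror is the coarse-to-fine hand-off: the peak of each fused response map becomes the crop center for the next, shallower pair of feature maps, so the deepest layer localizes the target roughly and earlier layers refine the estimate.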
2. The target position estimation method according to claim 1, wherein acquiring the N feature maps obtained by computing the previous frame image through the N convolutional layers of the convolutional neural network, and acquiring the N feature maps obtained by computing the current frame image through the N convolutional layers of the convolutional neural network, comprises:
respectively obtaining the feature maps produced after the previous frame image passes through the N convolutional layers of the convolutional neural network;
respectively obtaining the feature maps produced after the current frame image passes through the N convolutional layers of the convolutional neural network;
and scaling the N feature maps of the previous frame image and the N feature maps of the current frame image, respectively, to a preset feature map size.
3. The target position estimation method according to claim 2, wherein the N feature maps of the previous frame image and the N feature maps of the current frame image are each scaled to the preset feature map size using bilinear interpolation or trilinear interpolation.
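To make the scaling step of claims 2 and 3 concrete, the following is a small pure-NumPy bilinear resize; the function name and the align-corners sampling convention are choices made for this sketch rather than details specified by the patent.

```python
import numpy as np

def resize_bilinear(feat, out_h, out_w):
    """Scale a 2-D feature map to (out_h, out_w) with bilinear interpolation
    (align-corners sampling, chosen for simplicity)."""
    in_h, in_w = feat.shape
    ys = np.linspace(0, in_h - 1, out_h)       # sample rows in the source map
    xs = np.linspace(0, in_w - 1, out_w)       # sample columns in the source map
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                    # vertical interpolation weights
    wx = (xs - x0)[None, :]                    # horizontal interpolation weights
    top = feat[np.ix_(y0, x0)] * (1 - wx) + feat[np.ix_(y0, x1)] * wx
    bottom = feat[np.ix_(y1, x0)] * (1 - wx) + feat[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy
```

Each of the N feature maps of the previous frame and of the current frame would be run through such a resize so that all maps share the preset feature map size before any cropping or classification.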
4. The target position estimation method according to claim 1, wherein cropping feature maps of a preset size from the first feature map of the previous frame image and the first feature map of the current frame image, each centered on the coordinates of the target center point in the previous frame image, and obtaining the first response map by a classification operation, comprises:
in the first feature map of the previous frame image, cropping a first target feature map of a preset size with a first target window of a preset size, centered on the coordinates of the target center point in the previous frame image;
in the first feature map of the current frame image, cropping a first search feature map of a preset size with a first search window of a preset size, centered on the coordinates of the target center point in the previous frame image;
and inputting the first target feature map and the first search feature map into a classifier for classification to obtain the first response map.
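One way to picture claim 4 is with a small target window and a larger search window, both centered on the previous frame's target center, and a plain zero-mean correlation standing in for the classifier. The window sizes (16 and 48) and function names below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def crop(feat, center, size):
    """Crop a size x size window around `center`, padding at the borders."""
    half = size // 2
    padded = np.pad(feat, half, mode="edge")
    cy, cx = center
    return padded[cy:cy + size, cx:cx + size]

def first_response_map(prev_feat1, curr_feat1, prev_center,
                       target_size=16, search_size=48):
    """First target feature map (previous frame, small window) correlated
    against the first search feature map (current frame, larger window),
    both centered on the previous frame's target center."""
    t = crop(prev_feat1, prev_center, target_size)
    s = crop(curr_feat1, prev_center, search_size)
    t = t - t.mean()                      # zero-mean template
    out_dim = search_size - target_size + 1
    out = np.zeros((out_dim, out_dim))
    for i in range(out_dim):              # slide the target window over the
        for j in range(out_dim):          # search window ("valid" correlation)
            out[i, j] = np.sum(t * s[i:i + target_size, j:j + target_size])
    return out                            # one response value per coordinate
```

Each entry of the returned map is a response value, and its coordinates identify the candidate target position with which it is associated.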
5. The target position estimation method according to claim 1, wherein the algorithm used for the classification operation comprises: the KCF algorithm, the AdaBoost algorithm, or the SVM algorithm.
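Since claim 5 names KCF as one admissible classifier, here is a single-channel sketch of kernelized correlation filter training and detection in the style of Henriques et al. The Gaussian-kernel variant, the regularization constant and the label width are assumptions, and practical details such as cosine windowing, multi-channel features and online model updates are omitted.

```python
import numpy as np

def gaussian_kernel_corr(x, z, sigma=0.5):
    """Gaussian kernel correlation between two equally sized patches,
    evaluated for all cyclic shifts at once via the FFT."""
    c = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))
    d = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c) / x.size
    return np.exp(-np.maximum(d, 0) / (sigma ** 2))

def kcf_train(x, sigma=0.5, lam=1e-4, label_sigma=2.0):
    """Learn the dual coefficients (in the Fourier domain) for a target
    patch x against a Gaussian regression label peaked at zero shift."""
    h, w = x.shape
    ys, xs = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2,
                         indexing="ij")
    y = np.exp(-(ys ** 2 + xs ** 2) / (2 * label_sigma ** 2))
    y = np.roll(y, (-(h // 2), -(w // 2)), axis=(0, 1))   # peak at (0, 0)
    k = gaussian_kernel_corr(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)        # alpha_f

def kcf_detect(alpha_f, x, z, sigma=0.5):
    """Response map for a new patch z given the trained template x."""
    k = gaussian_kernel_corr(x, z, sigma)
    return np.real(np.fft.ifft2(alpha_f * np.fft.fft2(k)))
```

The displacement of the detection peak from the patch origin estimates how far the target has moved between the template frame and the new frame.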
6. The target position estimation method according to claim 1, wherein the convolutional neural network comprises: AlexNet, VGGNet, or GoogLeNet.
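To show how the N feature maps of claim 1 might be pulled from one of the networks listed in claim 6, below is a hedged PyTorch/torchvision sketch that collects three convolutional stages of VGG16 and scales them to one preset size in reverse layer order. The chosen layer indices (15, 22 and 29, i.e. the ReLUs after conv3_3, conv4_3 and conv5_3), the output size and the function name are illustrative assumptions, and in practice pretrained weights would be loaded instead of the random initialization used here.

```python
import torch
import torch.nn.functional as F
import torchvision

def extract_feature_maps(image, layer_indices=(15, 22, 29), size=(56, 56)):
    """image: float tensor of shape (1, 3, H, W).
    Returns the selected conv-layer outputs, each bilinearly scaled to `size`
    and ordered from the deepest selected layer to the shallowest."""
    # Randomly initialized here; in practice pretrained weights (for example
    # torchvision's ImageNet weights for VGG16) would be loaded.
    backbone = torchvision.models.vgg16().features.eval()
    feats = {}
    x = image
    with torch.no_grad():
        for idx, layer in enumerate(backbone):
            x = layer(x)
            if idx in layer_indices:
                feats[idx] = F.interpolate(x, size=size, mode="bilinear",
                                           align_corners=False)
    return [feats[idx] for idx in sorted(layer_indices, reverse=True)]
```

Calling the function once on the previous frame and once on the current frame yields the two lists of feature maps consumed by the position-estimation loop sketched after claim 1.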
7. A target position estimation device, characterized by comprising:
an acquisition module, adapted to acquire N feature maps obtained by computing a previous frame image through N convolutional layers of a convolutional neural network, and to acquire N feature maps obtained by computing a current frame image through the same N convolutional layers, wherein all feature maps are of a consistent size, each feature map comprises a plurality of responses, each response comprises a response value and its coordinates, N is a positive integer no greater than the number of convolutional layers in the convolutional neural network, and the N feature maps of each frame image are arranged in reverse order of the layer index of the convolutional layer that generated them;
a first map determining module, adapted to crop feature maps of a preset size from the first feature map of the previous frame image and the first feature map of the current frame image, each centered on the coordinates of the target center point in the previous frame image, and to obtain a first response map by a classification operation, wherein the first response map comprises a plurality of responses, and the first feature map is the feature map generated by the last convolutional layer;
a first center determining module, adapted to determine the coordinates of the response with the maximum response value in the first response map as a first target center;
a first candidate map determining module, adapted to crop feature maps of a preset size from the first feature map of the previous frame image and the first feature map of the current frame image, each centered on the first target center, and to obtain a first candidate response map by a classification algorithm, wherein the first candidate response map comprises a plurality of responses;
a second map determining module, adapted to crop feature maps of a preset size from the second feature map of the previous frame image and the second feature map of the current frame image, each centered on the first target center, and to obtain a second response map by a classification algorithm, wherein the second response map comprises a plurality of responses;
a first fused map determining module, adapted to perform a weighted summation of the responses of the first candidate response map and the second response map with a first preset weight to obtain a second fused response map;
a second center determining module, adapted to determine the coordinates of the response with the maximum response value in the second fused response map as a second target center;
an n-th candidate map determining module, adapted to calculate, for the second to the N-th feature maps of the previous frame image, the candidate response map corresponding to each feature map, wherein feature maps of a preset size are cropped from the n-th feature map of the previous frame image and the n-th feature map of the current frame image, each centered on the n-th target center, and an n-th candidate response map is obtained by a classification algorithm, the n-th candidate response map comprising a plurality of responses, where n is a positive integer and 1 < n < N;
an (n+1)-th response map determining module, adapted to calculate, for the second to the N-th feature maps of the previous frame image, the response map corresponding to each feature map, wherein feature maps of a preset size are cropped from the (n+1)-th feature map of the previous frame image and the (n+1)-th feature map of the current frame image, each centered on the n-th target center, and an (n+1)-th response map is obtained by a classification algorithm, the (n+1)-th response map comprising a plurality of responses;
an n-th fused map determining module, adapted to calculate, for the second to the N-th feature maps of the previous frame image, the fused response map corresponding to each feature map, wherein the responses of the n-th candidate response map and the (n+1)-th response map are weighted and summed with an n-th preset weight to obtain an (n+1)-th fused response map;
an (n+1)-th center determining module, adapted to determine the coordinates of the response with the maximum response value in the (n+1)-th fused response map as the (n+1)-th target center;
and a target position determining module, adapted to determine the (n+1)-th target center as the target position in the current frame image when n+1 is equal to N.
8. The target position estimation device according to claim 7, wherein the first map determining module comprises:
a first target map cropping sub-module, adapted to crop, in the first feature map of the previous frame image, a first target feature map of a preset size with a first target window of a preset size, centered on the coordinates of the target center point in the previous frame image;
a first search map cropping sub-module, adapted to crop, in the first feature map of the current frame image, a first search feature map of a preset size with a first search window of a preset size, centered on the coordinates of the target center point in the previous frame image;
and a classification operation sub-module, adapted to input the first target feature map and the first search feature map into a classifier for classification to obtain the first response map.
9. A storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the target position estimation method according to any one of claims 1 to 6.
10. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the target position estimation method of any one of claims 1 to 6.
CN201910038152.3A 2019-01-15 2019-01-15 Target position estimation method and device, storage medium and terminal Active CN111291745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910038152.3A CN111291745B (en) 2019-01-15 2019-01-15 Target position estimation method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910038152.3A CN111291745B (en) 2019-01-15 2019-01-15 Target position estimation method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN111291745A true CN111291745A (en) 2020-06-16
CN111291745B CN111291745B (en) 2022-06-14

Family

ID=71024080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910038152.3A Active CN111291745B (en) 2019-01-15 2019-01-15 Target position estimation method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN111291745B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989368A (en) * 2015-02-13 2016-10-05 展讯通信(天津)有限公司 Target detection method and apparatus, and mobile terminal
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target
CN107203765A (en) * 2017-03-30 2017-09-26 腾讯科技(上海)有限公司 Sensitive Image Detection Method and device
CN108550126A (en) * 2018-04-18 2018-09-18 长沙理工大学 A kind of adaptive correlation filter method for tracking target and system
CN108665481A (en) * 2018-03-27 2018-10-16 西安电子科技大学 Multilayer depth characteristic fusion it is adaptive resist block infrared object tracking method

Also Published As

Publication number Publication date
CN111291745B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN112069896B (en) Video target tracking method based on twin network fusion multi-template features
CN109086811B (en) Multi-label image classification method and device and electronic equipment
Bousetouane et al. Improved mean shift integrating texture and color features for robust real time object tracking
CN111260688A (en) Twin double-path target tracking method
Huang et al. Joint blur kernel estimation and CNN for blind image restoration
KR20180105876A (en) Method for tracking image in real time considering both color and shape at the same time and apparatus therefor
Singh et al. A novel approach to combine features for salient object detection using constrained particle swarm optimization
CN111709295A (en) SSD-MobileNet-based real-time gesture detection and recognition method and system
Li et al. Robust scale adaptive kernel correlation filter tracker with hierarchical convolutional features
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN109166139B (en) Scale self-adaptive target tracking method combined with rapid background suppression
JP6756406B2 (en) Image processing equipment, image processing method and image processing program
CN111429481B (en) Target tracking method, device and terminal based on adaptive expression
CN114399808A (en) Face age estimation method and system, electronic equipment and storage medium
Sui et al. Exploiting the anisotropy of correlation filter learning for visual tracking
WO2020194792A1 (en) Search device, learning device, search method, learning method, and program
CN115063526A (en) Three-dimensional reconstruction method and system of two-dimensional image, terminal device and storage medium
CN116563285B (en) Focus characteristic identifying and dividing method and system based on full neural network
CN111105436A (en) Target tracking method, computer device, and storage medium
CN111291745B (en) Target position estimation method and device, storage medium and terminal
CN114842506A (en) Human body posture estimation method and system
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
CN113688655B (en) Method, device, computer equipment and storage medium for identifying interference signals
CN113963236A (en) Target detection method and device
CN110490235B (en) Vehicle object viewpoint prediction and three-dimensional model recovery method and device facing 2D image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant