WO2019057197A1 - 运动目标的视觉跟踪方法、装置、电子设备及存储介质 - Google Patents

运动目标的视觉跟踪方法、装置、电子设备及存储介质 Download PDF

Info

Publication number
WO2019057197A1
WO2019057197A1 PCT/CN2018/107289 CN2018107289W WO2019057197A1 WO 2019057197 A1 WO2019057197 A1 WO 2019057197A1 CN 2018107289 W CN2018107289 W CN 2018107289W WO 2019057197 A1 WO2019057197 A1 WO 2019057197A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
feature
moving target
information
moving
Prior art date
Application number
PCT/CN2018/107289
Other languages
English (en)
French (fr)
Inventor
梅元刚
刘鹏
陈宇
王明琛
朱政
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司, 北京金山云科技有限公司 filed Critical 北京金山云网络技术有限公司
Publication of WO2019057197A1 publication Critical patent/WO2019057197A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches

Definitions

  • the present application relates to the field of image processing technologies, and in particular, to a visual tracking method, device, electronic device, and storage medium for a moving object.
  • visual tracking refers to detecting, extracting features, identifying, locating, and filtering moving objects in a video sequence, and obtaining motion parameters of moving objects, such as position, velocity, and motion trajectory.
  • Visual tracking technology is one of the hot research directions in the field of computer vision. It has a wide range of applications in the fields of video surveillance, robot positioning, and environmental awareness. It can further enhance the behavior of the tracking target through analysis and analysis. The mission provides the necessary technical means.
  • Visual tracking technology has received extensive attention and research, and it has developed rapidly. Some mature algorithms have emerged, such as local information-based tracking algorithm. In this algorithm, the initial region of the target is used as the target template, and the target template and image are used. Template matching is performed for all areas, and the highest matching position is used as the target position. Commonly used methods are Lucas-Kanade optical flow tracking algorithm. This algorithm uses the global information of the target and has high credibility. However, when the target is deformed or occluded, the related technology is easier to track the failure, and generally requires sliding window matching, which requires a large amount of calculation, resulting in low tracking efficiency. , real-time is poor.
  • the purpose of the embodiments of the present application is to provide a visual tracking method, device, electronic device and storage medium for moving targets, so as to improve the real-time performance of the tracking algorithm.
  • the specific technical solutions are as follows: Content
  • the embodiment of the present application provides a moving target visual tracking method, the method comprising: acquiring position information of a moving target to be tracked in a first video frame, and extracting the moving target according to the position information. a first feature in the first video frame; acquiring acceleration information and angular velocity information of the moving target when acquiring the second video frame, wherein the second video frame is a next video frame of the first video frame; Calculating the first position of the moving target in the second video frame, and extracting the first position, the acceleration information and the angular velocity information, and position information of the moving target in the first video frame a second feature of the moving object at the first position in the two video frames; matching the first feature and the second feature to obtain a matching feature; determining, by using an optical flow algorithm, the matching feature The moving target is in a second position in the second video frame.
  • the embodiment of the present application provides a moving object visual tracking device, where the device includes: a first extracting module configured to acquire position information of a moving target to be tracked in a first video frame, and according to the position information, Extracting a first feature of the moving target in the first video frame; and acquiring, configured to acquire acceleration information and angular velocity information of the moving target when acquiring the second video frame; wherein the second video frame a next video frame of the first video frame; a second extraction module configured to calculate, according to the acceleration information and the angular velocity information, and location information of the moving target in the first video frame
  • the moving target is at a first position in the second video frame, and extracts a second feature of the moving target at the first position in the second video frame;
  • a matching module is configured to Matching the first feature and the second feature to obtain a matching feature;
  • the first calculating module is configured to determine the moving target according to the matching feature by using an optical flow algorithm The second video frame in a second position.
  • An embodiment of the present application provides an electronic device including a processor and a machine readable storage medium storing machine executable instructions executable by the processor, the processor executing the When the machine executes the instructions, the method steps as described above are implemented.
  • the embodiment of the present application provides a computer readable storage medium having a computer program stored therein, the computer program being executed by a processor to implement the method steps as described above.
  • the embodiment of the present application further provides a computer program product comprising instructions, which when executed on a computer, cause the computer to perform a moving object visual tracking method as described above.
  • the embodiment of the present application also provides a computer program that, when run on a computer, causes the computer to perform a moving object visual tracking method as described above.
  • the first position in the two video frames that is, by introducing the acceleration information and the angular velocity information, can dynamically determine the first position, that is, dynamically change the search range of the moving target, which is compared to using the entire frame image as a moving target.
  • the search range reduces the amount of calculation; in addition, in this embodiment, the sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • implementing any of the products or methods of the present application necessarily does not necessarily require all of the advantages described above to be achieved at the same time.
  • FIG. 1 is a flowchart of a method for visual tracking of a moving object according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for visual tracking of a moving object according to another embodiment of the present application
  • FIG. 3 is a flowchart of a method for visual tracking of a moving object according to another embodiment of the present application.
  • FIG. 4 is a flowchart of a method for visual tracking of a moving object according to another embodiment of the present application.
  • FIG. 5 is a flowchart of a method for visual tracking of a moving object according to still another embodiment of the present application.
  • FIG. 5b is a flowchart of a moving target visual tracking method according to still another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a moving object visual tracking device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a moving object visual tracking device according to another embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a moving object visual tracking device according to still another embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a moving object visual tracking device according to still another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a moving object visual tracking device according to still another embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the local information-based tracking algorithm in the related art is relatively easy to track failure when the target is deformed or occluded, and generally requires sliding window matching, and the calculation amount is large, resulting in low tracking efficiency and poor real-time performance.
  • the embodiment of the present application provides a method, a device, an electronic device and a storage medium for visually tracking a moving target, which are respectively described in detail below.
  • FIG. 1 is a flowchart of a method for visually tracking a moving object according to an embodiment of the present application, including the following steps:
  • Step 101 Determine a moving target to be tracked in the first video frame, determine position information of the moving target in the first video frame, and extract a first feature of the moving target in the first video frame.
  • the visual tracking method provided by the embodiment of the present application can be applied to an electronic device such as a portable notebook, a desktop computer, or a smart phone.
  • the input received by the processor of the electronic device may be a plurality of video frames, and the plurality of video frames may be a group of video frames that are temporally adjacent and captured by the same moving target; the plurality of video frames may also be intelligent
  • the mobile phone or the like is obtained by real-time shooting; the plurality of video frames can also be obtained from the library of the electronic device.
  • the first video frame may be any one of a plurality of video frames received by the electronic device.
  • Step 101 may be performed by: acquiring location information of a moving target to be tracked in a first video frame, and extracting, according to the location information, a first feature of the moving target in the first video frame.
  • the manner of acquiring the position information of the moving target in the first video frame may be various: for example, if the first video frame is the first frame in a video, the user may input the moving target to be tracked. Position information; if the first video frame is not the first frame in the video, the moving target may be tracked by using the scheme, and the moving target is obtained in the first video frame according to the tracking result corresponding to the previous frame of the first video frame. Location information in, but not limited to.
  • the position information of the moving target may include: a spatial coordinate of the moving target in the first video frame and an angle between the moving target and the horizontal plane in the first video frame, and the like. It should be noted that, when the location information is obtained, the moving target to be tracked is found in the first video frame, and the feature of the target at the location information is extracted, that is, the feature of the moving target to be tracked is extracted.
  • the processor may determine, according to the acquired location information, a rectangular area where the moving target is located in the first video frame, where the rectangular area may be a rectangular frame with a side length L, and the motion to be tracked The target is inside the border.
  • the moving target to be tracked is called the foreground
  • the foreground can also be understood as the content in the rectangular frame where the moving target is located, but is not limited thereto, and the remaining part of the frame is called the background.
  • Detecting all feature points and feature operators in the first video frame separating the foreground and the background in the first video frame according to the detected feature points and feature operators, and extracting features of the objects in the foreground, that is, moving targets
  • the feature can directly read the first video frame through OpenCV, and realize the feature of extracting the moving target, and as the first feature.
  • Extracting the features of the moving object may include extracting a color feature, a texture feature, and a shape feature of the moving target, and the moving feature may further include a moving feature for the moving target.
  • the color feature is a global feature, which is a feature based on pixel points. Since the color is insensitive to changes in the direction and size of the image or image region, the color feature does not represent the feature of the moving target in the image well.
  • the color histogram is the most commonly used method for describing color features. Its advantage is that it is not affected by image rotation and translation changes. Further, it can be affected by image scale changes by means of normalization. The disadvantage is that no information about the color space distribution is expressed. . Color features can also be described by color sets, color moments, and color correlation diagrams.
  • a texture feature is also a global feature that also describes the surface properties of a scene corresponding to an image or image region.
  • texture features are not pixel-based features, and they require statistical calculations in regions that contain multiple pixels. In pattern matching, this regional feature has greater advantages and cannot be successfully matched due to local deviations.
  • texture features often have rotational invariance and are highly resistant to noise.
  • texture features also have disadvantages. When the resolution of an image changes, the calculated texture may have a large deviation.
  • the description of the texture features can be performed by statistical methods, geometric methods, model methods, and signal processing methods.
  • the shape feature is characterized by only describing the local properties of the moving target. To fully describe the moving target requires high computational time and storage capacity; many shape features reflect the moving target shape information and the human intuitive feeling is not completely consistent. Or, the similarity of the feature space is different from the similarity perceived by the human visual system.
  • the shape feature can be described by a boundary feature method, a Fourier shape descriptor method, a geometric parameter method, or the like.
  • Step 102 Acquire acceleration information and angular velocity information of the moving target in the second video frame.
  • Step 102 can be expressed as: acquiring acceleration information and angular velocity information of the moving target when the second video frame is acquired.
  • the second video frame may be obtained by real-time shooting or directly from the local library, where the second video frame is the next video frame of the first video frame.
  • the first video frame and the second video frame are temporally adjacent video frames.
  • a method for acquiring acceleration information and angular velocity information of a moving target when acquiring a second video frame is: acquiring acceleration information of a moving target when acquiring a second video frame by using an acceleration sensor; acquiring a moving target when acquiring the second video frame by using a gyro sensor Angular velocity information.
  • Accelerometers and gyroscope sensors can be pre-installed in electronic devices.
  • a Micro Electro Mechanical Systems (MEMS) gyroscope can be installed in the mobile phone to measure the Coriolis acceleration generated by the rotation. Obtain angular velocity, install an accelerometer, and obtain acceleration information by measuring acceleration.
  • MEMS Micro Electro Mechanical Systems
  • continuous shooting is performed in the moving direction of the moving target, so that the smart phone and the moving target remain relatively stationary, and the acceleration information and angular velocity information obtained by the acceleration sensor and the gyro sensor are the motion of the smart phone.
  • the information can further determine the motion information of the moving target, and the gravity sensor, the direction sensor, the attitude sensor, etc. can be pre-installed in the electronic device to obtain the acceleration information of the moving target when acquiring the second video frame by acquiring the motion information of the electronic device.
  • Angular velocity information can be pre-installed in the electronic device to obtain the acceleration information of the moving target when acquiring the second video frame by acquiring the motion
  • Step 103 Calculate a position of the moving target in the second video frame according to the acceleration information and the angular velocity information, and the position information of the moving target in the first video frame, and extract the second moving target at the position in the second video frame. feature.
  • the position of the moving target in the second video frame may be calculated according to the acceleration information and the angular velocity information, and the position information of the moving target in the first video frame.
  • the acceleration information and the angular velocity information of the moving target when the second video frame is acquired reflect the position and posture change of the moving target when the second video frame is acquired, and the position of the moving target in the second video frame can be obtained according to the position and posture change.
  • the position of the moving object in the second video frame is referred to as a first position, which is a candidate position that roughly estimates the moving target in the second video frame.
  • L is the side length of the moving target frame in the first video frame
  • L t is the side length of the moving target frame in the second video frame
  • is a common parameter.
  • s is a target scaling coefficient of a moving target to be tracked by the first video frame
  • ⁇ a t is a weighted sum of acceleration information and angular velocity information of the moving target.
  • the frame size of the moving target is dynamically adjusted, and feature extraction is performed on the image area in the frame, that is, the size of the area for feature extraction is dynamically adjusted, so that compared to the fixed area Feature extraction is performed with better results. It can be understood that if the fixed area of the set area is large, the calculation amount of the feature extraction is large. If the fixed area of the fixed area is small, the feature may be missed, and the solution dynamically changes the size of the area according to the actual situation. Adjustments have solved both problems.
  • the boundary of the moving target can also be understood as the search range of the moving target, and the search range is also the range in which the moving target is searched in the image.
  • the search range of the moving target can be dynamically adjusted.
  • the search range of the moving target (the size of the motion frame) can be dynamically changed, and the calculation efficiency can be improved, and the electronic device can be appropriately increased when the electronic device moves relatively large.
  • the search range of the moving target, to avoid the search range is too small, resulting in tracking failure.
  • the processor can directly read the moving target to be tracked in the second video frame through the OpenCV, and extract the feature of the moving target at the first position, that is, the second A second feature of the moving target in the video frame.
  • the method for extracting the second feature is the same as step 101, and includes extracting a color feature, a texture feature, a shape feature, and the like of the moving target, and may further include extracting the motion feature for the moving target. It should be noted that, in this step, the extracted second feature and the extracted first feature are consistent. If the color feature of the moving target is extracted in step 101, when the color histogram is used to describe the color feature, then the extraction is performed.
  • the second feature is also a color feature; when the texture feature and the shape feature are used to describe the feature of the moving object, it needs to be consistent with the first feature.
  • the reason for keeping the second feature consistent with the first feature is to facilitate the subsequent matching for features extracted under the same standard.
  • Step 104 Match the first feature and the second feature to obtain a matching feature.
  • the first feature and the second feature respectively obtained in step 101 and step 103 are matched, and the matching method may be determined according to different features extracted.
  • color features can be performed by a histogram intersection method, a distance method, a center distance method, a reference color table method, an accumulated color histogram method, and the like.
  • Matching when extracting texture features of moving targets, texture features can be matched by gray level co-occurrence matrix, wavelet transform, and the like.
  • shape feature of the moving object it can be matched by the shape feature based on the wavelet and the relative moment. After the first feature and the second feature are matched, a matching feature is obtained, the matching feature representing a similar feature in the first feature and the second feature.
  • the feature A in the first feature is similar to the feature A′ in the second feature
  • the feature A may be used as the matching feature, or A′ may be used as the matching feature, and the feature A and the feature A′ may also be used.
  • the feature A and the feature A' may be fused to obtain a matching feature, but is not limited thereto.
  • Step 105 The matching feature is obtained by using an optical flow algorithm to obtain location feature information of the first feature in the second video frame.
  • the second position of the moving object in the second video frame is determined according to the matching feature.
  • the matching feature can be used as an input of the optical flow algorithm to obtain the position of the moving target in the second video frame, and the optical flow algorithm is generally applied to track the features in consecutive frames of the video.
  • the relationship between the previous video frame and the current video frame is found by using the change of the pixel in the video frame sequence in the time domain and the correlation between adjacent frames to calculate the moving target between the adjacent video frames.
  • the first feature in the first video is corresponding to the location feature information in the second video frame, that is, the moving target is in the second video, by using the optical flow algorithm to perform similar features in the first feature and the second feature.
  • Applying the moving target visual tracking method provided by the embodiment of the present application acquiring position information of the moving target to be tracked in the first video frame, and extracting the first feature of the moving target in the first video frame according to the position information Acquiring acceleration information and angular velocity information of the moving target when acquiring the second video frame, wherein the second video frame is a next video frame of the first video frame; and the acceleration target and the angular velocity information, and the moving target Positioning information in the first video frame, calculating a first position of the moving target in the second video frame, and extracting a second feature of the moving target at the first position in the second video frame; The first feature and the second feature are matched to obtain a matching feature; and the second position of the moving target in the second video frame is determined according to the matching feature by an optical flow algorithm.
  • the first position of the moving target in the second video frame is calculated according to the acceleration information and the angular velocity information of the moving target and the position information of the moving target in the first video frame, that is, by introducing acceleration information and
  • the angular velocity information can dynamically determine the first position, that is, dynamically change the search range of the moving target, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target; in addition, in this embodiment, The sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • step 103 the motion target is calculated in the second video frame according to the acceleration information and the angular velocity information, and the position information of the moving target in the first video frame.
  • the method flow diagram of the location includes the following steps:
  • Step 1031 Calculate position change information of the moving target based on the acceleration information and the angular velocity information.
  • the processor may calculate the position change information of the moving target between the first video frame and the second video frame.
  • the motion acceleration of the moving target can be obtained according to the acceleration information
  • the angular velocity information obtains the moving direction and the rotating angle of the moving target.
  • the moving acceleration, the moving direction and the rotating angle of the moving target, the moving direction and the moving distance of the moving target can be obtained. That is, relative to the position change information in the first video frame.
  • Step 1032 Determine the position of the moving target in the second video frame according to the position information of the moving target in the first video frame and the position change information.
  • the position obtained in this step is referred to as a first position (which may correspond to the first position in the above embodiment), and the first position is a candidate position roughly estimated for the moving target in the second video frame.
  • the change of the motion after the first video frame of the moving target can be known.
  • a first location of the motion target in the second video frame can be obtained.
  • the position change of the moving target is obtained, and then combining the position information in the first video frame, the moving target can be obtained in the second video frame.
  • the first position by introducing the acceleration information and the angular velocity information, the first position can be dynamically determined, that is, the search range of the moving target is dynamically changed, which is improved compared to the search range in which the entire frame image is used as the moving target. The efficiency is calculated and the resulting first position is real-time.
  • a method for obtaining the location feature information of the first feature in the second video frame by using the optical flow algorithm in step 105 is as shown in FIG. 3, and includes the following steps:
  • step 1051 the matching feature is merged with the second feature to obtain a common feature of the matching feature and the second feature.
  • the obtained matching feature and the extracted second feature are combined to generate a new feature, and a common feature of the matching feature and the second feature is obtained.
  • the common features in this embodiment may correspond to the fusion features in the above embodiments.
  • step 1052 the common feature and the first feature are subjected to an optical flow algorithm to obtain location feature information of the first feature in the second video frame.
  • a second position of the moving object in the second video frame is determined based on the common feature and the first feature.
  • the "location feature information" is also the second location.
  • the optical flow algorithm uses the change of the pixels in the video frame sequence in the time domain and the correlation between adjacent frames to find the correspondence between the previous video frame and the current video frame, thereby calculating the adjacent video frame.
  • a method of moving information between moving targets Using the common feature and the first feature as inputs to the optical flow algorithm, a second position of the moving target in the second video frame can be calculated. In order to distinguish the description, the position obtained in this step is referred to as a second position, and the second position is more accurate than the first position.
  • the method for obtaining the location feature information of the first feature in the second video frame by using the optical flow algorithm in the common feature and the first feature in step 1052 As shown in Figure 4, the following steps are included:
  • Step 1052a calculating a scaling ratio and a rotation ratio of the common feature with respect to the first feature.
  • a matching feature is obtained, and the matching feature and the second feature are merged to generate a new feature, and the new feature is a common feature of the matching feature and the second feature, after which And calculating a scaling ratio and a rotation ratio of the common feature with respect to the first feature, respectively. Calculating a relative distance and a relative angle between the common feature and the first feature, comparing the common feature with the first feature, and calculating a scaling ratio of the common feature with respect to the first feature.
  • the method provided provides a larger search range for the moving target, and the larger the magnification ratio of the common feature relative to the first feature.
  • the weighted sum of the obtained acceleration information and the angular velocity information is smaller, the smaller the side length, the smaller the frame of the moving target is.
  • the method provided by the embodiment of the present application has a smaller search range for the moving target, and the common feature is relatively smaller. The smaller the reduction ratio of the first feature is; the rotation ratio can be obtained by the same calculation method.
  • the border of the moving target can also be understood as the search range of the moving target, and the search range is also the range in which the moving target is searched for in the image; in addition, the border of the moving target can also be understood as the area to which the feature extraction is directed.
  • Step 1052b The common feature, the scaling ratio, and the rotation ratio are obtained by using a preset tracking algorithm to obtain location feature information of the first feature in the second video frame.
  • the second position of the moving object in the second video frame is determined according to the common feature, the scaling ratio and the rotation ratio by a preset tracking algorithm.
  • the position obtained in this step is referred to as a second position
  • the above position feature information is also the second position
  • the second position is more accurate than the first position.
  • the preset tracking algorithm may be a CMT algorithm, a local feature extraction algorithm, or the like, and is not specifically limited.
  • the scaling, rotation ratio and common features obtained in step 1052a are processed by the preset tracking algorithm, and each data feature point obtained in the scaling ratio, the rotation ratio and the common feature is obtained, and the first is obtained according to the data feature point. Location feature information of the feature in the second video frame.
  • the method for obtaining the matching feature after the location feature information of the first feature in the second video frame by using the optical flow algorithm in step 105 is as shown in FIG. 5a, and further includes the following steps:
  • step 106 the common feature, the scaling ratio, and the rotation ratio are voted to generate a voting space.
  • Step 106 can be expressed as: calculating a voting value of each feature point according to the common feature, the scaling ratio, and the rotation ratio, selecting a feature point whose voting value satisfies the condition, and the selected feature point constitutes a voting space.
  • the satisfaction condition may be that the voting value is greater than a preset threshold.
  • the relative distance of these feature points from the center should be relatively constant, that is, the position of the feature point of the next frame relative to the center is constant.
  • the relative distance of these feature points from the center should be relatively constant, that is, the position of the feature point of the next frame relative to the center is constant.
  • the characteristic intensity and the precise positioning are composed of a plurality of feature points to form a feature vector, which is a voting space.
  • step 107 the voting space is clustered.
  • Clustering the generated voting space, wherein clustering is a data analysis method, which can aggregate feature points with large dependence in the voting space, and the characteristics of the voting space in the clustered space are larger.
  • the point composition, the clustered voting space is a part of the voting space in step 106, and the clustered voting space is a sub-vector of the feature vector composed of the plurality of feature points.
  • step 108 the length of the polled space after clustering is counted.
  • the clustered voting space is a feature sub-vector composed of feature points with large dependence.
  • the length of the feature sub-vector is calculated, and the obtained length value is the length of the voting space after clustering.
  • the method provided by the embodiment of the present application may further include, as shown in FIG. 5b:
  • Step 109 When the length of the clustered voting space is greater than a preset threshold, perform Kalman filtering on the location feature information to obtain location information of the moving target in the second video frame.
  • the position information obtained in this step is referred to as a third position, and the second position (position feature information) is subjected to filtering processing to obtain a third position, which is more accurate than the second position.
  • R is the initial noise covariance
  • ⁇ a t is the change information of the current video frame obtained by the sensor relative to the previous video frame
  • R t is the noise covariance of the current video frame.
  • the length of the polled space after the clustering in step 108 is greater than a preset threshold, that is, there are more feature points with larger dependencies, and the obtained features are relatively matched.
  • the parameters of the latest rectangular frame are calculated.
  • the position information of the moving target in the frame is the position feature information of the moving target to be tracked, that is, the preliminary tracking result, but the preliminary tracking result contains noise, and the position feature information needs to be Kalman filtered to obtain stable tracking.
  • the length is less than the preset threshold, that is, the rectangular frame is too small to frame the moving target, the tracking fails.
  • the related visual tracking technology also includes a feature point based tracking algorithm.
  • the feature point based tracking algorithm only considers the salient features of the target, so it can realize tracking under the condition of partial occlusion and deformation.
  • the Clustering of Static-Adaptive Correspondences for Deformable Object Tracking (CMT) algorithm is a feature point-based tracking algorithm that can track any object with significant features.
  • the CMT algorithm will obtain the features and the features obtained by matching the feature operator by calculating the forward and backward optical flows between the image frames before and after, and use the clustering method to obtain consistent and robust features.
  • the CMT algorithm calculates the relative position of the feature points at the center of the frame. For the non-deformed target, the distance of the feature relative to the center is constant under the scaling, so the algorithm can track the target of the rotation.
  • the feature point-based tracking algorithm can obtain corresponding matching feature points, and estimate the position and attitude of the target by least squares method, and can adapt to certain occlusion and deformation, but the CMT algorithm has good tracking performance and algorithm efficiency. High, but for mobile devices that require computational efficiency and power consumption, the tracking requirements cannot be fully met, and the CMT algorithm requires high accuracy of feature points.
  • the feature points extracted in practical applications usually have a small range. Errors, the stability of the scheme is poor, and it is difficult to meet the target tracking applications that need to be stable, such as Augmented Reality (AR) textures.
  • AR Augmented Reality
  • Kalman filtering is performed on the second position to obtain a third position of the moving target in the second video frame, which improves stability compared to the feature point based tracking algorithm.
  • the embodiment of the present application further provides a moving target visual tracking device.
  • the structure of the device is shown in FIG. 6 and includes:
  • the first extraction module 601 is configured to acquire position information of the moving target to be tracked in the first video frame, and extract a first feature of the moving target in the first video frame according to the position information;
  • the obtaining module 602 is configured to acquire acceleration information and angular velocity information of the moving target when the second video frame is acquired;
  • the second extraction module 603 is configured to calculate a first position of the moving target in the second video frame according to the acceleration information and the angular velocity information, and extract a second feature of the moving target at the first position in the second video frame;
  • the matching module 604 is configured to match the first feature and the second feature to obtain a matching feature
  • the first calculation module 605 is configured to determine, by the optical flow algorithm, a second position of the moving target in the second video frame according to the matching feature.
  • Applying the moving object visual tracking device provided by the embodiment of the present application, acquiring position information of the moving target to be tracked in the first video frame, and extracting the first feature of the moving target in the first video frame according to the position information Acquiring acceleration information and angular velocity information of the moving target when acquiring the second video frame, wherein the second video frame is a next video frame of the first video frame; and the acceleration target and the angular velocity information, and the moving target Positioning information in the first video frame, calculating a first position of the moving target in the second video frame, and extracting a second feature of the moving target at the first position in the second video frame; The first feature and the second feature are matched to obtain a matching feature; and the second position of the moving target in the second video frame is determined according to the matching feature by an optical flow algorithm.
  • the first position of the moving target in the second video frame is calculated according to the acceleration information and the angular velocity information of the moving target and the position information of the moving target in the first video frame, that is, by introducing acceleration information and
  • the angular velocity information can dynamically determine the first position, that is, dynamically change the search range of the moving target, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target; in addition, in this embodiment, The sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • the acquiring module 602 is specifically configured to acquire, by using an acceleration sensor, acceleration information of a moving target when acquiring the second video frame;
  • the angular velocity information of the moving target when the second video frame is acquired is acquired by the gyro sensor.
  • the structure of the second extraction module 603 is as shown in FIG. 7, and includes:
  • the first calculating submodule 6031 is configured to calculate position change information of the moving target according to the acceleration information and the angular velocity information;
  • the second calculation sub-module 6032 is configured to determine a first position of the motion target in the second video frame according to the position information of the motion target in the first video frame and the position change information.
  • the structure of the first calculating module 606 is as shown in FIG. 8 and includes:
  • the fusion sub-module 6061 is configured to fuse the matching feature with the second feature to obtain a common feature of the matching feature and the second feature;
  • the third calculation sub-module 6062 is configured to employ an optical flow algorithm to determine a second position of the moving object in the second video frame based on the common feature and the first feature.
  • the structure diagram of the third calculation submodule 6062 is as shown in FIG. 9, and includes:
  • a first calculating unit 60621, configured to calculate a scaling ratio and a rotation ratio of the common feature with respect to the first feature
  • the second calculating unit 60622 is configured to determine, by the preset tracking algorithm, the second position of the moving target in the second video frame according to the common feature, the scaling ratio, and the rotation ratio.
  • the structural diagram of the moving object visual tracking device provided by the embodiment of the present application is as shown in FIG. 10, and further includes:
  • the voting module 606 is configured to vote on the common feature, the scaling ratio, and the rotation ratio to generate a voting space;
  • the clustering module 607 is configured to cluster the voting space
  • the statistics module 608 is configured to count the length of the voting space after clustering
  • the apparatus provided in this embodiment of the present application further includes:
  • the filtering module is configured to perform Kalman filtering on the position feature information when the length of the polled space after the clustering of the statistics module 608 is greater than a preset threshold, to obtain a third position of the moving target in the second video frame.
  • the above-mentioned moving object visual tracking device may be located in an electronic device such as a portable notebook, a desktop computer, or a smart phone, but is not limited thereto.
  • An embodiment of the present application provides an electronic device, including a processor and a machine readable storage medium.
  • the machine readable storage medium stores machine executable instructions executable by a processor.
  • the processor executes the machine executable instructions, the implementation is as follows. Method steps:
  • a second position of the moving object in the second video frame is determined according to the matching feature by an optical flow algorithm.
  • the embodiment of the present application further provides an electronic device, as shown in FIG. 11, including a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104.
  • the processor 1101, the communication interface 1102, and the memory 1103 pass through the communication bus 1104. Complete communication with each other,
  • the processor 110 when configured to execute the program stored on the memory 1103, implements the following steps:
  • a second position of the moving object in the second video frame is determined according to the matching feature by an optical flow algorithm.
  • the first position of the moving target in the second video frame according to the acceleration information and the angular velocity information of the moving target, and the position information of the moving target in the first video frame, that is, By introducing the acceleration information and the angular velocity information, the first position can be dynamically determined, that is, the search range of the moving target is dynamically changed, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target;
  • the sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • the embodiment of the present application provides a computer readable storage medium.
  • the computer readable storage medium stores a computer program.
  • the computer program is executed by the processor, the following method steps are implemented:
  • a second position of the moving object in the second video frame is determined according to the matching feature by an optical flow algorithm.
  • the first position of the moving target in the second video frame according to the acceleration information and the angular velocity information of the moving target, and the position information of the moving target in the first video frame, That is, by introducing the acceleration information and the angular velocity information, the first position can be dynamically determined, that is, the search range of the moving target is dynamically changed, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target;
  • the sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the present application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • the embodiment of the present application further provides a computer program product comprising instructions, which when executed on a computer, cause the computer to perform a moving object visual tracking method as described above.
  • the first position of the moving target in the second video frame is dynamically determined, that is, the search range of the moving target is dynamically changed, which reduces the calculation amount compared to the search range in which the entire frame image is used as the moving target.
  • the sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • the embodiment of the present application also provides a computer program that, when run on a computer, causes the computer to perform a moving object visual tracking method as described above.
  • the first position of the moving target in the second video frame according to the acceleration information and the angular velocity information of the moving target, and the position information of the moving target in the first video frame, that is, By introducing the acceleration information and the angular velocity information, the first position can be dynamically determined, that is, the search range of the moving target is dynamically changed, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target;
  • the sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; or may be a digital signal processing (DSP), dedicated integration.
  • CPU central processing unit
  • NP network processor
  • DSP digital signal processing
  • ASIC Application Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • the apparatus, the electronic device, the storage medium, the computer program product including the instruction, and the computer program provided by the embodiments of the present application are respectively the device, the electronic device, the storage medium, and the computer including the instruction for applying the moving target visual tracking method.
  • the program product, the computer program, and all the embodiments of the above-described moving object visual tracking method are applicable to the device, the electronic device, the storage medium, the computer program product including the instruction, the computer program, and all achieve the same or similar beneficial effects.
  • the first position of the moving target in the second video frame is calculated, that is, the acceleration information is introduced.
  • the angular velocity information the first position can be dynamically determined, that is, the search range of the moving target is dynamically changed, which reduces the amount of calculation compared to the search range in which the entire frame image is used as the moving target; in addition, in this embodiment The sliding window method is not needed to match, the calculation amount is reduced, and the calculation efficiency is improved, and the application improves the real-time performance of the tracking algorithm and improves the tracking efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

本申请实施例提供了一种运动目标视觉跟踪方法、装置、电子设备及存储介质,包括:确定第一视频帧中待跟踪的运动目标,确定运动目标在第一视频帧中的位置信息,并提取运动目标在第一视频帧中的第一特征;获取运动目标在第二视频帧中的加速度信息和角速度信息;计算运动目标在第二视频帧中的位置,并提取第二视频帧中位置处运动目标的第二特征;将第一特征和第二特征进行匹配,得到匹配特征;将匹配特征通过光流算法得到第一特征在第二视频帧中的位置特征信息。本申请实施例可以提高跟踪算法的实时性,提高跟踪效率。

Description

运动目标的视觉跟踪方法、装置、电子设备及存储介质
本申请要求于2017年9月25日提交中国专利局、申请号为201710872887.7、发明名称为“运动目标的视觉跟踪方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,特别是涉及一种运动目标的视觉跟踪方法、装置、电子设备及存储介质。
背景技术
在图像处理技术领域中,视觉跟踪是指对视频序列中的运动目标进行检测、提取特征、识别、定位和滤波,并获得运动目标的运动参数,如位置、速度和运动轨迹等。视觉跟踪技术是计算机视觉领域的热门研究方向之一,在视频监控、机器人定位、环境感知等领域有着广泛的应用,并且通过视觉跟踪技术可以对跟踪目标进一步的进行行为理解、分析和决策等高级任务提供必要的技术手段。
视觉跟踪技术得到了广泛的关注与研究,发展比较迅速,出现了一些成熟的算法,比如,基于局部信息的跟踪算法,该算法中,将目标的初始区域作为目标模板,将目标模板与图像中的所有区域进行模板匹配,将匹配度最高的地方作为目标的位置。常用的方法有Lucas-Kanade光流跟踪算法等。这种算法采用目标的全局信息,具有较高的可信度,但相关技术在目标存在变形或者遮挡的时候,比较容易跟踪失败,且一般需要滑窗匹配,计算量较大,导致跟踪效率低,实时性较差。
发明内容
本申请实施例的目的在于提供一种运动目标的视觉跟踪方法、装置、电子设备及存储介质,以提高跟踪算法的实时性。具体技术方案如下:内容
本申请实施例提供了一种运动目标视觉跟踪方法,所述方法包括:获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,其中,所述第二视频帧为所述第一视频帧的下一视频帧;根据所述加速度信息和所述角速度信息,以及所述运动 目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;将所述第一特征和所述第二特征进行匹配,得到匹配特征;通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
本申请实施例提供了一种运动目标视觉跟踪装置,所述装置包括:第一提取模块,被设置为获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;获取模块,被设置为获取采集第二视频帧时所述运动目标的加速度信息和角速度信息;其中,所述第二视频帧为所述第一视频帧的下一视频帧;第二提取模块,被设置为根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;匹配模块,被设置为将所述第一特征和所述第二特征进行匹配,得到匹配特征;第一计算模块,被设置为通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
本申请实施例提供了一种电子设备,包括处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令,所述处理器执行所述机器可执行指令时,实现如上所述的方法步骤。
本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现如上所述的方法步骤。
本申请实施例还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述的一种运动目标视觉跟踪方法。
本申请实施例还提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述的一种运动目标视觉跟踪方法。
应用本申请实施例提供的运动目标的视觉跟踪方法、装置、电子设备及存储介质,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通 过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。当然,实施本申请的任一产品或方法必不一定需要同时达到以上所述的所有优点。
附图说明
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请一种实施例提供的运动目标视觉跟踪方法的流程图;
图2为本申请另一种实施例提供的运动目标视觉跟踪方法的流程图;
图3为本申请又一种实施例提供的运动目标视觉跟踪方法的流程图;
图4为本申请又一种实施例提供的运动目标视觉跟踪方法的流程图;
图5a为本申请再一种实施例提供的运动目标视觉跟踪方法的流程图;
图5b为本申请再一种实施例提供的运动目标视觉跟踪方法的流程图;
图6为本申请一种实施例提供的运动目标视觉跟踪装置的结构示意图;
图7为本申请另一种实施例提供的运动目标视觉跟踪装置的结构示意图;
图8为本申请又一种实施例提供的运动目标视觉跟踪装置的结构示意图;
图9为本申请又一种实施例提供的运动目标视觉跟踪装置的结构示意图;
图10为本申请再一种实施例提供的运动目标视觉跟踪装置的结构示意图;
图11为本申请实施例提供的电子设备结构示意图。
具体实施方式
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
相关技术中的基于局部信息的跟踪算法在目标存在变形或者遮挡的时候,比较容易跟踪失败,且一般需要滑窗匹配,计算量较大,导致跟踪效率低,实时性较差。为了提高视觉跟踪算法的实时性,提高计算效率,本申请实施例提供了一种运动目标视觉跟踪方法、装置、电子设备及存储介质,以下分别进行详细说明。
图1为本申请一种实施例提供的运动目标视觉跟踪方法的流程图,包括如下步骤:
步骤101,确定第一视频帧中待跟踪的运动目标,确定运动目标在第一视频帧中的位置信息,并提取运动目标在第一视频帧中的第一特征。
本申请实施例提供的视觉跟踪方法可以应用于便携式笔记本、台式计算机、智能手机等电子设备中。电子设备的处理器接收的输入可以为多个视频帧,多个视频帧可以是时间上相邻的、对同一运动目标进行拍摄得到的一组视频帧;上述多个视频帧也可以是利用智能手机等进行实时拍摄得到的;上述多个视频帧也可以从电子设备的图库中获取得到。其中,上述第一视频帧可以是电子设备接收的多个视频帧中的任一视频帧。
步骤101可以表现为:获取待跟踪的运动目标在第一视频帧中的位置信息,根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征。
举例来说,获取运动目标在第一视频帧中的位置信息的方式可以有多种:比如,如果第一视频帧为一段视频中的第一帧,则可以由用户输入待跟踪的运动目标的位置信息;如果第一视频帧不为视频中的第一帧,则可以利用本方案对运动目标进行跟踪,根据第一视频帧的上一帧对应的跟踪结果,获取运动目标在第一视频帧中的位置信息,但并不限于此。
运动目标的位置信息可以包括:运动目标在第一视频帧中的空间坐标和运动目标在第一视频帧中与水平面的夹角等。需要说明的是,获取到该位置信息,也就在第一视频帧中找到了待跟踪的运动目标,提取该位置信息处的目标的特征,也就是提取待跟踪的运动目标的特征。
当接收到第一视频帧时,处理器可以根据获取到的位置信息,确定运动目标在第一视频帧中所在的矩形区域,该矩形区域具体可以为边长L的矩形边 框,待跟踪的运动目标位于该边框内。
在第一视频帧中,待跟踪的运动目标称之为前景,前景也可以理解为上述运动目标所在的矩形边框内的内容,但并不限于此,其余框外的部分称之为背景。检测第一视频帧中的所有特征点和特征算子,根据检测到的特征点和特征算子,对第一视频帧中的前景和背景进行分离,提取前景内的物体的特征,即运动目标的特征,可以直接通过OpenCV读取第一视频帧,实现提取运动目标的特征,并作为第一特征。
提取运动目标的特征可以包括提取运动目标的颜色特征、纹理特征和形状特征等,对于运动目标来说,上述特征还可以包括运动特征。其中,颜色特征是一种全局特征,是一种基于像素点的特征,由于颜色对图像或图像区域的方向、大小等变化不敏感,所以颜色特征不能很好地表示图像中运动目标的特征。颜色直方图是最常用的描述颜色特征的方法,其优点是不受图像旋转和平移变化的影响,进一步借助归一化还可不受图像尺度变化的影响,缺点是没有表达出颜色空间分布的信息。也可以通过颜色集、颜色矩、颜色相关图进行颜色特征的描述。
纹理特征也是一种全局特征,它也描述了图像或图像区域所对应景物的表面性质。但由于纹理只是一种运动目标表面的特性,并不能完全反映出运动目标的本质属性,所以仅仅利用纹理特征是无法获得高层次的图像内容。与颜色特征不同,纹理特征不是基于像素点的特征,它需要在包含多个像素点的区域中进行统计计算。在模式匹配中,这种区域性的特征具有较大的优越性,不会由于局部的偏差而无法匹配成功。作为一种统计特征,纹理特征常具有旋转不变性,并且对于噪声有较强的抵抗能力。但是,纹理特征也有缺点,当图像的分辨率变化的时候,所计算出来的纹理可能会有较大偏差。可以通过统计方法、几何法、模型法、信号处理法进行纹理特征的描述。
形状特征的特点是仅描述了运动目标局部的性质,要全面描述运动目标需要对计算时间和存储量有较高的要求;许多形状特征所反映的运动目标形状信息与人的直观感觉不完全一致,或者说,特征空间的相似性与人视觉系统感受到的相似性有差别。可以通过边界特征法、傅里叶形状描述符法、几何参数法等进行形状特征的描述。
步骤102,获取运动目标在第二视频帧中的加速度信息和角速度信息。
步骤102可以表现为:获取采集第二视频帧时运动目标的加速度信息和角速度信息。
处理器提取了运动目标在第一视频帧中的第一特征以后,可以通过实时拍摄或者直接从本地图库中获取第二视频帧,其中,第二视频帧为第一视频帧的下一视频帧,第一视频帧和第二视频帧为时间上相邻的视频帧。获取采集第二视频帧时、待跟踪的运动目标的加速度信息和角速度信息,其中,加速度信息和角速度信息可以表示采集第二视频帧时运动目标的位置姿态变化。
一种获取采集第二视频帧时运动目标的加速度信息和角速度信息的方法为:通过加速度传感器获取采集第二视频帧时运动目标的加速度信息;通过陀螺仪传感器获取采集第二视频帧时运动目标的角速度信息。
加速度传感器和陀螺仪传感器可以预先安装在电子设备中,以智能手机为例,可以在手机中安装微电子机械系统(Micro Electro Mechanical systems,简称MEMS)陀螺仪,通过测量旋转产生的科氏加速度来获得角速度,安装加速度计,通过测量加速度获得加速度信息。当使用智能手机拍摄运动目标时,在运动目标的运动方向上进行连续拍摄,使智能手机和运动目标保持相对静止,由加速度传感器和陀螺仪传感器获取得到的加速度信息和角速度信息为智能手机的运动信息,进而可以确定运动目标的运动信息,也可以在电子设备中预先安装重力感应计、方向传感器、姿态传感器等通过获取电子设备的运动信息,获取采集第二视频帧时运动目标的加速度信息和角速度信息。
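在移动端,传感器读数的获取方式与具体平台相关,此处不展开;下面仅给出一段示意性的Python代码,演示把三轴加速度与三轴角速度读数加权合成为一个反映位姿变化剧烈程度的标量Δa_t的一种可能方式,加权系数为假设值。

```python
import numpy as np

def motion_change_scalar(accel, gyro, w_a=0.5, w_g=0.5):
    """由加速度信息与角速度信息计算加权和Δa_t(示意实现)。

    accel: 三轴加速度(m/s^2); gyro: 三轴角速度(rad/s); w_a、w_g为假设的加权系数。
    """
    accel = np.asarray(accel, dtype=float)
    gyro = np.asarray(gyro, dtype=float)
    # 取模长再加权求和, 设备运动越剧烈, Δa_t越大
    return w_a * np.linalg.norm(accel) + w_g * np.linalg.norm(gyro)
```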
步骤103,根据加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的位置,并提取第二视频帧中的位置处运动目标的第二特征。
得到采集第二视频帧时运动目标的加速度信息和角速度信息后,可以根据加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的位置。采集第二视频帧时运动目标的加速度信息和角速度信息反映了采集第二视频帧时运动目标的位置姿态变化,根据位置姿态变化,可以得到运动目标在第二视频帧中的位置。为了区分描述,将运动目标在第二视频帧中的位置称为第一位置,第一位置是对第二视频帧中的运动目标粗略估计的一个候选位置。
作为本申请实施例中得到运动目标在第二视频帧中的位置的具体方法可以为:假设运动目标在第二视频帧中的位置的中心(第一位置的中心)为第一视频帧的中心,则边框的边长为L_t=β·s·Δa_t·L,其中L为第一视频帧中运动目标边框的边长,L_t为第二视频帧中运动目标边框的边长,β为常用参数,在本申请实施例中设置为2,s为第一视频帧待跟踪的运动目标的目标缩放系数,Δa_t为运动目标的加速度信息和角速度信息的加权和。
由边长公式可知,当得到的加速度信息和角速度信息的加权和越大时,边框的边长越大,那么运动目标的边框就越大;同理,当得到的加速度信息和角速度信息的加权和越小时,边框的边长越小,那么运动目标的边框就越小;也就是说,一个视频帧中的运动目标的边框随着采集该视频帧时运动目标的加速度信息和角速度信息的加权和变化而变化。可见,本方案中,运动目标的边框大小是动态调整的,而且对边框中的图像区域进行特征提取,也就是说,特征提取针对的区域大小是动态调整的,这样,相比于对固定区域进行特征提取,效果更好。可以理解,如果设定的固定区域面积较大,则特征提取的计算量较大,如果设定的固定区域面积较小,很可能导致遗漏了特征,而本方案根据实际情况对区域大小进行动态调整,解决了这两种问题。
换句话说,运动目标的边框也可以理解为运动目标的搜索范围,搜索范围也就是在图像中搜索运动目标的范围,本方案中,可以对运动目标的搜索范围进行动态调整。在本申请实施例提供的方案中,通过引入加速度信息和角速度信息,可以动态地改变运动目标的搜索范围(运动边框的大小),可以提高计算效率,在电子设备移动较大时,可以适当增加运动目标的搜索范围,避免搜索范围过小,造成跟踪失败。
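下面给出按上述边长公式动态计算搜索框的一段示意性Python代码;其中边框以(中心点,边长)表示,对过小边长的保护处理均为示例性假设。

```python
def adaptive_search_box(prev_box, delta_a, beta=2.0, scale=1.0):
    """按 L_t = β·s·Δa_t·L 计算第二视频帧中运动目标的搜索框(示意实现)。

    prev_box: 第一视频帧中目标边框(cx, cy, L); delta_a: 加速度与角速度信息的加权和Δa_t;
    beta: 常数参数β(文中取2); scale: 目标缩放系数s。
    """
    cx, cy, L = prev_box
    L_t = beta * scale * delta_a * L
    L_t = max(L_t, L)   # 示例性保护: 避免设备几乎静止时搜索框过小(假设性处理)
    return (cx, cy, L_t)
```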
得到运动目标在第二视频帧中的第一位置以后,处理器可以通过OpenCV直接读取第二视频帧中待跟踪的运动目标,提取该运动目标在第一位置处的特征,也就是第二视频帧中的运动目标的第二特征。提取第二特征的方法同步骤101,包括提取运动目标的颜色特征、纹理特征、形状特征等,对于运动目标来说,还可以包括运动特征的提取。需要说明的是,本步骤中,提取的第二特征和提取的第一特征是一致的,如果在步骤101中提取运动目标的颜色特征时采用颜色直方图描述颜色特征,那么提取的第二特征同样为采用颜色直方图描述的颜色特征;采用纹理特征和形状特征描述运动目标的特征时,也都需要和第一特征保持一致。将第二特征与第一特征保持一致的原因是为了在同一标准下提取特征,方便实现后续的匹配。
步骤104,将第一特征和第二特征进行匹配,得到匹配特征。
将步骤101和步骤103中分别得到的第一特征和第二特征进行匹配,该匹配方法可以根据提取的不同特征而定。例如,提取运动目标的颜色特征时,在用颜色直方图进行颜色特征描述时,可以通过直方图相交法、距离法、中心距法、参考颜色表法、累加颜色直方图法等方法进行颜色特征的匹配;提取运动目标的纹理特征时,可以通过灰度共生矩阵、小波变换等进行纹理特征的匹配。提取运动目标的形状特征时,可以通过基于小波和相对矩的形状特征匹配。将第一特征和第二特征进行匹配以后,得到匹配特征,匹配特征代表第一特征和第二特征中相似的特征。
举例来说,假设第一特征中的特征A与第二特征中的特征A’相似,则可以将特征A作为匹配特征,也可以将A’作为匹配特征,也可以将特征A和特征A’都作为匹配特征,也可以将特征A与特征A’进行融合,得到匹配特征,但并不限于此。
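下面给出特征匹配的一段示意性Python代码,以ORB二进制描述子和最近邻比值筛选为例;匹配器类型与比值阈值均为示例性假设,实际可按所提取的特征类型选用相应的匹配方法。

```python
import cv2

def match_features(desc1, desc2, pts1, pts2, ratio=0.8):
    """将第一特征与第二特征进行匹配, 返回相互匹配的特征点坐标对(示意实现)。"""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc1, desc2, k=2)
    matched = []
    for pair in knn:
        # 最近邻距离明显小于次近邻时认为匹配可靠
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            m = pair[0]
            matched.append((pts1[m.queryIdx], pts2[m.trainIdx]))
    return matched
```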
步骤105,将匹配特征通过光流算法得到第一特征在第二视频帧中的位置特征信息。
换句话说,也就是通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
得到第一特征和第二特征的匹配特征后,计算第一特征对应到第二视频帧的第二位置,或者说,该第二位置处的特征与第一特征相匹配,该第二位置也就是运动目标在第二视频帧的位置。在本申请实施例中,可以将匹配特征作为光流算法的输入,得到运动目标在第二视频帧的位置。光流算法通常应用于跟踪视频的连续帧中的特征,它是利用视频帧序列中像素在时间域上的变化以及相邻帧之间的相关性来找到上一视频帧跟当前视频帧之间存在的对应关系,从而计算出相邻视频帧之间运动目标的运动信息的一种方法。将第一特征和第二特征中相似的特征通过光流算法,可以计算得到第一视频帧中的第一特征对应到第二视频帧中的位置特征信息,也就是运动目标在所述第二视频帧中的第二位置。为了区分描述,将本步骤中得到的位置称为第二位置,第二位置相比于第一位置更加精确。
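下面给出用Lucas-Kanade光流把匹配特征从第一视频帧传播到第二视频帧、并由此粗略估计第二位置的一段示意性Python代码;窗口大小、金字塔层数以及用特征点均值代表目标中心等处理均为示例性假设。

```python
import cv2
import numpy as np

def refine_position_by_optical_flow(prev_gray, cur_gray, matched_pts_prev):
    """用光流算法由匹配特征确定运动目标在第二视频帧中的第二位置(示意实现)。"""
    p0 = np.float32(matched_pts_prev).reshape(-1, 1, 2)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, p0, None, winSize=(21, 21), maxLevel=3)
    good = p1[status.flatten() == 1].reshape(-1, 2)   # 只保留跟踪成功的特征点
    if len(good) == 0:
        return None                                   # 光流全部失败, 视为跟踪失败
    cx, cy = good.mean(axis=0)                        # 以特征点均值粗略代表目标中心
    return float(cx), float(cy)
```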
应用本申请实施例提供的运动目标视觉跟踪方法,获取待跟踪的运动目标在第一视频帧中的位置信息,并根据该位置信息,提取该运动目标在该第一视频帧中的第一特征;获取采集第二视频帧时该运动目标的加速度信息和角速度信息,其中,该第二视频帧为该第一视频帧的下一视频帧;根据该加速度信息和该角速度信息,以及该运动目标在该第一视频帧中的位置信息,计算该运动目标在该第二视频帧中的第一位置,并提取该第二视频帧中该第一位置处该运动目标的第二特征;将该第一特征和该第二特征进行匹配,得到匹配特征;通过光流算法,根据该匹配特征确定该运动目标在该第二视频帧中的第二位置。
可见,本方案中,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
作为本申请另一种具体实施方式,结合上述实施例,在步骤103中的根据加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的位置的方法流程图,如图2所示,包括如下步骤:
步骤1031,根据加速度信息和角速度信息,计算运动目标的位置变化信息。
处理器获取到采集第二视频帧时运动目标的加速度信息和角速度信息以后,可以计算得到运动目标在第一视频帧与第二视频帧之间的位置变化信息。例如,可以根据加速度信息得到运动目标的运动加速度,根据角速度信息得到运动目标的运动方向和旋转角度,再根据运动目标的运动加速度、运动方向和旋转角度,可以得到运动目标的运动方向和运动距离,也即相对于第一视频帧中的位置变化信息。
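下面给出由加速度信息与角速度信息粗略估计帧间位置变化的一段示意性Python代码;其中假设帧间初速度近似为零、不考虑重力补偿与传感器漂移,仅用于说明积分思路。

```python
import numpy as np

def estimate_position_change(accel, gyro, dt):
    """由加速度、角速度粗略估计两帧之间的位移与旋转(示意实现)。dt为帧间隔(秒)。"""
    accel = np.asarray(accel, dtype=float)
    gyro = np.asarray(gyro, dtype=float)
    displacement = 0.5 * accel * dt ** 2   # 假设帧间初速度近似为0, 由加速度积分得位移
    rotation = gyro * dt                   # 由角速度积分得帧间旋转角
    return displacement, rotation
```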
步骤1032,根据运动目标在第一视频帧中的位置信息,以及位置变化信息,确定运动目标在第二视频帧中的位置。
为了区分描述,将本步骤中得到的位置称为第一位置(可相当于上述实施例中的第一位置),第一位置是对第二视频帧中的运动目标粗略估计的一个候选位置。
根据步骤101中运动目标在第一视频帧中的位置信息,和运动目标在第二视频帧中相对于第一视频帧的位置变化信息,可以得知运动目标第一视频帧以后运动的变化,结合第一视频帧中的位置信息,可以得到运动目标在第二视频帧中的第一位置。
本申请实施例中,先根据加速度信息和角速度信息计算运动目标的位置变化信息,得到运动目标的位置变化,再结合第一视频帧中的位置信息,就可以得到运动目标在第二视频帧中的第一位置;通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,提高了计算效率,得到的第一位置具有实时性。
作为本申请又一种具体实施方式,在步骤105中的将匹配特征通过光流算法得到第一特征在第二视频帧中的位置特征信息的方法流程图如图3所示,包括如下步骤:
步骤1051,将匹配特征与第二特征进行融合,得到匹配特征与第二特征的共同特征。
将第一特征和第二特征进行匹配以后,将得到的匹配特征与提取的第二特征通过融合的方法生成新的特征,得到匹配特征和第二特征的共同特征。本实施例中的共同特征可以相当于上述实施例中的融合特征。
步骤1052,将共同特征与第一特征采用光流算法,得到第一特征在第二视频帧中的位置特征信息。
换句话说,也就是采用光流算法,根据所述共同特征与所述第一特征,确定所述运动目标在所述第二视频帧中的第二位置。“位置特征信息”也就是第二位置。
光流算法是利用视频帧序列中像素在时间域上的变化以及相邻帧之间的相关性来找到上一视频帧跟当前视频帧之间存在的对应关系,从而计算出相邻视频帧之间运动目标的运动信息的一种方法。将共同特征与第一特征作为光流算法的输入,可以计算得到运动目标在所述第二视频帧中的第二位置。为了区分描述,将本步骤中得到的位置称为第二位置,第二位置相比于第一位置更加精确。
作为本申请又一种具体实施方式,结合上述实施例,在步骤1052中的将共同特征与第一特征采用光流算法,得到第一特征在第二视频帧中的位置特征信息的方法流程图如图4所示,包括如下步骤:
步骤1052a,计算共同特征相对于第一特征的缩放比例和旋转比例。
将第一特征和第二特征进行匹配以后得到匹配特征,将该匹配特征与第二特征通过融合的方法生成新的特征,新的特征也就是匹配特征和第二特征的共同特征,在这之后,分别计算共同特征相对于第一特征的缩放比例和旋转比例。计算共同特征与第一特征两两之间的相对距离和相对角度,将共同特征与第一特征进行对比,计算共同特征相对于第一特征的缩放比例。
由运动目标在第二视频帧中的边框的边长公式可知,当得到的加速度信息和角速度信息的加权和越大时,边长越大,那么运动目标的边框就越大,本申请实施例提供的方法对运动目标搜索范围就越大,共同特征相对于第一特征的放大比例越大。同理,当得到的加速度信息和角速度信息的加权和越小时,边长越小,那么运动目标的边框就越小,本申请实施例提供的方法对运动目标搜索范围就越小,共同特征相对于第一特征的缩小比例越小;旋转比例可以由同样的计算方法得到。
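下面给出通过特征点两两之间的相对距离与相对角度估计缩放比例和旋转比例的一段示意性Python代码;取中值以抑制误匹配的做法为示例性假设。

```python
import numpy as np

def estimate_scale_rotation(pts_prev, pts_cur):
    """由两帧对应特征点估计缩放比例与旋转角度(示意实现)。"""
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_cur = np.asarray(pts_cur, dtype=float)
    scales, angles = [], []
    for i in range(len(pts_prev)):
        for j in range(i + 1, len(pts_prev)):
            v0, v1 = pts_prev[j] - pts_prev[i], pts_cur[j] - pts_cur[i]
            d0, d1 = np.linalg.norm(v0), np.linalg.norm(v1)
            if d0 < 1e-6 or d1 < 1e-6:
                continue
            scales.append(d1 / d0)   # 两两距离之比反映缩放
            angles.append(np.arctan2(v1[1], v1[0]) - np.arctan2(v0[1], v0[0]))  # 夹角差反映旋转
    if not scales:
        return 1.0, 0.0
    return float(np.median(scales)), float(np.median(angles))   # 取中值抑制误匹配
```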
如上所述,运动目标的边框也可以理解为运动目标的搜索范围,搜索范围也就是在图像中搜索运动目标的范围;另外,运动目标的边框还可以理解为特征提取针对的区域。
步骤1052b,将共同特征、缩放比例和旋转比例通过预设跟踪算法,得到第一特征在第二视频帧中的位置特征信息。
换句话说,也就是通过预设跟踪算法,根据所述共同特征、所述缩放比例和旋转比例,确定所述运动目标在所述第二视频帧中的第二位置。为了区分描述,将本步骤中得到的位置称为第二位置,上述位置特征信息也就是第二位置,第二位置相比于第一位置更加精确。
举例来说,预设跟踪算法可以为CMT算法、局部特征提取算法等等,具体不做限定。通过该预设跟踪算法对步骤1052a中得到的缩放比例、旋转比例和共同特征进行处理,得到缩放比例、旋转比例和共同特征中得到的每个数据特征点,根据该数据特征点,得到第一特征在第二视频帧中的位置特征信息。
作为本申请再一种具体实施方式,在步骤105中的将匹配特征通过光流算法得到第一特征在第二视频帧中的位置特征信息之后的方法如图5a所示,还包括如下步骤:
步骤106,将共同特征、缩放比例和旋转比例进行投票,生成投票空间。
步骤106可以表现为:根据共同特征、缩放比例和旋转比例,计算每一个特征点的投票值,选择投票值满足条件的特征点,所选择的特征点组成投票空间。
在本申请实施例中,满足条件可以为投票值大于预设阈值。
通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置之后,将共同特征、缩放比例和旋转比例进行投票,进行投票操作的原理是:在把缩放比例和旋转比例考虑进去之后,这些特征点相对中心的相对距离应该是相对不变的,也就是下一帧的特征点相对中心的位置是不变的。但是由于图像本身的变化,不可能得到完全一样的相对位置,这个时候,有一些特征点会离中心近,有一些特征点会偏差很大。那么,采用聚类的方法,可以选择最大的类作为最好的特征点。
根据共同特征、缩放比例和旋转比例中的数据,计算每一个特征点的投票值,选择投票值满足条件的特征点,所选择的特征点有多个,且所选择的每一个特征点具有较高的特征强度和精准的定位,由这多个特征点组成一个特征向量,该特征向量即为投票空间。
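下面给出计算投票空间的一段示意性Python代码:每个特征点在扣除(经缩放、旋转补偿的)相对中心偏移后,即投票出一个候选中心;变量命名与数据组织方式均为示例性假设。

```python
import numpy as np

def compute_votes(pts_cur, springs, scale, rotation):
    """由特征点投票出目标中心, 形成投票空间(示意实现)。

    pts_cur: 第二视频帧中的特征点坐标; springs: 各特征点在第一视频帧中相对目标中心的偏移。
    """
    c, s = np.cos(rotation), np.sin(rotation)
    R = np.array([[c, -s], [s, c]])               # 旋转补偿矩阵
    pts_cur = np.asarray(pts_cur, dtype=float)
    springs = np.asarray(springs, dtype=float)
    votes = pts_cur - scale * springs.dot(R.T)    # 每个特征点对中心位置的投票
    return votes
```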
步骤107,将投票空间进行聚类。
将生成的投票空间进行聚类,其中,聚类是一种数据分析方法,可以将投票空间中依赖关系较大的特征点聚集到一起,聚类后的投票空间由依赖关系较大的特征点组成,是步骤106中的投票空间的一部分,也是上述多个特征点组成的特征向量的子向量。
步骤108,统计聚类后的投票空间的长度。
聚类后的投票空间是一个由依赖关系较大的特征点组成的特征子向量,计算特征子向量的长度,得到的长度值即为聚类后的投票空间的长度。
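下面给出对投票空间聚类并保留最大一类的一段示意性Python代码,采用层次聚类仅作说明,距离阈值为假设值;聚类后保留的特征点个数即对应"聚类后的投票空间的长度"。

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

def largest_vote_cluster(votes, threshold=20.0):
    """对投票空间聚类, 返回最大类中特征点的索引(示意实现)。"""
    votes = np.asarray(votes, dtype=float)
    if len(votes) < 2:
        return np.arange(len(votes))
    labels = fclusterdata(votes, t=threshold, criterion='distance', method='single')
    counts = np.bincount(labels)
    best = counts[1:].argmax() + 1            # 聚类标签从1开始, 取成员最多的一类
    return np.where(labels == best)[0]        # 其长度即聚类后投票空间的长度
```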
在一些示例中,本申请实施例提供的方法还可以包括,如图5b所示:
步骤109,当聚类后的投票空间的长度大于预设阈值时,对位置特征信息进行卡尔曼滤波,得到运动目标在第二视频帧中的位置信息。为了区分描述,将本步骤中得到的位置信息称为第三位置,对第二位置(位置特征信息)进行滤波处理得到第三位置,第三位置相比于第二位置更加精确。
位置特征信息中的特征点存在噪声,会影响跟踪结果的稳定性,因此可以通过卡尔曼滤波去除噪声的影响。在本申请实施例中,可以根据公式R_t=R/Δa_t进行卡尔曼滤波,其中R为初始的噪声协方差,Δa_t为传感器得到的当前视频帧相对于前一视频帧的变化信息,R_t为当前视频帧的噪声协方差。当电子设备移动较快时,减少测量噪声的协方差R_t,以减少卡尔曼滤波器的滞后程度。在电子设备移动较慢时,增大测量噪声的协方差R_t,让滤波的结果更加平滑稳定。
当步骤108中统计的聚类后的投票空间的长度大于预设阈值时,也就是说,依赖关系较大的特征点较多,得到的特征比较匹配,此时,计算最新的矩形框的参数,框中的运动目标的位置信息就是待跟踪运动目标的位置特征信息,也就是初步的跟踪结果,但是初步的跟踪结果中含有噪声,需要将位置特征信息进行卡尔曼滤波后才能得到稳定的跟踪结果;如果长度小于预设阈值,则说明矩形框太小,无法框住运动目标,跟踪失败。
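下面给出按 R_t = R/Δa_t 自适应调整测量噪声协方差的卡尔曼滤波的一段极简示意性Python代码;状态只取目标中心位置(x, y),过程噪声等参数均为假设值,仅用于说明滤波思路。

```python
import numpy as np

class AdaptivePositionFilter:
    """对第二位置做卡尔曼滤波, 测量噪声协方差按 R_t = R/Δa_t 自适应(示意实现)。"""

    def __init__(self, init_pos, base_R=10.0, q=1e-2):
        self.x = np.asarray(init_pos, dtype=float)   # 状态: 目标中心位置
        self.P = np.eye(2)                           # 估计协方差
        self.base_R = base_R                         # 初始噪声协方差R
        self.Q = q * np.eye(2)                       # 过程噪声(假设值)

    def update(self, measured_pos, delta_a):
        R_t = self.base_R / max(delta_a, 1e-6) * np.eye(2)  # 设备动得越快, R_t越小, 滞后越小
        self.P = self.P + self.Q                             # 预测步(状态转移取单位阵)
        K = self.P @ np.linalg.inv(self.P + R_t)             # 卡尔曼增益
        self.x = self.x + K @ (np.asarray(measured_pos, dtype=float) - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x
```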
相关的视觉跟踪技术中还包括一种基于特征点的跟踪算法,基于特征点的跟踪算法只考虑目标的显著特征,因此能够在存在部分遮挡和变形的条件下,实现跟踪。比如,基于一致性匹配和跟踪特征点的目标跟踪(Clustering of Static-Adaptive Correspondences for Deformable Object Tracking,简称CMT)算法是一种基于特征点的跟踪算法,可以跟踪任何具有显著特征的物体。CMT算法将通过计算前后图像帧之间的前向和后向光流来得到特征和通过匹配特征算子得到的特征,采用聚类的方法进行筛选,以获得一致的鲁棒的特征。此外CMT算法以框的中心来计算特征点的相对位置,对于不形变的目标,其特征相对于中心的距离在缩放比例下是不变的,所以算法能够跟踪旋转的目标。
这种基于特征点的跟踪算法可以得到对应的匹配特征点,并通过最小二乘法估计出目标的位置和姿态等信息,且能适应一定的遮挡和变形。CMT算法虽然跟踪性能好,算法效率高,但对计算效率和耗电量要求较高的移动端设备而言,不能完全满足跟踪要求,而且CMT算法对特征点准确性要求较高,在实际应用中提取的特征点通常存在小范围的误差,方案的稳定性较差,难以满足需要比较稳定的目标跟踪的应用,如增强现实技术(Augmented Reality,简称AR)贴图等。
而本实施方式中,对第二位置进行卡尔曼滤波得到运动目标在第二视频帧中的第三位置,相比于这种基于特征点的跟踪算法,提高了稳定性。
本申请实施例还提供了一种运动目标视觉跟踪装置,装置的结构图如图6所示,包括:
第一提取模块601,被设置为获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取运动目标在第一视频帧中的第一特征;
获取模块602,被设置为获取采集第二视频帧时运动目标的加速度信息和角速度信息;
第二提取模块603,被设置为根据加速度信息和角速度信息,计算运动目标在第二视频帧中的第一位置,并提取第二视频帧中第一位置处运动目标的第二特征;
匹配模块604,被设置为将第一特征和第二特征进行匹配,得到匹配特征;
第一计算模块605,被设置为通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
应用本申请实施例提供的运动目标视觉跟踪装置,获取待跟踪的运动目标在第一视频帧中的位置信息,并根据该位置信息,提取该运动目标在该第一视频帧中的第一特征;获取采集第二视频帧时该运动目标的加速度信息和角速度信息,其中,该第二视频帧为该第一视频帧的下一视频帧;根据该加速度信息和该角速度信息,以及该运动目标在该第一视频帧中的位置信息,计算该运动目标在该第二视频帧中的第一位置,并提取该第二视频帧中该第一位置处该运动目标的第二特征;将该第一特征和该第二特征进行匹配,得到匹配特征;通过光流算法,根据该匹配特征确定该运动目标在该第二视频帧中的第二位置。
可见,本方案中,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
在本申请实施例中,获取模块602,具体被设置为通过加速度传感器获取采集第二视频帧时运动目标的加速度信息;
通过陀螺仪传感器获取采集第二视频帧时运动目标的角速度信息。
在本申请实施例中,第二提取模块603的结构图如图7所示,包括:
第一计算子模块6031,被设置为根据加速度信息和角速度信息,计算运动目标的位置变化信息;
第二计算子模块6032,被设置为根据运动目标在第一视频帧中的位置信息,以及位置变化信息,确定运动目标在第二视频帧中的第一位置。
在本申请实施例中,第一计算模块606的结构图如图8所示,包括:
融合子模块6061,被设置为将匹配特征与第二特征进行融合,得到匹配特征与第二特征的共同特征;
第三计算子模块6062,被设置为采用光流算法,根据所述共同特征与所述第一特征,确定所述运动目标在所述第二视频帧中的第二位置。
在本申请实施例中,第三计算子模块6062的结构图如图9所示,包括:
第一计算单元60621,被设置为计算共同特征相对于第一特征的缩放比例和旋转比例;
第二计算单元60622,被设置为通过预设跟踪算法,根据所述共同特征、所述缩放比例和旋转比例,确定所述运动目标在所述第二视频帧中的第二位置。
在本申请实施例中,本申请实施例提供的运动目标视觉跟踪装置的结构图如图10所示,还包括:
投票模块606,被设置为将共同特征、缩放比例和旋转比例进行投票,生成投票空间;
聚类模块607,被设置为将投票空间进行聚类;
统计模块608,被设置为统计聚类后的投票空间的长度;
在本申请实施例中,本申请实施例提供的装置还包括:
滤波模块,被设置为当统计模块608统计的聚类后的投票空间的长度大于预设阈值时,对位置特征信息进行卡尔曼滤波,得到运动目标在第二视频帧中的第三位置。
需要说明的是,上述运动目标视觉跟踪装置可以位于便携式笔记本、台式计算机、智能手机等电子设备中,但并不限于此。
本申请实施例提供了一种电子设备,包括处理器和机器可读存储介质,机器可读存储介质存储有能够被处理器执行的机器可执行指令,处理器执行机器可执行指令时,实现如下的方法步骤:
获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;
获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,其中,所述第二视频帧为所述第一视频帧的下一视频帧;
根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;
将所述第一特征和所述第二特征进行匹配,得到匹配特征;
通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
本申请实施例还提供了一种电子设备,如图11所示,包括处理器1101、通信接口1102、存储器1103和通信总线1104,其中,处理器1101,通信接口1102,存储器1103通过通信总线1104完成相互间的通信,
存储器1103,被设置为存放计算机程序;
处理器1101,被设置为执行存储器1103上所存放的程序时,实现如下步骤:
获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;
获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,其中,所述第二视频帧为所述第一视频帧的下一视频帧;
根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;
将所述第一特征和所述第二特征进行匹配,得到匹配特征;
通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
应用本申请实施例提供的电子设备,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质内存储有计算机程序,计算机程序被处理器执行时实现如下的方法步骤:
获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;
获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,其中,所述第二视频帧为所述第一视频帧的下一视频帧;
根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;
将所述第一特征和所述第二特征进行匹配,得到匹配特征;
通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
应用本申请实施例提供的计算机可读存储介质,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
本申请实施例还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述的一种运动目标视觉跟踪方法。
应用本申请实施例提供的包含指令的计算机程序产品,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
本申请实施例还提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述的一种运动目标视觉跟踪方法。
应用本申请实施例提供的计算机程序,根据运动目标的加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。
上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
需要说明的是,本申请实施例提供的装置、电子设备、存储介质、包含指令的计算机程序产品、计算机程序分别是应用上述运动目标视觉跟踪方法的装置、电子设备、存储介质、包含指令的计算机程序产品、计算机程序,则上述运动目标视觉跟踪方法的所有实施例均适用于该装置、电子设备、存储介质、包含指令的计算机程序产品、计算机程序,且均能达到相同或相似的有益效果。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中的各个实施例均采用相关的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例、电子设备实施例、计算机可读存储介质实施例、计算机程序实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。
工业实用性
基于本申请实施例提供的技术方案,根据加速度信息和角速度信息,以及运动目标在第一视频帧中的位置信息,计算运动目标在第二视频帧中的第一位置,也就是通过引入加速度信息和角速度信息,可以动态地确定该第一位置,也就是在动态改变运动目标的搜索范围,这相比于将整帧图像作为运动目标的搜索范围,减少了计算量;另外,本实施方式中,不需要采用滑窗方式来匹配,减少了计算量,提高了计算效率,进而本申请提高了跟踪算法的实时性,提高了跟踪效率。

Claims (17)

  1. 一种运动目标视觉跟踪方法,所述方法包括:
    获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;
    获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,其中,所述第二视频帧为所述第一视频帧的下一视频帧;
    根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;
    将所述第一特征和所述第二特征进行匹配,得到匹配特征;
    通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
  2. 根据权利要求1所述的方法,其中,所述获取采集第二视频帧时所述运动目标的加速度信息和角速度信息,包括:
    通过加速度传感器获取采集第二视频帧时所述运动目标的加速度信息;
    通过陀螺仪传感器获取采集第二视频帧时所述运动目标的角速度信息。
  3. 根据权利要求1所述的方法,其中,所述根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,包括:
    根据所述加速度信息和所述角速度信息,计算所述运动目标的位置变化信息;
    根据所述运动目标在所述第一视频帧中的位置信息,以及所述位置变化信息,确定所述运动目标在所述第二视频帧中的第一位置。
  4. 根据权利要求1所述的方法,其中,所述通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置,包括:
    将所述匹配特征与所述第二特征进行融合,得到所述匹配特征与所述第二特征的共同特征;
    采用光流算法,根据所述共同特征与所述第一特征,确定所述运动目标在所述第二视频帧中的第二位置。
  5. 根据权利要求4所述的方法,其中,所述采用光流算法,根据所述共同特征与所述第一特征,确定所述运动目标在所述第二视频帧中的第二位置,包括:
    计算所述共同特征相对于所述第一特征的缩放比例和旋转比例;
    通过预设跟踪算法,根据所述共同特征、所述缩放比例和旋转比例,确定所述运动目标在所述第二视频帧中的第二位置。
  6. 根据权利要求5所述的方法,其中,所述通过预设跟踪算法,根据所述共同特征、所述缩放比例和旋转比例,确定所述运动目标在所述第二视频帧中的第二位置之后,所述方法还包括:
    将所述共同特征、所述缩放比例和旋转比例进行投票,生成投票空间;
    将所述投票空间进行聚类;
    统计所述聚类后的投票空间的长度。
  7. 根据权利要求6所述的方法,其中,所述方法还包括:
    当所述聚类后的投票空间的长度大于预设阈值时,对所述位置特征信息进行卡尔曼滤波,得到所述运动目标在所述第二视频帧中的第三位置。
  8. 一种运动目标视觉跟踪装置,所述装置包括:
    第一提取模块,被设置为获取待跟踪的运动目标在第一视频帧中的位置信息,并根据所述位置信息,提取所述运动目标在所述第一视频帧中的第一特征;
    获取模块,被设置为获取采集第二视频帧时所述运动目标的加速度信息和角速度信息;其中,所述第二视频帧为所述第一视频帧的下一视频帧;
    第二提取模块,被设置为根据所述加速度信息和所述角速度信息,以及所述运动目标在所述第一视频帧中的位置信息,计算所述运动目标在所述第二视频帧中的第一位置,并提取所述第二视频帧中所述第一位置处所述运动目标的第二特征;
    匹配模块,被设置为将所述第一特征和所述第二特征进行匹配,得到匹配特征;
    第一计算模块,被设置为通过光流算法,根据所述匹配特征确定所述运动目标在所述第二视频帧中的第二位置。
  9. 根据权利要求8所述的装置,其中,所述获取模块,具体被设置为通过加速度传感器获取采集第二视频帧时所述运动目标的加速度信息;
    通过陀螺仪传感器获取采集第二视频帧时所述运动目标的角速度信息。
  10. 根据权利要求8所述的装置,其中,所述第二提取模块包括:
    第一计算子模块,被设置为根据所述加速度信息和所述角速度信息,计算所述运动目标的位置变化信息;
    第二计算子模块,被设置为根据所述运动目标在所述第一视频帧中的位置信息,以及所述位置变化信息,确定所述运动目标在所述第二视频帧中的第一位置。
  11. 根据权利要求8所述的装置,其中,所述第一计算模块包括:
    融合子模块,被设置为将所述匹配特征与所述第二特征进行融合,得到所述匹配特征与所述第二特征的共同特征;
    第三计算子模块,被设置为采用光流算法,根据所述共同特征与所述第一特征,确定所述运动目标在所述第二视频帧中的第二位置。
  12. 根据权利要求11所述的装置,其中,所述第三计算子模块包括:
    第一计算单元,被设置为计算所述共同特征相对于所述第一特征的缩放比例和旋转比例;
    第二计算单元,被设置为通过预设跟踪算法,根据所述共同特征、所述缩放比例和旋转比例,确定所述运动目标在所述第二视频帧中的第二位置。
  13. 根据权利要求12所述的装置,其中,所述装置还包括:
    投票模块,被设置为将所述共同特征、所述缩放比例和旋转比例进行投票,生成投票空间;
    聚类模块,被设置为将所述投票空间进行聚类;
    统计模块,被设置为统计所述聚类后的投票空间的长度。
  14. 根据权利要求13所述的装置,其中,所述装置还包括:
    滤波模块,被设置为当所述统计模块统计的所述聚类后的投票空间的长度大于预设阈值时,对所述位置特征信息进行卡尔曼滤波,得到所述运动目标在所述第二视频帧中的第三位置。
  15. 一种电子设备,包括处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令,所述处理器执行所述机器可执行指令时,实现权利要求1-7任一所述的方法步骤。
  16. 一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现权利要求1-7任一所述的方法步骤。
  17. 一种计算机程序,当所述计算机程序在计算机上运行时,使得计算机执行权利要求1-7任一项所述的方法步骤。
PCT/CN2018/107289 2017-09-25 2018-09-25 运动目标的视觉跟踪方法、装置、电子设备及存储介质 WO2019057197A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710872887.7 2017-09-25
CN201710872887.7A CN109559330B (zh) 2017-09-25 2017-09-25 运动目标的视觉跟踪方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019057197A1 true WO2019057197A1 (zh) 2019-03-28

Family

ID=65811021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107289 WO2019057197A1 (zh) 2017-09-25 2018-09-25 运动目标的视觉跟踪方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN109559330B (zh)
WO (1) WO2019057197A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147750B (zh) * 2019-05-13 2021-08-24 深圳先进技术研究院 一种基于运动加速度的图像搜索方法、系统及电子设备
CN110415276B (zh) * 2019-07-30 2022-04-05 北京字节跳动网络技术有限公司 运动信息计算方法、装置及电子设备
CN110781824B (zh) 2019-10-25 2023-03-14 阿波罗智联(北京)科技有限公司 目标检测及跟踪方法、装置、电子设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5832341B2 (ja) * 2012-03-09 2015-12-16 株式会社トプコン 動画処理装置、動画処理方法および動画処理用のプログラム
CN103325108A (zh) * 2013-05-27 2013-09-25 浙江大学 一种融合光流与特征点匹配的单目视觉里程计的设计方法
CN104951753B (zh) * 2015-06-05 2018-11-27 张巍 一种有标识物6自由度视觉跟踪系统及其实现方法
CN105931275A (zh) * 2016-05-23 2016-09-07 北京暴风魔镜科技有限公司 基于移动端单目和imu融合的稳定运动跟踪方法和装置
CN106842625B (zh) * 2017-03-03 2020-03-17 西南交通大学 一种基于特征共识性的目标追踪方法
CN106814753B (zh) * 2017-03-20 2020-11-06 成都通甲优博科技有限责任公司 一种目标位置矫正方法、装置及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060008119A1 (en) * 2004-06-01 2006-01-12 Energid Technologies Visual object recognition and tracking
US20110142282A1 (en) * 2009-12-14 2011-06-16 Indian Institute Of Technology Bombay Visual object tracking with scale and orientation adaptation
CN104200494A (zh) * 2014-09-10 2014-12-10 北京航空航天大学 一种基于光流的实时视觉目标跟踪方法
CN106327517A (zh) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 一种目标跟踪装置及目标跟踪方法
CN105872477A (zh) * 2016-05-27 2016-08-17 北京旷视科技有限公司 视频监控方法和视频监控系统

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11900614B2 (en) * 2019-04-30 2024-02-13 Tencent Technology (Shenzhen) Company Limited Video data processing method and related apparatus
CN112166458A (zh) * 2019-10-17 2021-01-01 深圳市大疆创新科技有限公司 目标检测与跟踪方法、系统、设备及存储介质
CN112166458B (zh) * 2019-10-17 2024-04-26 深圳市大疆创新科技有限公司 目标检测与跟踪方法、系统、设备及存储介质
CN110929093A (zh) * 2019-11-20 2020-03-27 百度在线网络技术(北京)有限公司 用于搜索控制的方法、装置、设备和介质
CN110929093B (zh) * 2019-11-20 2023-08-11 百度在线网络技术(北京)有限公司 用于搜索控制的方法、装置、设备和介质
CN110930436A (zh) * 2019-11-27 2020-03-27 深圳市捷顺科技实业股份有限公司 一种目标跟踪方法及设备
CN110930436B (zh) * 2019-11-27 2023-04-14 深圳市捷顺科技实业股份有限公司 一种目标跟踪方法及设备
CN112233252A (zh) * 2020-10-23 2021-01-15 上海影谱科技有限公司 一种基于特征匹配与光流融合的ar目标跟踪方法及系统
CN112233252B (zh) * 2020-10-23 2024-02-13 上海影谱科技有限公司 一种基于特征匹配与光流融合的ar目标跟踪方法及系统
CN112419368A (zh) * 2020-12-03 2021-02-26 腾讯科技(深圳)有限公司 运动目标的轨迹跟踪方法、装置、设备及存储介质
CN117241133A (zh) * 2023-11-13 2023-12-15 武汉益模科技股份有限公司 基于非固定位置的多工序同时作业的视觉报工方法及系统
CN117241133B (zh) * 2023-11-13 2024-02-06 武汉益模科技股份有限公司 基于非固定位置的多工序同时作业的视觉报工方法及系统

Also Published As

Publication number Publication date
CN109559330A (zh) 2019-04-02
CN109559330B (zh) 2021-09-10
