CN112102404A - Object detection tracking method and device and head-mounted display equipment - Google Patents

Object detection tracking method and device and head-mounted display equipment

Info

Publication number
CN112102404A
CN112102404A
Authority
CN
China
Prior art keywords: feature, detected, acquisition conditions, different acquisition, under different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010817884.5A
Other languages
Chinese (zh)
Other versions
CN112102404B (en)
Inventor
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202010817884.5A
Priority claimed from CN202010817884.5A
Publication of CN112102404A
Application granted
Publication of CN112102404B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Abstract

The application discloses an object detection and tracking method, an object detection and tracking device, and a head-mounted display device. The method includes: acquiring a plurality of images of an object to be detected collected under different acquisition conditions, extracting feature points from each image, and extracting feature descriptors for the extracted feature points, so as to obtain a plurality of feature points and their corresponding feature descriptors; determining three-dimensional point cloud coordinates of each feature point in a camera coordinate system; matching the feature descriptors obtained for the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs; and determining a coordinate transformation matrix of the feature points of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinates of each matched feature point pair, and aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix. By matching and aligning the feature information of the object under different acquisition conditions, the method and device greatly improve the probability of successful subsequent object tracking and re-matching.

Description

Object detection tracking method and device and head-mounted display equipment
Technical Field
The application relates to the technical field of human-computer interaction, in particular to an object detection and tracking method and device, and a head-mounted display device.
Background
As is well known, object detection frequently arises in the fields of Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). For example, in the interaction of a VR/MR multitasking office system, multiple virtual task windows need to be opened in a VR see-through mode to simulate multiple computer screens in a real environment and to display and process multiple tasks simultaneously. These virtual task windows need to be superimposed on, and interact with, an office desktop in the real environment in the see-through mode, simulating multiple display devices placed on that desktop. This process relies on technologies such as computer vision processing algorithms, image processing methods, and image rendering to superimpose the virtual task windows onto the real office desktop with one-to-one, high-precision restoration.
As another example, in scene interaction in the AR field, it is often necessary to detect key objects in the real environment, such as common objects like a table, a stool, or a sofa, so that the AR glasses worn by the user can generate a degree of virtual-real interaction with these objects using technologies such as computer vision processing algorithms, image processing methods, and graphic rendering.
However, in the VR, MR, or AR domains, the user typically wants to perform virtual-real modeling of a real environment scene only once and then be able to interact with that scene at any time afterwards. With existing human-computer interaction devices, after the user performs virtual-real modeling of the required real environment scene, virtual interaction works smoothly only at that time; when the user returns to the originally modeled environment after a period of time and wants to interact again, most devices prompt that no modeling information exists for the area and that a model must be created anew, which greatly reduces the user experience.
Disclosure of Invention
In view of the above, the present invention provides an object detection and tracking method, an object detection and tracking device, and a head-mounted display device, which are used to solve the technical problem that existing object detection and tracking methods are inefficient.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides an object detection and tracking method, including:
acquiring a plurality of images of an object to be detected, which are acquired under different acquisition conditions, respectively extracting feature points of the images, and extracting feature descriptors of the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to the feature points;
determining three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions under a camera coordinate system;
matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and determining a coordinate transformation matrix of the characteristic points of the object to be detected among different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched characteristic point pairs, and aligning the three-dimensional point cloud coordinates of the characteristic points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
In a second aspect, an embodiment of the present application provides an object detecting and tracking apparatus, including:
the extraction unit is used for acquiring a plurality of images of the object to be detected, which are acquired under different acquisition conditions, respectively extracting characteristic points of the images, and extracting characteristic descriptors of the extracted characteristic points to obtain a plurality of characteristic points of the object to be detected under different acquisition conditions and characteristic descriptors corresponding to the characteristic points;
the coordinate determination unit is used for determining three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions under a camera coordinate system;
the matching unit is used for matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and the alignment unit is used for determining a coordinate conversion matrix of the characteristic points of the object to be detected among different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched characteristic points, and aligning the three-dimensional point cloud coordinates of the characteristic points of the object to be detected under different acquisition conditions according to the coordinate conversion matrix.
In a third aspect, an embodiment of the present application provides a head-mounted display device, including: a processor, and a memory storing computer executable instructions that, when executed by the processor, implement the object detection and tracking method described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing one or more programs that, when executed by a head-mounted display device comprising a plurality of application programs, cause the head-mounted display device to perform the object detection and tracking method described above.
The embodiment of the application adopts at least one technical scheme that can achieve the following beneficial effects: a plurality of images of the object to be detected under different acquisition conditions are obtained, and feature points of the object to be detected under the different acquisition conditions are obtained by extracting features from these images. Feature descriptors are then extracted for the feature points, and the feature descriptors under different acquisition conditions are matched against each other; descriptors with a high matching degree are used as invariant attribute information of the object to be detected, capturing characteristics of the object that remain unchanged across acquisition conditions. In addition, so that the feature points of the object to be detected can be converted between different acquisition conditions and the success probability of subsequent object tracking and re-matching is improved, this conversion is realized based on the three-dimensional point cloud coordinates of the feature points, i.e., tracking of the object to be detected under different acquisition conditions is achieved through the conversion or alignment of the three-dimensional point cloud coordinates. Therefore, by matching and aligning the feature information of the object under different acquisition conditions, the object detection and tracking method greatly improves the probability of successful subsequent object tracking and re-matching.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of an object detection tracking method according to an embodiment of the present application;
FIG. 2 illustrates a schematic structural diagram of an object detection and tracking device according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a head mounted display device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the virtual environment modeling process, an important step is object detection. The object feature information that generally needs to be stored in object detection is the feature descriptor information of an object under the current ambient light. This information is strongly related to the brightness of the current image; that is, an ordinary camera depends on the lighting conditions when capturing an image. When the user wants to continue the experience in the same environment at a later time, the ambient illumination and other conditions have very likely changed, so the object feature information extracted from the image also changes. At that point, the currently extracted object feature descriptors no longer match the feature descriptors extracted during the first environment modeling, which affects subsequent detection and tracking of the image. Meanwhile, because the environment modeling has to be carried out again, the user's current experience fails, which greatly reduces the interactive experience of the user.
Fig. 1 is a flowchart of an object detection and tracking method according to an embodiment of the present disclosure, and referring to fig. 1, the object detection and tracking method according to the embodiment of the present disclosure includes the following steps S110 to S140:
step S110, acquiring a plurality of images of the object to be detected acquired under different acquisition conditions, respectively performing feature point extraction on each image, and performing feature descriptor extraction on the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to each feature point.
The images of the object to be detected under different acquisition conditions can be acquired by an all-in-one headset such as a VR/AR/MR device, in which two or more environment capturing and tracking cameras are built and which can acquire 6DoF (six degrees of freedom) information of the head-mounted end in real time.
After confirming an object to be detected in a real environment scene, the user scans the object omnidirectionally (360 degrees) with the environment tracking cameras built into the head-mounted all-in-one device, so that the object to be detected is scanned with as few blind angles as possible. The image scanning process can be carried out under a variety of different acquisition conditions, where the acquisition conditions may refer to ambient illumination conditions and the like. For example, the same object is scanned under three different ambient illumination conditions, in the morning, at noon, and in the evening, thereby obtaining a plurality of images of the object to be detected acquired under different illumination conditions.
After obtaining the plurality of images of the object to be detected under different acquisition conditions, feature point extraction may be performed on each image using a feature extraction algorithm. There are many conventional algorithms for extracting object feature points, such as the FAST (Features from Accelerated Segment Test) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, and the like. The FAST algorithm examines a circle of pixel values around a candidate feature point based on the image gray values around that point; if enough pixels in the neighborhood around the candidate point differ from the candidate point by a sufficiently large gray value, the candidate point is considered a feature point. The SIFT algorithm is a local feature descriptor used in the field of image processing; it has scale invariance and can detect key points in an image. The above feature extraction methods are all conventional in the art and are not described in detail in the embodiments of the present application. Based on a balance between algorithm complexity and detection precision, the embodiments of the present application may use the FAST algorithm to extract the object feature points.
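As an illustrative sketch (not part of the patent text), FAST feature point extraction on one scanned image could look as follows with OpenCV; the threshold value and file name are assumptions chosen for the example:

```python
import cv2

# Load one image of the object to be detected (grayscale); the path is a placeholder.
image = cv2.imread("object_scan_morning.png", cv2.IMREAD_GRAYSCALE)

# FAST detector: a candidate pixel becomes a feature point when enough pixels on the
# surrounding circle differ from it by more than the threshold (here 20 gray levels).
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(image, None)

print(f"detected {len(keypoints)} FAST feature points")
```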
After the feature points of each image are extracted, a corresponding feature descriptor is further extracted for each feature point. Feature descriptors describe local features of the detected image (such as edges, corners, and contours); they combine and transform features according to the needs of the matching target to form feature vectors that are easy to match and have good stability, so that the image matching problem is converted into a feature matching problem. The feature descriptor of an object feature point represents the meaning, attribute information, and so on of that feature point; it can be used to characterize the feature point and serves as the basis for subsequently tracking the object.
And step S120, determining three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions in a camera coordinate system.
After obtaining the information of each feature point of the object to be detected under different acquisition conditions, the three-dimensional point cloud coordinates of each feature point in the camera coordinate system can be calculated using computer stereo imaging techniques, the triangulation principle, the calibration parameters of the cameras built into the head-mounted device, and the like. The feature descriptor of each feature point on the object and its corresponding three-dimensional point cloud coordinates in the camera coordinate system are stored in association with each other.
Step S130, matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs.
After the feature descriptors of the object to be detected under different acquisition conditions are obtained, the relevance or matching degree between the feature descriptors under different acquisition conditions needs to be determined, so that feature descriptors obtained under different acquisition conditions can be matched by using a feature point matching algorithm in the prior art, and a plurality of matched feature point pairs are obtained.
Conventional feature point matching methods include the Hamming distance matching algorithm. The Hamming distance, named after Richard Wesley Hamming, between two strings of equal length is the number of positions at which the corresponding characters differ, i.e., the number of characters that must be replaced to convert one string into the other. The KNN (K-Nearest Neighbor) matching algorithm finds the K records closest to a new data point in a training set and then determines the category of the new data according to the main classification of those K records. The RANSAC (RANdom SAmple Consensus) matching algorithm randomly samples the matching samples and finds a consistent set of sample points. These feature point matching algorithms are all conventional in the field, and those skilled in the art can select among them flexibly according to the actual situation; no specific limitation is made here.
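For illustration only (this sketch is not taken from the patent), the Hamming distance between two binary feature descriptors of equal length can be computed as below; the descriptor values are made-up examples:

```python
def hamming_distance(desc_a: bytes, desc_b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    assert len(desc_a) == len(desc_b)
    return sum(bin(a ^ b).count("1") for a, b in zip(desc_a, desc_b))

# Two made-up 4-byte descriptors; a smaller distance means a better match.
print(hamming_distance(b"\x0f\xa0\x33\x01", b"\x0f\xa1\x33\x01"))  # -> 1
```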
It should be noted that, the foregoing step S120 and step S130 are not in a strict sequence, and in the specific implementation, a person skilled in the art may flexibly adjust the sequence according to the actual situation, and should not be construed as a limitation to the protection scope of the embodiment of the present application.
And step S140, determining a coordinate transformation matrix of the feature points of the object to be detected among different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched feature points, and aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
Since the feature descriptors correspond one-to-one to the feature points, after a plurality of matched feature point pairs are obtained, the corresponding feature points can be determined from the feature descriptors in each matched pair, and the three-dimensional point cloud coordinates corresponding to each matched feature point pair are thereby obtained.
According to the object detection and tracking method, the characteristic information of the object under different acquisition conditions is matched and aligned, so that the probability of successful follow-up object tracking and re-matching is greatly improved.
In one embodiment of the present application, the feature descriptor extraction of the extracted feature points includes: determining a corresponding neighborhood window of each feature point in each image; calculating the pixel gradient value of each characteristic point according to the gray value corresponding to each characteristic point and the gray value of each pixel point in the neighborhood window; and normalizing the pixel gradient value of each feature point to obtain a feature descriptor corresponding to each feature point.
In a real life scene, when the same object is observed from different distances, different directions and angles and under different illumination conditions, the size, the shape, the brightness and the like of the object are different, but the brain of the user can still judge that the object is the same object. Ideally, the feature descriptors should have such properties that in images with different sizes, directions and light and shade, the same feature point should have sufficiently similar descriptors, which is called the reproducibility of the descriptors. The same result should be achieved when the descriptors are computed separately in some ideal way, i.e. the descriptors should be insensitive to illumination (brightness), have scale consistency (size), rotational consistency (angle), etc. Therefore, the purpose of extracting the feature descriptor in the embodiment of the present application is to describe the feature point by using a group of vectors after obtaining the feature point, and the descriptor includes not only the feature point but also pixel points around the feature point contributing to the feature point, so that the feature point has more invariant characteristics, such as illumination change, three-dimensional viewpoint change, and the like, and is used as a basis for subsequent target matching and tracking.
The embodiment of the present application may adopt a regional gradient extraction algorithm to extract the feature descriptors. Specifically, a 5 × 5 neighborhood window may be determined for each feature point, and a pixel gradient value is calculated from the gray value of each pixel in the window and the gray value of the pixel corresponding to the feature point. By traversing all pixels in the window in sequence, the pixel gradient values of the whole window are obtained; these gradient values are then normalized to generate a unique vector, which is an abstraction of the feature point information and has uniqueness. Of course, those skilled in the art can also use other algorithms to extract the feature descriptors, which are not listed here one by one.
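A minimal sketch of such a region-gradient descriptor, assuming the 5 × 5 window described above and simple gray-value differences as the gradient measure (the exact gradient formula is not specified in the text, so this is an assumption, and the feature point is assumed not to lie on the image border):

```python
import numpy as np

def region_gradient_descriptor(gray: np.ndarray, x: int, y: int, half: int = 2) -> np.ndarray:
    """Build a normalized descriptor from gray-value differences in a (2*half+1)^2 window."""
    center = float(gray[y, x])
    window = gray[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
    gradients = (window - center).flatten()              # difference of each pixel to the feature point
    norm = np.linalg.norm(gradients)
    return gradients / norm if norm > 0 else gradients   # normalize to obtain the descriptor vector

# Usage with a made-up gray image and feature point location:
gray = (np.random.rand(480, 640) * 255).astype(np.uint8)
descriptor = region_gradient_descriptor(gray, x=100, y=200)
print(descriptor.shape)  # (25,)
```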
In an embodiment of the present application, the multiple images acquired by the object to be detected under different acquisition conditions include a first sub-image acquired by the left eye camera and a second sub-image acquired by the right eye camera, and determining three-dimensional point cloud coordinates of each feature point of the object to be detected under different acquisition conditions in the camera coordinate system includes: determining a first imaging position corresponding to a characteristic point on an object to be detected in a first sub-image and a second imaging position corresponding to the characteristic point on the object to be detected in a second sub-image by utilizing a normalized cross-correlation matching algorithm according to calibration parameters of a binocular camera; and calculating the three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions in a camera coordinate system by utilizing a three-dimensional triangulation algorithm according to the relative position relation between the first imaging position and the second imaging position.
In a specific implementation, the embodiment of the present application may use a head-mounted device with two built-in environment capturing and tracking cameras, i.e., a binocular camera, to acquire the images; the acquired images specifically include a first sub-image acquired by the left-eye camera and a second sub-image acquired by the right-eye camera. After the images acquired by the left and right cameras are obtained, the above feature point extraction algorithm can be used to extract object feature points from the image acquired by one of the two cameras; then, according to the calibration parameters of the binocular camera calibrated in advance, the NCC (Normalized Cross-Correlation) matching algorithm can be used to find, for the first imaging position of each feature point in the image acquired by the left-eye camera, the corresponding second imaging position in the image acquired by the right-eye camera.
The above embodiment performs object feature point extraction only on the image acquired by the left-eye camera, which reduces subsequent feature point matching errors to a certain extent. Of course, a mode of extracting feature points from both the left-eye and right-eye images and then matching them can also be adopted; which mode to adopt can be flexibly selected by those skilled in the art according to the actual situation, and no specific limitation is made here.
Then, according to the relative position relationship between the first imaging position and the second imaging position, the three-dimensional point cloud coordinates of each feature point of the object to be detected under different acquisition conditions in the camera coordinate system can be calculated using a stereo triangulation algorithm. Specifically, image features corresponding to the actual physical structure of the object to be detected are selected from the first sub-image; the corresponding image features of the same physical structure are determined in the second sub-image; the relative position between the two image features is then determined to obtain the disparity, and the three-dimensional point cloud coordinates of the feature points on the object to be detected in the camera coordinate system are calculated from the disparity. That is, as long as any point in the image acquired by the left-eye camera can find its corresponding matching point in the image acquired by the right-eye camera, the three-dimensional point cloud coordinates of that point can be determined.
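A simplified numeric sketch of the triangulation step, under the assumption of a rectified stereo pair with focal length f, baseline b, and principal point (cx, cy); these symbols are not defined in the patent and are standard stereo quantities used here only for illustration:

```python
import numpy as np

def triangulate_rectified(u_left: float, v: float, u_right: float,
                          f: float, baseline: float, cx: float, cy: float) -> np.ndarray:
    """Recover a 3D point (camera coordinates) from a matched pixel pair of a rectified stereo rig."""
    disparity = u_left - u_right                 # horizontal offset between the two imaging positions
    depth = f * baseline / disparity             # classic stereo depth formula
    x = (u_left - cx) * depth / f
    y = (v - cy) * depth / f
    return np.array([x, y, depth])

# Example with assumed calibration values: f = 450 px, baseline = 0.064 m, image center (320, 240).
point = triangulate_rectified(u_left=352.0, v=250.0, u_right=340.0,
                              f=450.0, baseline=0.064, cx=320.0, cy=240.0)
print(point)  # 3D point cloud coordinate of one feature point in the camera coordinate system
```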
In an embodiment of the present application, calibration parameters of the binocular camera are obtained as follows: calibrating a left eye camera and a right eye camera of the binocular camera by using a preset calibration algorithm; and determining calibration parameters of the left-eye camera and the right-eye camera according to the calibration result, wherein the calibration parameters comprise internal parameters and distortion parameters of the left-eye camera and the right-eye camera and a rotation and translation matrix of the left-eye camera relative to the right-eye camera.
In the scenes of object detection and the like, in order to determine the correlation between the three-dimensional geometric position of a certain point on the surface of an object in space and the corresponding point in an image, a geometric model of camera imaging needs to be established, and the geometric model parameters are camera parameters. Under most conditions, the parameters (internal parameters, external parameters, distortion parameters, etc.) can be obtained through experiments and calculation, and the process of solving the parameters is called camera calibration (or video camera calibration). In image measurement or machine vision application, calibration of camera parameters is a very critical link, and the accuracy of a calibration result and the stability of an algorithm directly influence the accuracy of a result generated by the operation of a camera. Therefore, before the feature point matching of the object image to be detected is carried out, the binocular camera is calibrated, so that the accuracy of subsequent image detection and matching is improved.
Specifically, there are various camera calibration methods, such as the traditional Zhang Zhengyou calibration method, the active vision camera calibration method, the camera self-calibration method, and the like. Calibrating the camera with any of these algorithms yields the calibration parameters, including the internal parameters and distortion parameters of the left-eye camera and the right-eye camera, and the rotation-translation matrix of the left-eye camera relative to the right-eye camera. Which specific calibration method is adopted can be flexibly selected by those skilled in the art according to the actual situation, and is not specifically limited here.
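As a hedged illustration of how such calibration parameters might be obtained with OpenCV (the checkerboard corner lists are assumed to have been collected beforehand; this is only one common procedure, not the patent's prescribed one):

```python
import cv2

# objpoints: list of Nx3 checkerboard corner coordinates in the board frame (assumed collected)
# imgpoints_left / imgpoints_right: matching Nx2 pixel detections in each camera (assumed collected)
# image_size: (width, height) of the calibration images
def calibrate_stereo(objpoints, imgpoints_left, imgpoints_right, image_size):
    # Calibrate each camera individually to get its intrinsics and distortion parameters.
    _, K_l, dist_l, _, _ = cv2.calibrateCamera(objpoints, imgpoints_left, image_size, None, None)
    _, K_r, dist_r, _, _ = cv2.calibrateCamera(objpoints, imgpoints_right, image_size, None, None)
    # Stereo calibration: R, T is the rotation-translation of one camera relative to the other.
    _, K_l, dist_l, K_r, dist_r, R, T, _, _ = cv2.stereoCalibrate(
        objpoints, imgpoints_left, imgpoints_right,
        K_l, dist_l, K_r, dist_r, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K_l, dist_l, K_r, dist_r, R, T
```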
In an embodiment of the present application, the feature descriptors include a plurality of first feature descriptors corresponding to the object to be detected obtained under the first acquisition condition and a plurality of second feature descriptors corresponding to the object to be detected obtained under the second acquisition condition, and matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs includes: calculating the matching degree between each first feature descriptor and each second feature descriptor by using a preset feature matching algorithm; and forming a matching feature point pair by the first feature descriptor and the second feature descriptor with the matching degree reaching a preset threshold value.
The process of matching the feature descriptors in the embodiment of the present application is essentially a process of matching, against each other, the feature descriptors obtained for the object to be detected under any two acquisition conditions; a first acquisition condition and a second acquisition condition are used here merely for distinction. For example, the first acquisition condition may be morning and the second acquisition condition may be evening; the feature descriptors corresponding to the object images acquired in the morning are matched one by one with the feature descriptors corresponding to the object images acquired in the evening. The above-mentioned KNN matching algorithm may be adopted here to obtain the similarity between any two feature descriptors, and finally the descriptor pairs whose similarity ranks in the top N may form N matched feature point pairs. Although the feature descriptors on the same real object under different illumination conditions are, with high probability, different, tests on the feature descriptors of a number of objects show that each object has on average more than 120 feature points and that some of the feature descriptors are similar. In the embodiment of the present application, based on considerations of algorithm complexity and accuracy, the number N of matched feature point pairs may be 20. Of course, those skilled in the art may select another number of matched feature point pairs, which is not specifically limited here.
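A hedged sketch of this matching-degree ranking (the descriptor arrays are assumed to be the normalized float vectors from the extraction step above; N = 20 follows the text, while the ratio-test threshold of 0.75 is an assumption for the example):

```python
import numpy as np
import cv2

def top_n_matched_pairs(desc_first: np.ndarray, desc_second: np.ndarray, n: int = 20):
    """Match descriptors from two acquisition conditions and keep the N best pairs."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # For each first-condition descriptor, find its 2 nearest second-condition descriptors.
    knn = matcher.knnMatch(desc_first.astype(np.float32), desc_second.astype(np.float32), k=2)
    # Ratio test as a matching-degree threshold, then keep the N closest pairs.
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    good.sort(key=lambda m: m.distance)
    return [(m.queryIdx, m.trainIdx) for m in good[:n]]
```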
In an embodiment of the present application, determining a coordinate transformation matrix of feature points of an object to be detected between different acquisition conditions according to three-dimensional point cloud coordinates corresponding to each matching feature point pair includes: acquiring three-dimensional point cloud coordinates of feature points corresponding to each first feature descriptor in the matched feature point pair and three-dimensional point cloud coordinates of feature points corresponding to each second feature descriptor; and calculating a coordinate conversion matrix between the three-dimensional point cloud coordinates of the characteristic points corresponding to the first characteristic descriptor and the three-dimensional point cloud coordinates of the characteristic points corresponding to the second characteristic descriptor by using a perspective N-point algorithm.
The purpose here is to convert the three-dimensional point cloud coordinates of the object feature points under the second acquisition condition into the three-dimensional point cloud coordinate system corresponding to the object feature points under the first acquisition condition through the relationship between the solid geometry of camera imaging and perspective projection, and to calculate the coordinate transformation matrix. In the specific calculation of the coordinate transformation matrix, the following formula may be adopted:

P_1 = ΔT · P_2

where P_1 denotes the three-dimensional point cloud coordinates, in the camera coordinate system, of the feature points of the object to be detected under the first acquisition condition; P_2 denotes the three-dimensional point cloud coordinates, in the camera coordinate system, of the feature points of the object to be detected under the second acquisition condition; and ΔT denotes the rotation-translation matrix of the camera coordinate system of P_2 relative to that of P_1.

Here, P_1 and P_2 are the three-dimensional point cloud coordinates of the feature points corresponding to the 20 matched feature point pairs obtained in the above embodiment, and the coordinate transformation matrix of the feature points of the image of the object to be detected between the first acquisition condition and the second acquisition condition may be calculated by a PnP (Perspective-n-Point) algorithm.
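An illustrative PnP sketch for recovering ΔT, under the assumption that, in addition to the matched 3D points under the second condition, the 2D pixel observations of those feature points in the first-condition image and the camera intrinsic matrix K are available; none of these variable names come from the patent:

```python
import cv2
import numpy as np

def solve_delta_t(points_3d_second: np.ndarray, pixels_first: np.ndarray,
                  K: np.ndarray, dist: np.ndarray) -> np.ndarray:
    """Estimate the 4x4 rotation-translation matrix ΔT from 3D-2D correspondences via PnP."""
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        points_3d_second.astype(np.float32),   # 3D point cloud coordinates (second condition)
        pixels_first.astype(np.float32),       # matching pixel positions (first condition)
        K, dist)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> rotation matrix
    delta_t = np.eye(4)
    delta_t[:3, :3] = R
    delta_t[:3, 3] = tvec.ravel()
    return delta_t
```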
In one embodiment of the present application, aligning three-dimensional point cloud coordinates of feature points of an object to be detected under different acquisition conditions according to a coordinate transformation matrix includes: and converting the three-dimensional point cloud coordinates of the characteristic points of the object to be detected, which are obtained under the second acquisition condition, into the three-dimensional point cloud coordinates of the characteristic points of the object to be detected, which correspond to the first acquisition condition, according to the coordinate conversion matrix.
After the coordinate transformation matrix is obtained, the three-dimensional point cloud coordinates of the feature points of the object to be detected under the two acquisition conditions can be converted or aligned. It should be noted that the solution of the present application is not limited to coordinate conversion or alignment of feature points between two acquisition conditions; it can also be extended to coordinate conversion or alignment between more than two acquisition conditions. For example, for feature points of images collected under three illumination conditions, morning, noon, and evening, the feature point coordinates of the images collected under any two of the illumination conditions may be matched and aligned with those of the images collected under the remaining illumination condition, or the conditions may be aligned pairwise, as long as the aligned feature point coordinates reflect the attribute information of the object to be detected under the different acquisition conditions.
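A small sketch of this alignment step, applying the ΔT obtained above to bring the second-condition point cloud into the first-condition coordinate system (variable names are illustrative):

```python
import numpy as np

def align_to_first_condition(points_second: np.ndarray, delta_t: np.ndarray) -> np.ndarray:
    """Apply the 4x4 coordinate transformation matrix to Nx3 second-condition points."""
    homogeneous = np.hstack([points_second, np.ones((points_second.shape[0], 1))])
    return (delta_t @ homogeneous.T).T[:, :3]

# The aligned points can then be merged with the first-condition point cloud,
# so the stored model covers the object's feature coordinates under both conditions.
```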
For example, for three-dimensional point cloud coordinates of images acquired under three different illumination conditions, assuming that an object to be detected has 150 feature points on average under each illumination condition, the three-dimensional point cloud coordinates corresponding to 450 feature points of the object to be detected are finally obtained through the coordinate alignment process, which greatly improves the probability of successful tracking and re-matching of subsequent objects.
Based on the same technical concept as the object detection and tracking method described above, an embodiment of the present application further provides an object detection and tracking apparatus, which is applied to a head-mounted display device. Fig. 2 is a block diagram of an object detection and tracking apparatus according to an embodiment of the present application; referring to fig. 2, the object detection and tracking apparatus 200 includes: an extraction unit 210, a coordinate determination unit 220, a matching unit 230, and an alignment unit 240.
The extraction unit 210 in this embodiment is configured to acquire a plurality of images of an object to be detected acquired under different acquisition conditions, respectively extract feature points of each image, and extract feature descriptors of the extracted feature points, so as to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to each feature point.
The images of the object to be detected under different acquisition conditions can be acquired by an all-in-one headset such as a VR/AR/MR device, in which two or more environment capturing and tracking cameras are built and which can acquire 6DoF (six degrees of freedom) information of the head-mounted end in real time.
After confirming an object to be detected in a real environment scene, the user scans the object omnidirectionally (360 degrees) with the environment tracking cameras built into the head-mounted all-in-one device, so that the object to be detected is scanned with as few blind angles as possible. The image scanning process can be carried out under a variety of different acquisition conditions, where the acquisition conditions may refer to ambient illumination conditions and the like. For example, the same object is scanned under three different ambient illumination conditions, in the morning, at noon, and in the evening, thereby obtaining a plurality of images of the object to be detected acquired under different illumination conditions.
After obtaining the plurality of images of the object to be detected under different acquisition conditions, feature point extraction may be performed on each image using a feature extraction algorithm. There are many conventional algorithms for extracting object feature points, such as the FAST (Features from Accelerated Segment Test) algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded Up Robust Features) algorithm, and the like. The FAST algorithm examines a circle of pixel values around a candidate feature point based on the image gray values around that point; if enough pixels in the neighborhood around the candidate point differ from the candidate point by a sufficiently large gray value, the candidate point is considered a feature point. The SIFT algorithm is a local feature descriptor used in the field of image processing; it has scale invariance and can detect key points in an image. The above feature extraction methods are all conventional in the art and are not described in detail in the embodiments of the present application. Based on a balance between algorithm complexity and detection precision, the embodiments of the present application may use the FAST algorithm to extract the object feature points.
After the feature points of each image are extracted, a corresponding feature descriptor is further extracted for each feature point. Feature descriptors describe local features of the detected image (such as edges, corners, and contours); they combine and transform features according to the needs of the matching target to form feature vectors that are easy to match and have good stability, so that the image matching problem is converted into a feature matching problem. The feature descriptor of an object feature point represents the meaning, attribute information, and so on of that feature point; it can be used to characterize the feature point and serves as the basis for subsequently tracking the object.
The coordinate determination unit 220 in the embodiment of the application is configured to determine three-dimensional point cloud coordinates of each feature point of the object to be detected under different acquisition conditions in the camera coordinate system.
After obtaining the information of each feature point of the object to be detected under different acquisition conditions, the three-dimensional point cloud coordinates of each feature point in the camera coordinate system can be calculated using computer stereo imaging techniques, the triangulation principle, the calibration parameters of the cameras built into the head-mounted device, and the like. The feature descriptor of each feature point on the object and its corresponding three-dimensional point cloud coordinates in the camera coordinate system are stored in association with each other.
The matching unit 230 of the embodiment of the application is configured to match feature descriptors obtained by an object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs.
After the feature descriptors of the object to be detected under different acquisition conditions are obtained, the relevance or matching degree between the feature descriptors under different acquisition conditions needs to be determined, so that feature descriptors obtained under different acquisition conditions can be matched by using a feature point matching algorithm in the prior art, and a plurality of matched feature point pairs are obtained.
Conventional feature point matching methods include the Hamming distance matching algorithm. The Hamming distance, named after Richard Wesley Hamming, between two strings of equal length is the number of positions at which the corresponding characters differ, i.e., the number of characters that must be replaced to convert one string into the other. The KNN (K-Nearest Neighbor) matching algorithm finds the K records closest to a new data point in a training set and then determines the category of the new data according to the main classification of those K records. The RANSAC (RANdom SAmple Consensus) matching algorithm randomly samples the matching samples and finds a consistent set of sample points. These feature point matching algorithms are all conventional in the field, and those skilled in the art can select among them flexibly according to the actual situation; no specific limitation is made here.
The alignment unit 240 in this embodiment of the application is configured to determine a coordinate transformation matrix of the feature point of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinate corresponding to each matching feature point pair, and align the three-dimensional point cloud coordinate of the feature point of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
Since the feature descriptors correspond one-to-one to the feature points, after a plurality of matched feature point pairs are obtained, the corresponding feature points can be determined from the feature descriptors in each matched pair, and the three-dimensional point cloud coordinates corresponding to each matched feature point pair are thereby obtained.
Through the object detection and tracking apparatus described above, the feature information of the object under different acquisition conditions is matched and aligned, so that the probability of successful subsequent object tracking and re-matching is greatly improved.
In an embodiment of the present application, the extracting unit 210 is further configured to: determining a corresponding neighborhood window of each feature point in each image; calculating the pixel gradient value of each characteristic point according to the gray value corresponding to each characteristic point and the gray value of each pixel point in the neighborhood window; and normalizing the pixel gradient value of each feature point to obtain a feature descriptor corresponding to each feature point.
In an embodiment of the present application, the multiple images of the object to be detected collected under different collection conditions include a first sub-image collected by the left-eye camera and a second sub-image collected by the right-eye camera, and the coordinate determination unit 220 is further configured to: determining a first imaging position corresponding to a characteristic point on an object to be detected in a first sub-image and a second imaging position corresponding to the characteristic point on the object to be detected in a second sub-image by utilizing a normalized cross-correlation matching algorithm according to calibration parameters of a binocular camera; and calculating the three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions in a camera coordinate system by utilizing a three-dimensional triangulation algorithm according to the relative position relation between the first imaging position and the second imaging position.
In an embodiment of the present application, calibration parameters of the binocular camera are obtained as follows: calibrating a left eye camera and a right eye camera of the binocular camera by using a preset calibration algorithm; and determining calibration parameters of the left-eye camera and the right-eye camera according to the calibration result, wherein the calibration parameters comprise internal parameters and distortion parameters of the left-eye camera and the right-eye camera and a rotation and translation matrix of the left-eye camera relative to the right-eye camera.
In an embodiment of the application, the feature descriptors include a plurality of first feature descriptors corresponding to the object to be detected obtained under the first collecting condition and a plurality of second feature descriptors corresponding to the object to be detected obtained under the second collecting condition, and the matching unit 230 is further configured to: calculating the matching degree between each first feature descriptor and each second feature descriptor by using a preset feature matching algorithm; and forming a matching feature point pair by the first feature descriptor and the second feature descriptor with the matching degree reaching a preset threshold value.
In one embodiment of the present application, the alignment unit 240 is further configured to: acquiring three-dimensional point cloud coordinates of feature points corresponding to each first feature descriptor in the matched feature point pair and three-dimensional point cloud coordinates of feature points corresponding to each second feature descriptor; and calculating a coordinate conversion matrix between the three-dimensional point cloud coordinates of the characteristic points corresponding to the first characteristic descriptor and the three-dimensional point cloud coordinates of the characteristic points corresponding to the second characteristic descriptor by using a perspective N-point algorithm.
In one embodiment of the present application, the alignment unit 240 is further configured to: and converting the three-dimensional point cloud coordinates of the characteristic points of the object to be detected, which are obtained under the second acquisition condition, into the three-dimensional point cloud coordinates of the characteristic points of the object to be detected, which correspond to the first acquisition condition, according to the coordinate conversion matrix.
It should be noted that the object detecting and tracking device can implement the steps of the object detecting and tracking method provided in the foregoing embodiment, and the related explanations about the object detecting and tracking method are applicable to the object detecting and tracking device, and are not described herein again.
In conclusion, according to the technical scheme of the application, a plurality of images of the object to be detected under different acquisition conditions are obtained, and feature points of the object to be detected under the different acquisition conditions are obtained by extracting features from these images. Feature descriptors are then extracted for the feature points, and the feature descriptors under different acquisition conditions are matched against each other; descriptors with a high matching degree are used as invariant attribute information of the object to be detected, capturing characteristics of the object that remain unchanged across acquisition conditions. In addition, so that the feature points of the object to be detected can be converted between different acquisition conditions and the success probability of subsequent object tracking and re-matching is improved, this conversion is realized based on the three-dimensional point cloud coordinates of the feature points, i.e., tracking of the object to be detected under different acquisition conditions is achieved through the conversion or alignment of the three-dimensional point cloud coordinates. Therefore, by matching and aligning the feature information of the object under different acquisition conditions, the method and the device greatly improve the probability of successful subsequent object tracking and re-matching.
An embodiment of the present application further provides a head-mounted display device. Fig. 3 illustrates a schematic structural diagram of the head-mounted display device. Referring to fig. 3, at the hardware level, the head-mounted display device includes a memory and a processor, and optionally further includes an interface module, a communication module, and the like. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. Of course, the head-mounted display device may also include hardware needed for other services.
The processor, the interface module, the communication module, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus.
A memory for storing computer executable instructions. The memory provides computer executable instructions to the processor through the internal bus.
A processor executing computer executable instructions stored in the memory and specifically configured to perform the following operations:
acquiring a plurality of images of an object to be detected, which are acquired under different acquisition conditions, respectively extracting feature points of the images, and extracting feature descriptors of the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to the feature points;
determining three-dimensional point cloud coordinates of each characteristic point of the object to be detected under different acquisition conditions under a camera coordinate system;
matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and determining a coordinate transformation matrix of the characteristic points of the object to be detected among different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched characteristic point pairs, and aligning the three-dimensional point cloud coordinates of the characteristic points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
The functions performed by the object detection and tracking apparatus according to the embodiment shown in fig. 2 of the present application may be implemented in, or by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The head-mounted display device may further perform the steps of the object detection tracking method in FIG. 1 and implement the functions of the embodiment shown in FIG. 1, which are not described again here.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs which, when executed by a processor, implement the foregoing method and are specifically configured to perform the following operations:
acquiring a plurality of images of an object to be detected, which are acquired under different acquisition conditions, respectively extracting feature points of the images, and extracting feature descriptors of the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to the feature points;
determining three-dimensional point cloud coordinates, in a camera coordinate system, of each feature point of the object to be detected under different acquisition conditions;
matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and determining a coordinate transformation matrix of the feature points of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched feature point pairs, and aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include a volatile memory in a computer-readable medium, a random access memory (RAM), and/or a non-volatile memory, such as a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An object detection tracking method, comprising:
acquiring a plurality of images of an object to be detected, which are acquired under different acquisition conditions, respectively extracting feature points of the images, and extracting feature descriptors of the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to the feature points;
determining three-dimensional point cloud coordinates, in a camera coordinate system, of each feature point of the object to be detected under different acquisition conditions;
matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and determining a coordinate transformation matrix of the feature points of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched feature point pairs, and aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
2. The object detection tracking method according to claim 1, wherein the feature descriptor extraction of the extracted feature points comprises:
determining a corresponding neighborhood window of each feature point in each image;
calculating a pixel gradient value of each feature point according to the gray value corresponding to the feature point and the gray values of the pixel points in the neighborhood window;
and normalizing the pixel gradient value of each feature point to obtain a feature descriptor corresponding to each feature point.
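For illustration only, a descriptor of the kind recited in claim 2 might be computed as in the sketch below; the window radius, the gradient operator, and the use of L2 normalization are assumptions standing in for details the claim leaves open, and border handling is omitted.

import numpy as np

def gradient_descriptor(gray, x, y, radius=8):
    # gray: 2D grayscale image array; (x, y): feature point column and row.
    patch = gray[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(np.float32)
    gy, gx = np.gradient(patch)                      # pixel gradient values in the window
    vec = np.concatenate([gx.ravel(), gy.ravel()])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec           # normalized feature descriptor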
3. The object detection tracking method according to claim 1, wherein the plurality of images of the object to be detected acquired under different acquisition conditions include a first sub-image acquired by a left-eye camera and a second sub-image acquired by a right-eye camera, and the determining the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions in the camera coordinate system includes:
determining a first imaging position corresponding to the feature point on the object to be detected in the first sub-image and a second imaging position corresponding to the feature point on the object to be detected in the second sub-image by utilizing a normalized cross-correlation matching algorithm according to calibration parameters of a binocular camera;
and calculating the three-dimensional point cloud coordinates of each feature point of the object to be detected under different acquisition conditions in a camera coordinate system by using a three-dimensional triangulation algorithm according to the relative position relationship between the first imaging position and the second imaging position.
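A sketch of this step is given below, assuming rectified left and right sub-images and the 3x4 projection matrices P_left and P_right produced by the binocular calibration; reducing the normalized cross-correlation search to a single cv2.matchTemplate call over a row strip is an assumption made for brevity.

import cv2
import numpy as np

def ncc_match(left_patch, right_row_strip):
    # Normalized cross-correlation of a left-image patch against a horizontal strip
    # of the rectified right image (same height as the patch); returns the best column offset.
    score = cv2.matchTemplate(right_row_strip, left_patch, cv2.TM_CCOEFF_NORMED)
    return int(score.argmax())

def triangulate(pts_left, pts_right, P_left, P_right):
    # pts_left, pts_right: Nx2 matched pixel positions in the two sub-images.
    pts4d = cv2.triangulatePoints(P_left, P_right,
                                  np.float32(pts_left).T, np.float32(pts_right).T)
    return (pts4d[:3] / pts4d[3]).T      # Nx3 camera-frame point cloud coordinates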
4. The object detection tracking method according to claim 3, wherein the calibration parameters of the binocular camera are obtained by:
calibrating a left eye camera and a right eye camera of the binocular camera by using a preset calibration algorithm;
and determining calibration parameters of the left-eye camera and the right-eye camera according to a calibration result, wherein the calibration parameters comprise internal parameters and distortion parameters of the left-eye camera and the right-eye camera and a rotation and translation matrix of the left-eye camera relative to the right-eye camera.
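A sketch of obtaining these calibration parameters with OpenCV's standard routines follows; treating cv2.calibrateCamera and cv2.stereoCalibrate as the "preset calibration algorithm", and the chessboard-style input format, are assumptions for illustration.

import cv2

def calibrate_binocular(object_points, img_points_left, img_points_right, image_size):
    # object_points: list of Nx3 board-corner coordinates, one entry per calibration view;
    # img_points_*: matching lists of Nx2 detected corner positions in each camera.
    _, K_l, dist_l, _, _ = cv2.calibrateCamera(object_points, img_points_left, image_size, None, None)
    _, K_r, dist_r, _, _ = cv2.calibrateCamera(object_points, img_points_right, image_size, None, None)
    # R, T: relative rotation and translation between the two cameras.
    _, K_l, dist_l, K_r, dist_r, R, T, E, F = cv2.stereoCalibrate(
        object_points, img_points_left, img_points_right,
        K_l, dist_l, K_r, dist_r, image_size, flags=cv2.CALIB_FIX_INTRINSIC)
    return K_l, dist_l, K_r, dist_r, R, T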
5. The object detection tracking method according to claim 1, wherein the feature descriptors include a plurality of first feature descriptors corresponding to the object to be detected obtained under a first acquisition condition and a plurality of second feature descriptors corresponding to the object to be detected obtained under a second acquisition condition, and the matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs includes:
calculating the matching degree between each first feature descriptor and each second feature descriptor by using a preset feature matching algorithm;
and forming a matched feature point pair from a first feature descriptor and a second feature descriptor whose matching degree reaches a preset threshold value.
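A sketch of this matching step follows; cosine similarity over L2-normalized descriptors and a threshold of 0.8 are assumptions standing in for the "preset feature matching algorithm" and "preset threshold value" of claim 5.

import numpy as np

def match_feature_points(desc_first, desc_second, threshold=0.8):
    # desc_first: MxD, desc_second: NxD; rows are L2-normalized feature descriptors.
    similarity = desc_first @ desc_second.T          # MxN matrix of matching degrees
    pairs = []
    for i in range(similarity.shape[0]):
        j = int(similarity[i].argmax())              # best second descriptor for row i
        if similarity[i, j] >= threshold:
            pairs.append((i, j))                     # matched feature point pair
    return pairs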
6. The object detection tracking method according to claim 5, wherein the determining the coordinate transformation matrix of the feature points of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to each matched feature point pair comprises:
acquiring, for each matched feature point pair, the three-dimensional point cloud coordinates of the feature point corresponding to the first feature descriptor and the three-dimensional point cloud coordinates of the feature point corresponding to the second feature descriptor;
and calculating a coordinate transformation matrix between the three-dimensional point cloud coordinates of the feature points corresponding to the first feature descriptors and the three-dimensional point cloud coordinates of the feature points corresponding to the second feature descriptors by using a Perspective-n-Point (PnP) algorithm.
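The sketch below shows one way a Perspective-n-Point solver could produce such a transformation. Because PnP pairs 3D points with 2D projections, the second-condition 3D points are paired here with the matched feature points' pixel positions in the first-condition image, and a RANSAC variant is used for robustness; both choices are assumptions for illustration rather than the specific implementation of this claim.

import cv2
import numpy as np

def transform_between_conditions(pts3d_second, pts2d_first, K, dist_coeffs):
    # pts3d_second: Nx3 feature point coordinates in the second-condition camera frame.
    # pts2d_first:  Nx2 pixel positions of the matched feature points in the first-condition image.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d_second), np.float32(pts2d_first), K, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)             # rotation vector to 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T                               # 4x4 coordinate transformation matrix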
7. The object detection tracking method according to claim 6, wherein the aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix comprises:
and converting, according to the coordinate transformation matrix, the three-dimensional point cloud coordinates of the feature points of the object to be detected obtained under the second acquisition condition into the three-dimensional point cloud coordinates of the feature points of the object to be detected corresponding to the first acquisition condition.
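Applying the resulting matrix then reduces to a homogeneous-coordinate multiplication, as in this small sketch (T is a 4x4 transformation matrix such as the one from the previous sketch; the names are illustrative):

import numpy as np

def align_to_first_condition(points_second, T):
    # points_second: Nx3 feature point coordinates obtained under the second condition.
    pts = np.asarray(points_second, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ homogeneous.T).T[:, :3]    # coordinates expressed under the first condition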
8. An object detection tracking device, comprising:
the extraction unit is used for acquiring a plurality of images of the object to be detected, which are acquired under different acquisition conditions, respectively extracting feature points of the images, and extracting feature descriptors of the extracted feature points to obtain a plurality of feature points of the object to be detected under different acquisition conditions and feature descriptors corresponding to the feature points;
the coordinate determination unit is used for determining three-dimensional point cloud coordinates, in a camera coordinate system, of each feature point of the object to be detected under different acquisition conditions;
the matching unit is used for matching the feature descriptors obtained by the object to be detected under different acquisition conditions to obtain a plurality of matched feature point pairs;
and the alignment unit is used for determining a coordinate transformation matrix of the feature points of the object to be detected between different acquisition conditions according to the three-dimensional point cloud coordinates corresponding to the matched feature point pairs, and aligning the three-dimensional point cloud coordinates of the feature points of the object to be detected under different acquisition conditions according to the coordinate transformation matrix.
9. The object detection tracking device according to claim 8, wherein the feature descriptors include a plurality of first feature descriptors corresponding to the object to be detected obtained under a first acquisition condition and a plurality of second feature descriptors corresponding to the object to be detected obtained under a second acquisition condition, and the matching unit is specifically configured to:
calculating the matching degree between each first feature descriptor and each second feature descriptor by using a preset feature matching algorithm;
and forming a matched feature point pair from a first feature descriptor and a second feature descriptor whose matching degree reaches a preset threshold value.
10. A head-mounted display device, comprising: a processor and a memory storing computer-executable instructions, wherein
the executable instructions, when executed by the processor, implement the object detection tracking method of any one of claims 1 to 7.
CN202010817884.5A 2020-08-14 Object detection tracking method and device and head-mounted display equipment Active CN112102404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010817884.5A CN112102404B (en) 2020-08-14 Object detection tracking method and device and head-mounted display equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010817884.5A CN112102404B (en) 2020-08-14 Object detection tracking method and device and head-mounted display equipment

Publications (2)

Publication Number Publication Date
CN112102404A 2020-12-18
CN112102404B (en) 2024-04-30


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019100933A1 (en) * 2017-11-21 2019-05-31 蒋晶 Method, device and system for three-dimensional measurement
WO2020063987A1 (en) * 2018-09-30 2020-04-02 先临三维科技股份有限公司 Three-dimensional scanning method and apparatus and storage medium and processor
WO2020103427A1 (en) * 2018-11-23 2020-05-28 华为技术有限公司 Object detection method, related device and computer storage medium
CN110853151A (en) * 2019-10-15 2020-02-28 西安理工大学 Three-dimensional point set recovery method based on video
CN111340797A (en) * 2020-03-10 2020-06-26 山东大学 Laser radar and binocular camera data fusion detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAI Yong; QIN Xiansheng; ZHANG Xuefeng; ZHANG Peipei; SHAN Ning: "Research on Parametric Trajectory-Guided 3D Scanning and Point Cloud Alignment", Manufacturing Automation, no. 11 *
CHEN Zhixiang; WU Liming; GAO Shiping: "Mobile Augmented Reality Tracking Technology Based on the FAST-SURF Algorithm", Computer and Modernization, no. 09 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949736A (en) * 2021-03-15 2021-06-11 浙江中控技术股份有限公司 Feature matching method and related equipment
CN112949736B (en) * 2021-03-15 2023-07-21 浙江中控技术股份有限公司 Feature matching method and related equipment
US20220236065A1 (en) * 2021-07-13 2022-07-28 Apollo Intelligent Driving Technology (Beijing) Co., Ltd. Map data fusion method and apparatus, electronic device, medium and program product

Similar Documents

Publication Publication Date Title
US9727775B2 (en) Method and system of curved object recognition using image matching for image processing
US11928800B2 (en) Image coordinate system transformation method and apparatus, device, and storage medium
CN106897648B (en) Method and system for identifying position of two-dimensional code
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
Baatz et al. Leveraging 3D city models for rotation invariant place-of-interest recognition
US20190096092A1 (en) Method and device for calibration
CN110738703B (en) Positioning method and device, terminal and storage medium
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN104156998A (en) Implementation method and system based on fusion of virtual image contents and real scene
US11620730B2 (en) Method for merging multiple images and post-processing of panorama
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
CN112966725B (en) Method and device for matching template images and terminal equipment
CN112184811B (en) Monocular space structured light system structure calibration method and device
CN111325798A (en) Camera model correction method and device, AR implementation equipment and readable storage medium
CN110120013A (en) A kind of cloud method and device
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
US10445620B2 (en) Method and system for object tracking in multiple non-linear distortion lenses
CN108447092B (en) Method and device for visually positioning marker
CN113095187A (en) Examination paper correction method based on image feature matching alignment
CN112102404B (en) Object detection tracking method and device and head-mounted display equipment
CN110660091A (en) Image registration processing method and device and photographing correction operation system
CN112102404A (en) Object detection tracking method and device and head-mounted display equipment
CN110674817B (en) License plate anti-counterfeiting method and device based on binocular camera
CN115482285A (en) Image alignment method, device, equipment and storage medium
CN113963158A (en) Palm vein image region-of-interest extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant