CN110189267B - Real-time positioning device and system based on machine vision - Google Patents
- Publication number: CN110189267B (application CN201910420567.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
Abstract
The invention discloses a machine-vision-based real-time positioning device and system. The device comprises an FPGA hardware logic module and a high-speed DSP floating-point calculation module. The system comprises a stereoscopic vision camera, a VR helmet unit, a stereoscopic vision processing board, and a computer. The position of each MARK point is identified from the image, and the MARK points are placed under synchronous control, improving the real-time response and reliability of the system.
Description
Technical Field
The invention belongs to the field of machine vision, and particularly relates to a real-time positioning device and system based on machine vision.
Background
Machine vision aims to identify and understand the content of images and videos with computers. Positioning-and-attitude technology digitizes the position and attitude of motion by physical means such as inertial and optical sensing, and is a branch of machine vision. The current mainstream motion-capture technologies are optical and inertial. Leading optical motion-capture vendors include Oxford Metrics (UK), Motion Analysis, and NaturalPoint, whose Vicon, Raptor, and OptiTrack systems respectively track at over 60 Hz with millimeter-level precision. Leading inertial motion-capture vendors include Xsens (Netherlands) and Noitom (Beijing), whose Xsens MVN and Legacy systems support multi-person motion capture at a 60 Hz frame rate with 10 ms latency.
As motion-capture technology matures, user demands on usability, calculation precision, real-time performance, and cost keep rising, yet optical vision positioning and inertial positioning-and-attitude technology have complementary strengths and weaknesses, and neither meets all requirements alone. Optical vision positioning offers high positional precision but poor usability and high cost, and it cannot measure attitude directly. Inertial positioning is free of spatial constraints and can measure acceleration and angular velocity directly, but its positional precision is low and it is easily disturbed by external magnetic fields. Combining multiple means of measurement, above all inertia and vision, is the main direction for future positioning-and-attitude technology.
However, several difficulties remain unsolved in the prior art. For example, a vision sensor cannot work in texture-less regions. An Inertial Measurement Unit (IMU) can measure angular velocity and acceleration with its built-in gyroscope and accelerometer to estimate the camera attitude, but the estimate accumulates error over time. How to fuse the measurements of the vision sensor and the IMU for real-time positioning is a pressing technical problem for the industry.
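The fusion idea can be sketched as a simple complementary filter: the drift-free but noisy vision estimate slowly corrects the smooth but drifting IMU estimate. This is only an illustrative sketch, not the patent's VIO fusion algorithm; the blending weight `alpha` is an assumed parameter.

```python
def fuse_attitude(imu_angle, vision_angle, alpha=0.98):
    """Complementary filter over one attitude angle (radians).

    imu_angle    -- high-rate, smooth estimate that drifts over time
    vision_angle -- low-rate, drift-free estimate from the vision sensor
    alpha        -- assumed blending weight favoring the IMU short-term
    """
    return alpha * imu_angle + (1.0 - alpha) * vision_angle
```

With alpha near 1, each update keeps most of the IMU's short-term signal while steadily pulling the estimate toward the vision reference, which bounds the accumulated error.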
Disclosure of Invention
The system is based on VIO (Visual-Inertial Odometry) technology, to which MARK points and a MARK synchronous control design are added, improving the real-time response and reliability of the system. The technical scheme provided by the application is as follows:
a real-time positioning device based on a machine vision system, comprising: the FPGA hardware logic module and the high-speed DSP floating point calculation module;
the FPGA hardware logic module comprises a video acquisition preprocessing module, a first group manager and a MARKi synchronous control module; video data acquired by a stereoscopic vision camera is input into a video acquisition preprocessing module; the output data of the video acquisition preprocessing module is sent to a first group manager; the first group manager is controlled by a MARKi synchronous control module; the MARKi synchronous control module is connected to a real-time control bus of the system through a real-time control bus communication protocol;
the high-speed DSP floating point calculation module comprises a system calibration module, a calibration parameter module, a second group manager and a VIO information fusion module which are sequentially connected;
the first group of managers comprises a synchronization frame MARKi image taking module, a MARKi feature extraction (SURF algorithm) module, a MARKi image matching module and a MARKi image coordinate calculating module;
the second group of manager modules comprise a MARKi space coordinate resolving module, an HTi space coordinate and attitude resolving module and an IMUi attitude information module;
data of a MARKi image coordinate calculating module in the first group of managers is sent to a MARKi space coordinate calculating module in the second group of managers; the MARKi space coordinate resolving module in the second group of managers sends the processed data to the HTi space coordinate and attitude resolving module; the IMUi attitude information module acquires IMUi attitude information from a system real-time control bus; the VIO information fusion module simultaneously acquires output data of the HTi space coordinate and attitude calculation module and the IMUi attitude information module; and the VIO information fusion module outputs the reported information.
A real-time positioning system based on a machine vision system, comprising: a stereoscopic vision camera, a VR headset unit, a stereoscopic vision processing board and a computer, the stereoscopic vision processing board comprising a real-time positioning device based on a machine vision system as claimed in claim 1.
According to one aspect of the application, the system sends data collected by an IMU module and a MARK synchronizer arranged on each VR helmet unit to a stereoscopic vision processing board through a real-time control bus of the system; the stereoscopic vision processing board obtains the video acquisition information through the stereoscopic vision camera and fuses IMU information obtained by the IMU module to obtain pose information.
According to one aspect of the application, the system filters digital images acquired by a stereoscopic vision camera through a fast median filtering algorithm.
According to one aspect of the application, the system takes the synchronous-frame MARKi image for feature extraction: it integrates the feature-region image using the Speeded-Up Robust Features (SURF) algorithm, performs a convolution operation to obtain the Haar features of the MARK point, and computes the multi-dimensional vectors and principal feature vectors of those Haar features to form the feature descriptor of the MARK point.
According to one aspect of the application, the system adopts Euclidean distance as the similarity measure: for a feature point in one image, the N nearest feature points by Euclidean distance are found under a threshold (N a natural number) and compared; if more than 60% of the compared feature points fall below the threshold, the feature points are considered matched. The three-dimensional position of the MARK point is then resolved from the relation between the image coordinates and the space coordinates of the matched points.
According to one aspect of the application, the stereoscopic vision camera is a binocular camera assembly.
According to one aspect of the application, the stereoscopic vision processing board sends the pose information to the computer through the USB interface.
Compared with the prior art, the invention has the following characteristics:
1) FPGA and DSP based rapid SURF algorithm
The feature extraction algorithm adopts the robust SURF algorithm with an improved Haar feature operator, so that integration of the feature-region image can be completed quickly in FPGA hardware logic, while the floating-point convolution operations are completed in the DSP.
2) Fast feature matching based on FPGA and synchronization technology
Image feature recognition and matching are the technical difficulties of stereo measurement and the main contributors to system delay. Implementing these algorithms in FPGA hardware logic greatly shortens the system's reaction time (the video frame interval is 24 ms), while the synchronization technique relieves the FPGA of the resource pressure of distinguishing feature-point IDs. The system can thus perform multi-target three-dimensional attitude calculation with a processing time under 100 ms, meeting the real-time requirement.
According to the invention, at distances of 1 meter or more the attitude-angle precision is better than 0.01 degree and the displacement precision is better than 1 mm, with no fewer than 5 targets and a response time under 100 ms. The system can therefore provide high-precision real-time data for virtual simulation scenes.
Drawings
FIG. 1 is a schematic diagram of the overall system scheme;
FIG. 2 is a schematic diagram of a MARK point synchronization control flow;
FIG. 3 is a block diagram of the composition of a stereoscopic processing panel;
FIG. 4 is a schematic diagram of a binocular stereo vision technique;
FIG. 5 is a schematic diagram of a feature descriptor;
fig. 6 is a software processing module configuration diagram of system data.
Detailed Description
The technical content of the invention is further explained with reference to the embodiments below; the embodiments are illustrative only and do not limit the scope of the invention.
1. Overall system scheme
As shown in fig. 1, the system components include a stereoscopic vision camera, a VR headset unit, a stereoscopic vision processing board, and a computer. Wherein the VR helmet unit includes an IMU module, a MARK synchronizer. The computer includes a processor and a memory. Preferably, the stereoscopic camera is a binocular camera assembly.
And arranging N VR helmets in the visual range of the stereoscopic vision camera, wherein N is a natural number. And the data collected by the IMU module and the MARK synchronizer on each VR helmet unit is sent to a stereoscopic vision processing board through a real-time control bus of the system. The stereoscopic vision processing board obtains video acquisition information through a stereoscopic vision camera. The stereoscopic vision processing board sends the pose information to the computer through the USB interface.
A MARK point with identifiable characteristics, i.e., MARK1, MARK2, …, MARKi, is placed on each VR helmet. This lets the image be captured accurately and stably, reduces the image-processing load, and improves the reliability of the system.
Although the accelerometer's accumulated error makes IMU-based positioning unreliable, the IMU attitude-angle accuracy is sufficient for this application. The system therefore uses the attitude-angle information of the IMU to assist the attitude measurement of the stereoscopic vision: stereoscopic vision provides high-precision positioning and attitude determination of the VR helmet, and the micro inertial sensor (IMU) provides auxiliary attitude fusion.
In existing DSP image-processing technology, identifying, extracting, and stereo-matching the many feature points of multiple targets consumes so many resources that real-time calculation is impossible; image processing cannot finish within 24 ms (the video frame interval), causing the video response to stutter. The present application completes these tasks in FPGA hardware logic instead.
Specifically, the system adopts a MARK-point synchronization control technique. The FPGA generates a synchronization signal and sends it to the synchronization controller of the MARK points. The FPGA identifies and processes the VR helmet's MARK point; based on prior test results, the calculation completes within 5 ms. Because the ID of each VR helmet is known at that moment, the misidentification and resource consumption of processing several VR helmets simultaneously are avoided. The flow of MARK-point synchronization control is shown in fig. 2.
2. Hardware platform
As shown in figs. 1-3, the hardware platform of the invention includes a binocular camera assembly, a VR helmet, a stereoscopic vision processing board, and a computer. As shown in fig. 3, the stereoscopic vision processing board is built on an FPGA + DSP platform and comprises a video acquisition circuit, an FPGA logic processing module, a DSP floating-point operation module, and a USB interface. The FPGA logic processing module and the DSP floating-point operation module are both connected to the real-time control bus of the system. The board's main function is stereo measurement; only the visual positioning calculation software needs to be modified for light spots with different characteristics.
The FPGA is a high-capacity, flexible programmable logic circuit integrating many hardware resources, such as embedded DSP blocks, embedded RAM blocks, and high-speed external memory interfaces (DDR). As a logic-processing platform it is well suited to real-time, parallel processing of high-performance video and images; here it mainly performs the filtering, stereo matching, and feature-point marking and tracking algorithms.
The DSP is a high-performance 32-bit floating-point chip with a Harvard bus structure and a password-based code-protection mechanism. It can perform dual 16 x 16 and 32 x 32 multiply-add operations in one cycle, combining control with fast computation at a speed that general-purpose single-chip microcomputers cannot match. It mainly performs the three-dimensional positioning calculation and the attitude-matrix model calculation.
The stereoscopic vision processing board adopts a USB interface to output the attitude information.
3. Software processing of system data
The software processing of system data is based on binocular stereo vision. A Binocular Stereo Vision rig consists of left and right cameras; as shown in fig. 4, their parameters carry the subscripts l and r respectively. A point A (X, Y, Z) in world space projects to image points al (ul, vl) and ar (ur, vr) on the imaging planes Cl and Cr of the left and right cameras. These two image points, being images of the same object point A, are called conjugate points. Given the two conjugate points, draw the projection lines alOl and arOr through the optical centers Ol and Or of the respective cameras; their intersection is the object point A (X, Y, Z) in world space.
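For the simplest rectified case (parallel optical axes, aligned image planes, left camera at the origin), the intersection of the two projection lines reduces to the familiar disparity formulas. A minimal sketch under those assumptions, with focal length f and image coordinates in pixels and the baseline b between Ol and Or in meters:

```python
def triangulate(ul, vl, ur, f, b):
    """Recover the world coordinates of point A from its conjugate image
    points in a rectified binocular rig (assumed geometry: parallel optical
    axes, left camera at the origin)."""
    d = ul - ur                 # disparity between the conjugate points
    if d <= 0:
        raise ValueError("non-positive disparity: point at or behind infinity")
    Z = f * b / d               # depth from similar triangles
    X = ul * Z / f              # back-project through the left camera
    Y = vl * Z / f
    return X, Y, Z
```

Example: with f = 700 px, b = 0.12 m, and conjugate points al(100, 35), ar(30, 35), the disparity is 70 px and the depth Z = 700 * 0.12 / 70 = 1.2 m.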
The system software processing process comprises image acquisition preprocessing, feature extraction and image matching. The corresponding software processing module comprises an FPGA hardware logic module and a high-speed DSP floating point calculation module. The FPGA hardware logic module comprises a video acquisition preprocessing module, a first group of managers and a MARKi synchronous control module. The high-speed DSP floating point calculation module comprises a system calibration module, a calibration parameter module, a second group of managers and a VIO information fusion module.
Image acquisition preprocessing:
The acquisition of digital images is the information source of stereoscopic vision. When a three-dimensional scene is projected onto a two-dimensional image through Perspective Projection, the imaging of the same scene on the camera planes of different viewpoints is distorted to different degrees. The scene's illumination conditions, the geometry and surface characteristics of the measured object, noise interference and distortion, and the camera's own characteristics are all folded into a single gray value per pixel, so the image may be unclear or distorted and must be filtered before feature extraction.
The invention filters the digital images acquired by the stereoscopic vision camera with a fast median filtering algorithm, which preserves the edge features of the image while effectively removing noise; implementing the algorithm in the FPGA guarantees real-time processing.
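The fast median filter itself is hardware-specific, but the underlying operation can be sketched in a few lines: replace each pixel with the median of its 3x3 neighborhood, which removes impulse noise while preserving edges. This pure-Python version is illustrative only, not the patent's FPGA implementation.

```python
def median_filter_3x3(img):
    """3x3 median filter on a grayscale image given as a list of rows.
    Border pixels are copied unchanged to keep the sketch short."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            window.sort()
            out[y][x] = window[4]   # median of the 9 neighborhood values
    return out
```

A single salt-noise pixel (e.g. 255 amid a field of 10s) is replaced by the neighborhood median, while a genuine step edge survives because the median sides with the majority of its window.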
Feature extraction:
and taking the filtered digital image as a synchronous frame MARKi image for feature extraction. The Feature extraction algorithm adopts a speedup Robust Feature (SURF) algorithm. The algorithm integrates the images in the characteristic region, then carries out convolution operation to obtain harr characteristics of the MARK point, and calculates the multidimensional vector and the principal characteristic vector of the harr characteristics to form the characteristic descriptor of the characteristic point. A schematic of the feature descriptors is shown in fig. 5. The center of the left image in fig. 5 represents a feature point. The 64 small squares near the feature points represent pixel points in their local regions. In the image local gradient direction, one or more directions are assigned to the feature points. The length of the arrow represents the gradient modulus of the pixel point, and the direction of the arrow represents the gradient direction of the pixel point. The area where the 64 small squares are located is divided into four equal parts as shown in the left diagram of fig. 5, then gradient direction histograms in eight directions of each module are calculated as shown in the right diagram of fig. 5, and an accumulated value in each gradient direction is obtained to form a feature point. Therefore, each feature point contains coordinate information. The actual sub-matching of the descriptors of the gradient vectors around the MARK point is the subsequent image matching. The SURF algorithm adopted by the feature extraction algorithm can be realized by using an FPGA. Therefore, the robustness of the features of the MARK points can be ensured on the basis of ensuring the real-time performance, and a foundation is laid for the subsequent image matching.
Image matching:
As shown in fig. 4, image matching in stereoscopic vision associates the image points al (ul, vl) and ar (ur, vr) of a point A (X, Y, Z) in three-dimensional space on the imaging planes Cl and Cr of the left and right cameras; that is, depth information is recovered from two-dimensional images. Image matching is the most important and most difficult problem in stereo vision.
After the feature points are detected and described, the feature points of the two images are matched, i.e., correspondences are found between the images shot by the left and right cameras of the binocular assembly, Cl and Cr. When matching feature points of the MARKi image, Euclidean distance serves as the similarity measure: for a feature point in one image, the N nearest feature points by Euclidean distance are found under a threshold (N a natural number) and compared; if more than 60% of them fall below the threshold, the feature points are considered matched. The software can adjust the threshold: lowering it reduces the number of matching points, and as long as a group decision can still be made, the matching becomes more stable and reliable.
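The matching rule described above can be sketched as follows. Function and parameter names, and the neighbor count, are illustrative assumptions; the patent specifies only the Euclidean distance measure, a threshold, and the 60% criterion.

```python
def match_descriptors(desc_a, desc_b, threshold, n_neighbors=3, ratio=0.6):
    """For each descriptor in image A, find the n_neighbors nearest
    descriptors in image B by Euclidean distance; accept the match when
    more than `ratio` of those neighbors fall under the threshold."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(da, desc_b[j]))
        nearest = ranked[:n_neighbors]
        under = [j for j in nearest if dist(da, desc_b[j]) < threshold]
        if len(under) > ratio * n_neighbors:
            matches.append((i, under[0]))  # pair with the closest accepted neighbor
    return matches
```

Tightening the threshold shrinks `under`, so fewer candidates clear the 60% bar and the surviving matches are correspondingly more reliable, mirroring the trade-off noted above.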
And finally, resolving the three-dimensional space position of the MARK point according to the relation between the image coordinate and the space coordinate of the matching point.
Specifically, as shown in fig. 6, the software processing module of the system data of the present invention includes an FPGA hardware logic module and a high-speed DSP floating point calculation module. The FPGA hardware logic module comprises a video acquisition preprocessing module, a first group of managers and a MARKi synchronous control module. And video data acquired by the binocular camera is input into the video acquisition preprocessing module. And the output data of the video acquisition preprocessing module is sent to the first group manager. The first group manager is controlled by the MARKi synchronization control module. The MARKi synchronous control module is connected to a real-time control bus of the system through a real-time control bus communication protocol. The first group of managers comprises a synchronization frame taking MARKi image module, a MARKi feature extraction (SURF algorithm) module, a MARKi image matching module and a MARKi image coordinate calculating module.
The high-speed DSP floating point calculation module comprises a system calibration module, a calibration parameter module, a second group of managers and a VIO information fusion module. The system calibration module, the calibration parameter module, the second group manager and the VIO information fusion module are sequentially connected. The second group of manager modules comprise a MARKi space coordinate resolving module, an HTi space coordinate and attitude resolving module and an IMUi attitude information module.
And the data of the MARKi image coordinate calculating module in the first group of managers is sent to the MARKi space coordinate calculating module in the second group of managers. And the MARKi space coordinate calculation module in the second group of managers sends the processed data to the HTi space coordinate and attitude calculation module. The MARKi feature points constitute the HTi space. And the IMUi attitude information module acquires IMUi attitude information from a system real-time control bus. And the VIO information fusion module simultaneously acquires the output data of the HTi space coordinate and attitude calculation module and the IMUi attitude information module. And the VIO information fusion module outputs the reported information.
The described embodiments are susceptible to various modifications and alternative forms, and specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the embodiments are not to be limited to the particular forms or methods disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives.
Claims (8)
1. A real-time positioning device based on a machine vision system, comprising: the FPGA hardware logic module and the high-speed DSP floating point calculation module;
the FPGA hardware logic module comprises a video acquisition preprocessing module, a first group manager and a MARKi synchronous control module; video data acquired by a stereoscopic vision camera is input into a video acquisition preprocessing module; the output data of the video acquisition preprocessing module is sent to a first group manager; the first group manager is controlled by a MARKi synchronous control module; the MARKi synchronous control module is connected to a real-time control bus of the system through a real-time control bus communication protocol;
the high-speed DSP floating point calculation module comprises a system calibration module, a calibration parameter module, a second group manager and a VIO information fusion module which are sequentially connected;
the first group of managers comprises a synchronization frame MARKi image taking module, a MARKi feature extraction module, a MARKi image matching module and a MARKi image coordinate calculating module;
the second group of manager modules comprise a MARKi space coordinate resolving module, an HTi space coordinate and attitude resolving module and an IMUi attitude information module; the MARKi characteristic points form an HTi space;
data of a MARKi image coordinate calculating module in the first group of managers is sent to a MARKi space coordinate calculating module in the second group of managers; the MARKi space coordinate resolving module in the second group of managers sends the processed data to the HTi space coordinate and attitude resolving module; the IMUi attitude information module acquires IMUi attitude information from a system real-time control bus; the VIO information fusion module simultaneously acquires output data of the HTi space coordinate and attitude calculation module and the IMUi attitude information module; and the VIO information fusion module outputs the reported information.
2. A real-time positioning system based on a machine vision system, comprising: a stereoscopic vision camera, a VR helmet unit, a stereoscopic vision processing board and a computer, wherein the stereoscopic vision processing board comprises the real-time positioning device based on a machine vision system as claimed in claim 1.
3. The system of claim 2, wherein: the system sends the data acquired by the IMU (inertial measurement unit) and the MARK synchronization unit mounted on each VR helmet unit to the stereoscopic vision processing board through the system's real-time control bus; the stereoscopic vision processing board obtains video acquisition information through the stereoscopic vision camera and fuses it with the IMU information obtained by the IMU module to obtain the pose information.
4. The system of claim 2, wherein: the system filters the digital images acquired by the stereoscopic vision camera using a fast median filtering algorithm.
5. The system of claim 2, wherein: the system takes the synchronization-frame MARKi image for feature extraction, integrates the image of the feature region using the SURF algorithm, performs a convolution operation to obtain the Haar features of the MARK points, and computes the multidimensional vectors and principal feature vectors of the Haar features to form the feature descriptors of the MARK points.
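The integral-image step that makes SURF's Haar responses cheap can be sketched as follows. This is an illustrative fragment, not the patent's code: `haar_x` computes only the horizontal response at one point, whereas the full SURF descriptor aggregates such responses over a grid of subregions:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: any box sum becomes O(1), which is what lets
    SURF evaluate Haar wavelets at any scale in constant time."""
    return np.cumsum(np.cumsum(np.asarray(img, dtype=np.int64), axis=0), axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] recovered from the integral image."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

def haar_x(ii, y, x, half):
    """Horizontal Haar-wavelet response at (y, x): right box minus left box."""
    right = box_sum(ii, y - half, x, y + half, x + half)
    left = box_sum(ii, y - half, x - half, y + half, x - 1)
    return right - left
```

A vertical response `haar_y` is built the same way with the boxes stacked vertically; the per-subregion sums of these responses form the multidimensional descriptor vector the claim refers to.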
6. The system of claim 2, wherein: the system adopts the Euclidean distance as the similarity measure; for a feature point taken in one image, the N feature points nearest in Euclidean distance (N being a natural number) are found according to a threshold and compared, and if more than 60% of the feature points fall below the threshold, the feature points are considered matched; the three-dimensional spatial position of the MARK point is then resolved from the relation between the image coordinates and the space coordinates of the matched points.
7. The system of claim 2, wherein: the stereoscopic vision camera is a binocular camera assembly.
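For a rectified binocular pair, the image-coordinate-to-space-coordinate relation used to resolve the MARK point's position reduces to the classic disparity equation Z = f·B/d. A minimal sketch, assuming pinhole intrinsics (focal length `f_px` and principal point `cx`, `cy` in pixels) obtained from the system calibration module:

```python
def stereo_depth(f_px, baseline_m, disparity_px):
    """Depth of a matched point: Z = f * B / d for a rectified stereo pair.
    disparity_px is x_left - x_right of the matched MARK point."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / disparity_px

def stereo_xyz(f_px, cx, cy, baseline_m, xl, yl, disparity_px):
    """Back-project the left-image pixel (xl, yl) to camera-frame XYZ."""
    Z = stereo_depth(f_px, baseline_m, disparity_px)
    X = (xl - cx) * Z / f_px
    Y = (yl - cy) * Z / f_px
    return X, Y, Z
```

Note the inverse relation between disparity and depth: positioning accuracy degrades quadratically with distance, which is why the baseline and focal length of the binocular assembly bound the usable tracking volume.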
8. The system of claim 2, wherein: the stereoscopic vision processing board sends the pose information to the computer through the USB interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910420567.7A CN110189267B (en) | 2019-05-21 | 2019-05-21 | Real-time positioning device and system based on machine vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110189267A CN110189267A (en) | 2019-08-30 |
CN110189267B true CN110189267B (en) | 2020-12-15 |
Family
ID=67716883
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910420567.7A Active CN110189267B (en) | 2019-05-21 | 2019-05-21 | Real-time positioning device and system based on machine vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110189267B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110926463A (en) * | 2019-11-20 | 2020-03-27 | 深圳市优必选科技股份有限公司 | Helmet and positioning monitoring management system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150375091A1 (en) * | 2012-07-16 | 2015-12-31 | Fleet Us Llc | Line marking apparatus with distance measurement |
CN108627157A (en) * | 2018-05-11 | 2018-10-09 | 重庆爱奇艺智能科技有限公司 | A kind of head based on three-dimensional marking plate shows localization method, device and three-dimensional marking plate |
CN109115204A (en) * | 2018-09-30 | 2019-01-01 | 四川福德机器人股份有限公司 | A kind of fine positioning system and method for navigation vehicle |
Non-Patent Citations (2)
Title |
---|
"Design and Implementation of an FPGA-Based Image Acquisition System"; Yang Chaoyu; China Masters' Theses Full-text Database, Information Science and Technology Series; 2016-02-15 (No. 02); pp. 11, 18, 41 *
"Research on Localization and Motion Planning of Humanoid Robots in Complex Indoor Environments"; Xu Xiandong; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; 2019-01-15 (No. 01); pp. 17, 33-35, 40, 49, 68, 72 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109166149B (en) | Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU | |
CN110426051B (en) | Lane line drawing method and device and storage medium | |
CN110411441B (en) | System and method for multi-modal mapping and localization | |
CN110555889B (en) | CALTag and point cloud information-based depth camera hand-eye calibration method | |
CN104463108B (en) | A kind of monocular real time target recognitio and pose measuring method | |
WO2017077925A1 (en) | Method and system for estimating three-dimensional pose of sensor | |
EP3114647A2 (en) | Method and system for 3d capture based on structure from motion with simplified pose detection | |
CN111768449B (en) | Object grabbing method combining binocular vision with deep learning | |
CN111709973A (en) | Target tracking method, device, equipment and storage medium | |
CN112419497A (en) | Monocular vision-based SLAM method combining feature method and direct method | |
CN112541973B (en) | Virtual-real superposition method and system | |
CN108876852A (en) | A kind of online real-time object identification localization method based on 3D vision | |
CN114119739A (en) | Binocular vision-based hand key point space coordinate acquisition method | |
CN105809664B (en) | Method and device for generating three-dimensional image | |
CN114022560A (en) | Calibration method and related device and equipment | |
Islam et al. | Stereo vision-based 3D positioning and tracking | |
Grest et al. | Single view motion tracking by depth and silhouette information | |
CN109474817B (en) | Optical sensing device, method and optical detection module | |
EP3035242B1 (en) | Method and electronic device for object tracking in a light-field capture | |
JP6922348B2 (en) | Information processing equipment, methods, and programs | |
CN110189267B (en) | Real-time positioning device and system based on machine vision | |
CN110197104B (en) | Distance measurement method and device based on vehicle | |
JP6924455B1 (en) | Trajectory calculation device, trajectory calculation method, trajectory calculation program | |
CN116580107A (en) | Cross-view multi-target real-time track tracking method and system | |
CN105447007B (en) | A kind of electronic equipment and data processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||