Background
AR technology is widely applied in fields such as the research and development of advanced weapons and aircraft, visualization of data models, virtual training, entertainment, and art. Because it enhances the display output of the real environment, it has clearer advantages than traditional technology in fields such as medical research and anatomical training, manufacturing and maintenance of precision instruments, military aircraft navigation, engineering design, and remote robot control.
Wearable intelligent AR equipment extends the AR function closer to the user's own capabilities. A head-mounted AR device not only lets the user see objects and the environment of the physical world, providing richer visual content, but also shows visual content superimposed on those objects, for example a virtually generated object, the internal structure of the same object, a three-dimensional structural view of the object, or the three-dimensional effect after other objects are superimposed. The realized effect depends on the generation quality of the superimposed visual content, and also on the position, direction, and scale of the superposition, that is, on whether a visually seamless superposition can be achieved so that the real object and the superimposed visual information are integrated and the effect is unaffected by the user's movement.
The positioning technology of an AR head-mounted device needs to identify the environment and obtain the user's real-time three-dimensional spatial coordinates and attitude information. Currently adopted positioning technologies include GPS positioning, WiFi positioning, Ultra-Wideband (UWB) positioning, inertial sensor positioning, and machine vision positioning.
The prior art has the following problems:
1) for indoor positioning, GPS signals are greatly attenuated by buildings, and the precision achieved outdoors is difficult to match;
2) Ultra-Wideband (UWB) positioning can meet most application requirements in precision and concurrency, but deployment and later maintenance costs are high, and the positioning range is limited by base-station deployment;
3) inertial sensor positioning is unaffected by occlusion and requires no base stations, and there is no upper limit on the number of concurrent users, but accumulated errors mean the precision often cannot meet application requirements;
4) machine vision positioning is affected by the field of view, so multiple cameras are often used to capture different viewing angles simultaneously. If three-dimensional coordinate information including depth must be acquired from the scene, one scheme is binocular stereo imaging, whose computational load is large, whose real-time performance is often insufficient, and whose measurement distance is limited; another scheme uses a depth camera, which meets the requirements for precision, distance, and real-time performance, but current depth cameras are expensive and difficult to popularize in the short term;
5) marker-based machine vision positioning pre-arranges various markers in the environment, whose patterns carry two-dimensional code information; a camera identifies the positions of the markers and the codes they carry to calculate spatial three-dimensional coordinates and attitude information. This scheme requires enough markers to be arranged in the environment in advance and the lighting on the markers to meet acquisition and recognition requirements; installing the markers is not only labor-intensive but also heavily constrained by the environment.
Disclosure of Invention
The object of the present invention is to solve at least one of the technical drawbacks mentioned above.
Therefore, the invention aims to provide an Inside-Out space positioning AR stereo display device.
In order to achieve the above object, an embodiment of the present invention provides an Inside-Out spatial positioning AR stereoscopic display device, including: a head ring, a camera, an IMU unit, a data acquisition and processing module, a display screen, a semi-transmissive/semi-reflective display panel, signal and power lines, a high-speed media data transmission line, and a microphone, wherein,
the head ring is used for fixing the AR stereoscopic display device on the head of a user;
the camera is arranged at the front end of the head ring and used for acquiring an image of a current scene in real time and sending the image to the data acquisition and processing module;
the IMU unit is arranged at the front end of the head ring and is used for acquiring linear motion acceleration and rotation angular velocity data of a user in real time and sending the linear motion acceleration and rotation angular velocity data to the data acquisition and processing module;
the data acquisition and processing module is used for analyzing the image from the camera in real time, acquiring object characteristic information and background characteristic information in a scene, performing data fusion on the object characteristic information and the background characteristic information in the scene and linear motion acceleration and rotation angular velocity data from the IMU unit, acquiring three-dimensional coordinates and attitude information in the current scene, and sending the fused data to terminal equipment through the signal line so that the terminal equipment generates a virtual object corresponding to the position and the direction according to the three-dimensional coordinates and the attitude information;
the microphone is used for acquiring user voice and an audio signal of the environment in real time;
the display screen is arranged at the front end of the head ring and used for receiving high-speed video and audio signals through the high-speed media data transmission line and displaying virtual objects or required superposed object images generated by the terminal equipment;
the semi-transmissive/semi-reflective display panel is mounted below the display screen and is used for reflecting the virtual object or the required superposed object image to the eyes of a user while transmitting the object in the scene, and superposing the virtual object or the required superposed object image with the image of the actual object in the transmitted scene to generate the effect of Augmented Reality (AR).
Further, the device also includes: a head ring width adjuster and a head ring auxiliary fixing frame, both arranged at the rear end of the head ring.
Further, the camera and IMU unit are calibrated prior to operation, wherein,
calibrating the camera, comprising: acquiring the camera's internal and external parameters and distortion parameters, which are used in subsequent calculation to remove lens distortion;
calibrating the IMU unit, comprising: acquiring the zero-offset parameters of the accelerometer and the gyroscope, which are used to eliminate systematic errors during subsequent data acquisition.
Further, the data acquisition processing module extracts feature points of each frame of image from the camera, performs feature matching on adjacent frames, reserves frames containing a preset number of matched feature points and new feature points as key frames, and utilizes the feature points in the adjacent key frames to calculate three-dimensional coordinates of a scene and the posture of the camera through triangulation to realize motion posture estimation;
and the data acquisition and processing module estimates the attitude and the rapid movement information by using the data acquired by the IMU unit under the condition that the image acquired by the camera is fuzzy or the adjacent inter-frame overlapping areas are too few and cannot be matched, so as to complete position and attitude estimation.
Further, the data acquisition and processing module performs local optimization on the position and attitude estimation result, including: optimizing the motion attitude estimation result according to the key frames by a linear or nonlinear method to obtain a more accurate result, and sending the resulting position and attitude data to the terminal device;
and the data acquisition processing module further carries out next global optimization on the result of the local optimization to establish all feature points of complete motion in the scene.
Further, the data acquisition and processing module corrects the position and the posture by utilizing the condition that a closed circuit is formed by the intersection of the path and the previous track during long-time movement, calls data of the similar position in the previous movement process in global optimization, and eliminates the accumulated error generated by the system through calibration.
Further, the terminal device generates two image streams: according to the virtual object model it generates, two virtual cameras at different positions and angles produce different images of the same virtual object in real time, which are projected to the user's eyes through the display screen and the semi-transmissive/semi-reflective panel; the user's two eyes synthesize a stereoscopic image on the retina, and the stereoscopic image is simultaneously superimposed on the real scene and real object image.
Further, the terminal equipment adopts a Personal Computer (PC) or a mobile terminal.
According to the Inside-Out spatial positioning AR stereoscopic display device provided by the embodiment of the invention, visual feature information is extracted from the video captured by the camera and combined, through a fusion algorithm, with the motion and attitude information acquired by the integrated inertial sensor to obtain real-time spatial coordinates and attitude, realizing the positioning function. All data acquisition and fusion calculation are completed by the embedded computer, so the positioning function requires no additional computing equipment or environment setup. According to the application, the invention can output two video streams with parallax, providing a stereoscopic image of the virtual object superimposed on the image of the real scene and objects, making the augmented reality effect more vivid.
The Inside-Out space positioning AR stereo display device provided by the embodiment of the invention has the following advantages:
1) the real-time positioning function can be realized by a single device without the help of additional equipment or setting of a marker in the environment, so that the device has higher flexibility and stronger adaptability;
2) the monocular camera and the integrated IMU sensor are adopted, so that the cost is low;
3) the multi-sensor fusion can accurately track and position the high-speed and low-speed movement without being influenced by the free walking and head rotation of a user, so that a more accurate superposition effect of a virtual object and a real space is realized;
4) the real-time video can be flexibly superposed aiming at different AR applications, and the stereoscopic display function is realized;
5) the head-mounted design is convenient to wear.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention provides an Inside-Out space positioning AR stereo display device, which is based on a multi-sensor fusion space positioning technology, acquires three-dimensional position and posture information of a user through real-time calculation, provides target position and orientation information for applied virtual visual information, and realizes accurate superposition of a real object and a virtual object in a scene. The device utilizes the binocular parallax principle of human eyes, and provides two paths of different real-time videos, so that the virtual images seen by a user have a stereoscopic effect.
Several related art terms mentioned hereinafter are described below:
VR: Virtual Reality (VR) is a computer simulation system that creates and lets users experience a virtual world. A computer generates a simulated environment, a system simulation of multi-source information fusion with interactive three-dimensional dynamic visuals and entity behavior, in which the user is immersed.
AR: augmented Reality (AR) is a technology for enriching the real world by fusing virtual information (objects, pictures, videos, sounds, etc.) into the real environment in real time through a computer, which integrates the virtual and real world and can realize the interaction with the virtual world.
Inside-Out: a positioning and tracking mode in virtual reality and augmented reality in which a sensor such as a camera is fixed to a part of the user's body (usually the head). In use, the sensor acquires position and attitude change information from the external environment, computes motion relative to the environment in real time, and determines the corresponding position and attitude in the virtual environment. Compared with the Outside-In mode, in which several sensors are fixed in the environment and the user's position and attitude are acquired by external sensors, this mode places lower demands on the environment, has lower equipment installation cost, and allows the application area to be extended more flexibly and conveniently.
An IMU: an Inertial Measurement Unit (IMU) is a sensor assembly for measuring the motion and attitude of an object. Three-axis accelerometers are typically included to measure the acceleration of an object in three axial directions in space, and three-axis gyroscopes to measure the angular velocity of an object rotating in three dimensions.
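The accumulated-error problem of inertial positioning mentioned in the Background can be illustrated numerically. The following is a minimal 1-D sketch, not part of the invention, showing how a small uncalibrated accelerometer bias, double-integrated, grows quadratically into position error:

```python
import numpy as np

# Minimal 1-D sketch (illustrative, not part of the invention) of why
# IMU-only positioning accumulates error: double-integrating a small
# uncalibrated accelerometer bias produces position error that grows
# quadratically with time.
def integrate_position(accel, dt):
    """Double-integrate acceleration samples into 1-D position."""
    velocity = np.cumsum(accel) * dt
    position = np.cumsum(velocity) * dt
    return position

dt = 0.01                       # 100 Hz IMU sampling rate (assumed)
t = np.arange(0.0, 10.0, dt)    # 10 s of data while the device is at rest
bias = 0.05                     # 0.05 m/s^2 zero offset (uncalibrated)
true_accel = np.zeros_like(t)   # the device does not actually move
pos_err = integrate_position(true_accel + bias, dt)
# pos_err[-1] is roughly 0.5 * bias * t^2 = 2.5 m of drift after 10 s
```

This quadratic growth is why the zero-offset calibration described below, and the fusion with camera data, are needed.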
As shown in fig. 1, an Inside-Out spatial positioning AR stereoscopic display device according to an embodiment of the present invention includes: the device comprises a head ring 1, a camera 2, an IMU unit, a data acquisition and processing module, a display screen 4, a semi-transmission/semi-reflection display panel 5, a signal and power line 6, a high-speed media data transmission line 7 and a microphone 10. Wherein, the IMU unit and the data acquisition processing module are marked as a mark 3 together.
Specifically, the head band 1 is used to fix the AR stereoscopic display device to the head of the user.
In addition, the Inside-Out space positioning AR stereo display device of the invention also comprises: a head ring width adjuster 8 and a head ring auxiliary fixing frame 9 which are arranged at the rear end of the head ring 1.
Wherein, the head circle width adjuster 8 can adjust the size of the head circle 1 to adapt to the size of the head of a user, thereby achieving the best wearing effect. The head ring auxiliary fixing frame 9 can be matched with the head ring 1 to achieve a better fixing effect.
The camera 2 is arranged at the front end of the head ring 1 and used for acquiring images of a current scene in real time and sending the images to the data acquisition and processing module.
The IMU unit is arranged at the front end of the head ring 1 and used for acquiring linear motion acceleration and rotation angular velocity data of a user in real time and sending the linear motion acceleration and rotation angular velocity data to the data acquisition and processing module.
In one embodiment of the present invention, as shown in fig. 2, the data acquisition and processing module needs to calibrate the camera 2 and the IMU unit before the system can work.
Specifically, calibrating the camera includes the following step: acquiring the camera's internal and external parameters and distortion parameters, which are used in subsequent calculation to remove lens distortion.
Calibrating the IMU unit includes the following step: acquiring the zero-offset parameters of the accelerometer and the gyroscope, which are used to eliminate systematic errors during subsequent data acquisition.
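The zero-offset calibration step can be sketched as follows. This is an illustrative example with synthetic sensor values, not the device's actual firmware: with the headset stationary, the mean deviation of the readings from their expected stationary values is taken as the bias and subtracted from subsequent samples.

```python
import numpy as np

# Illustrative sketch of zero-offset (bias) calibration: with the headset
# stationary, the mean accelerometer reading should equal gravity and the
# mean gyroscope reading should be zero; the deviation is stored as the
# bias and subtracted from later samples. Values below are synthetic.
def estimate_bias(stationary_samples, expected):
    """Mean deviation of stationary readings from their expected value."""
    return stationary_samples.mean(axis=0) - expected

rng = np.random.default_rng(0)
g = np.array([0.0, 0.0, 9.81])             # expected stationary accelerometer reading
true_bias = np.array([0.02, -0.01, 0.03])  # the unknown zero offset to recover
samples = g + true_bias + rng.normal(0.0, 0.005, size=(2000, 3))
bias = estimate_bias(samples, expected=g)
corrected = samples - bias                 # applied during subsequent acquisition
```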
After calibrating the camera 2 and the IMU unit, the data acquisition and processing module controls the camera and the IMU to acquire data respectively: it extracts feature values from the real-time video captured by the camera, while the IMU unit acquires acceleration and angular velocity values in real time; that is, each completes its corresponding work.
In one embodiment of the present invention, the sensor combination may be modified, for example, the camera 2 is not limited to monocular, and multiple cameras may be used; the IMU unit may not be limited to accelerometers and gyroscopes, but magnetometers or barometers, etc. may be added depending on the application. This can be selected according to the actual needs of the user, and will not be described herein.
The data acquisition and processing module is used for analyzing the image from the camera 2, acquiring object characteristic information and background characteristic information in the scene, performing data fusion on the object characteristic information and the background characteristic information in the scene and linear motion acceleration and rotation angular velocity data from the IMU unit, acquiring three-dimensional coordinates and attitude information in the current scene, and sending the fused data to the terminal equipment through a signal and a power line 6, so that the terminal equipment generates a virtual object in a specific position and direction according to the three-dimensional coordinates and the attitude information.
In one embodiment of the present invention, the data acquisition and processing module may employ an embedded processing computer.
It should be noted that the signal and power line 6 in the present invention is divided into two parts, namely a signal line and a power line, wherein the signal line provides signal transmission between the embedded computer and the back-end PC or the mobile computer, and includes various control signals, and three-dimensional coordinates and attitude data generated by the embedded computer; the power line supplies power for the embedded computer, the display screen 4, the camera 2, the IMU and the like.
Specifically, the data acquisition and processing module extracts feature points of each frame of image from the camera, performs feature matching on adjacent frames, and retains frames with a preset number of matched feature points and new feature points as key frames. The feature matching is used for solving the data association problem, namely the corresponding relation between the current scene and the previous scene, and ensures the calculation of three-dimensional coordinates and the tracking of a motion state. And then, by utilizing the characteristic points in a plurality of adjacent key frames and through triangulation, calculating the three-dimensional coordinates of the scene and the posture of the camera 2, and realizing the estimation of the motion posture.
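The triangulation step described above can be illustrated with a minimal linear (DLT) sketch. The projection matrices and the matched image points here are synthetic; a real system would use the calibrated intrinsics and the estimated key-frame poses.

```python
import numpy as np

# Minimal linear-triangulation (DLT) sketch of the step that converts
# matched feature points in two key frames into 3-D scene coordinates.
# Projection matrices and matched points are synthetic placeholders.
def project(P, X):
    """Project a 3-D point through a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3-D point from two views (linear DLT)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

K = np.diag([500.0, 500.0, 1.0])      # assumed intrinsics (focal length 500 px)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # 10 cm baseline
X_true = np.array([0.2, -0.1, 2.0])   # a scene point 2 m away
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)   # recovers X_true from the two views
```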
For cases where the image acquired by the camera 2 is blurred (for example, blurred images acquired while the camera 2 moves at high speed) or adjacent frames cannot be matched because the overlapping area is too small, the data acquisition and processing module estimates the attitude and fast-motion information from the data acquired by the IMU unit to complete position and attitude estimation.
Fig. 4 is a flow chart of a positioning procedure in the data processing module.
As shown in fig. 4, the data acquisition and processing module performs local optimization on the position and posture estimation result, including: and optimizing the motion attitude estimation result according to the latest key frames by a linear or nonlinear method to obtain a more accurate result, and sending the obtained position attitude data to the terminal equipment to determine the position and the direction of the virtual object.
The data acquisition and processing module further performs global optimization on the result of the local optimization, establishing all feature points of the complete motion in the scene. Owing to the limited computing and storage capacity of an embedded system, only a certain number of consecutive key frames are retained to meet real-time requirements; during subsequent motion, as new key frames arrive, the earliest key frames are gradually discarded. Because only the currently stored feature points are used, measurement and calculation errors accumulate, the computed coordinate and attitude deviation gradually grows during long-duration motion, and repeated trajectories no longer coincide. Global optimization after the map is established avoids this shortcoming of local optimization in larger scenes.
And the data acquisition processing module executes loop detection, and the loop detection comprises the steps of correcting the position and the posture by utilizing the condition that a path is intersected with the previous path to form a closed circuit during long-time movement, calling data of a similar position in the previous movement process in global optimization, and eliminating accumulated errors generated by the system through calibration.
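The loop-closure correction can be illustrated with a toy example: when the path is detected to revisit a previously stored key frame, the discrepancy between the two estimates of the same place measures the accumulated drift, which is then distributed back along the trajectory. The linear distribution used here is a deliberate simplification of the global optimization described in the text.

```python
import numpy as np

# Toy illustration of loop-closure correction: the start/end discrepancy
# of a closed path measures accumulated drift, distributed back linearly
# along the trajectory. A simplification of the real global optimization.
def close_loop(poses, start_idx, end_idx):
    """Spread the start/end discrepancy linearly along the trajectory."""
    drift = poses[end_idx] - poses[start_idx]
    corrected = poses.copy()
    n = end_idx - start_idx
    for i in range(start_idx, end_idx + 1):
        corrected[i] -= drift * (i - start_idx) / n
    return corrected

# A square walk that should return to the origin but drifts 0.4 m in x.
poses = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.4, 0.0]])
corrected = close_loop(poses, start_idx=0, end_idx=4)
# corrected[-1] now coincides with the revisited starting pose
```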
For applications in large scenes, to balance accuracy and real-time performance, the generated global map can also be transmitted to the back-end terminal device as part of the panoramic map; during motion, the relevant part of the global map for the current area can be retrieved and downloaded to the embedded computer, saving computation and storage on the embedded computer.
In addition, the invention also provides a compensation calibration function: based on the respective characteristics of the camera and the IMU unit, when the estimate from one sensor is inaccurate or its data cannot be acquired, compensation is performed using the result from the other sensor. When the head-mounted device moves fast, or a fast-moving object appears in the scene, the camera image blurs; a better attitude estimate is then obtained from the high-rate attitude information of the IMU unit, which can also be used to judge whether the rapid scene change originates from camera motion or from a fast-moving object in the scene, so that the two cases are handled separately. Conversely, the IMU unit drifts when static or moving slowly, and the accumulated result increases the error, whereas the camera's pose information is reliable when static or slow and can then also be used to calibrate the IMU's pose information.
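The compensation strategy described above can be sketched as a simple speed-weighted blend of the two pose sources. The blending rule and the speed threshold are illustrative assumptions, not the device's actual fusion algorithm:

```python
import numpy as np

# Hedged sketch of camera/IMU compensation: trust the IMU pose during
# fast motion (when camera frames blur) and the camera pose when nearly
# static (when IMU drift dominates). The linear blend and the threshold
# are illustrative assumptions only.
def fuse_pose(camera_pose, imu_pose, angular_speed, threshold=2.0):
    """Blend the two pose estimates according to motion speed (rad/s)."""
    w_imu = np.clip(angular_speed / threshold, 0.0, 1.0)
    return w_imu * imu_pose + (1.0 - w_imu) * camera_pose

cam = np.array([1.0, 2.0])   # pose from visual tracking (illustrative)
imu = np.array([1.3, 2.2])   # pose from IMU integration (illustrative)
slow = fuse_pose(cam, imu, angular_speed=0.0)   # static: camera estimate wins
fast = fuse_pose(cam, imu, angular_speed=5.0)   # fast motion: IMU estimate wins
```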
The microphone 10 is used to collect the user's voice and ambient audio signals in real time. The display screen 4 is arranged at the front end of the head ring 1 and receives, through the high-speed media data transmission line 7, the high-speed video and audio signals generated by the terminal device, displaying them in real time; the content is the image of a virtual object or of an object to be superimposed, and the output image of the display screen is superimposed on the actual objects in the scene through the semi-transmissive/semi-reflective panel 5 to produce the augmented reality effect.
The semi-transmissive/semi-reflective plate 5 is installed below the display screen 4, and is used for reflecting the virtual object or the required superimposed object image generated by the display screen 4 to the eyes of the user while transmitting the background and the object in the scene, and superimposing the virtual object or the required superimposed object image with the image of the actual object in the transmitted scene, so as to generate the effect of augmented reality AR.
That is, the image generated by the data acquisition and processing module is transmitted to the display screen 4 through the high-speed media transmission line, and is optically superposed with the object and the environment in the real scene through the semi-transmission/semi-reflection plate 5, so as to form a real-time superposed image seen by the user, namely, an augmented reality image.
It should be noted that, the software module for data acquisition and processing in the present invention may be changed or updated according to the application, for example, for some applications, the entity and the corresponding model may be precisely superimposed, and a corresponding matching function may be added, so as to achieve a display effect of high precision fusion of the two.
According to the received real-time coordinate and attitude information from the data acquisition and processing module, the terminal device generates a virtual object with the corresponding position and attitude, or generates the object image to be displayed according to the application.
The terminal device generates two image streams: from the virtual object, two virtual cameras at different positions and angles produce different images of the same virtual object in real time, which are projected to the user's eyes through the display screen 4 and the semi-transmissive/semi-reflective panel 5; the user's two eyes synthesize a stereoscopic image on the retina. Meanwhile, the real-world scene and real objects are also projected directly onto the user's retinas through the semi-transmissive/semi-reflective panel 5, so the stereoscopic image is superimposed on the real scene and real object image.
In one embodiment of the invention, the terminal equipment adopts a personal computer PC or a mobile terminal.
Fig. 3 is a schematic diagram of stereoscopic display using parallax according to an embodiment of the present invention.
Stereoscopic display is based on the stereo-imaging principle of the human eyes: the interpupillary distance between the left and right eyes is about 65 mm in adults, so the images of the same object on the two retinas differ slightly; from these differences the visual nervous system synthesizes an image that contains not only color and brightness but also the distance of each point of the viewed object, i.e., a three-dimensional image. If videos of the same scene shot from different positions are input simultaneously to the left and right eyes respectively, a stereoscopic image appears before the eyes; this is the imaging principle of stereoscopic film and stereoscopic television.
As shown in fig. 3, an image 35 formed by the object 30 through two eyes in fig. 3 includes not only color and brightness information of the object 30, but also depth information, that is, a distance from the object 30 to the two eyes. The left eye 33 images the real object 30 as 31, the right eye 34 images the real object 30 as 32, and since there is a certain distance between pupils of the left eye 33 and the right eye 34, namely, the interpupillary distance (the average interpupillary distance of an adult is 65mm), the brain uses the slight difference between the images 31 and 32 formed by the retina, and the finally synthesized image 35 contains the three-dimensional information of the real object 30.
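The geometry underlying this parallax-based display can be stated numerically: for a pinhole model, depth Z, horizontal disparity d, baseline B (the ~65 mm interpupillary distance), and focal length f satisfy Z = f·B/d. A brief sketch, with the focal length as an illustrative assumption:

```python
# Numeric sketch of the parallax geometry: Z = f * B / d relates depth Z,
# disparity d, baseline B (the ~65 mm interpupillary distance), and focal
# length f. The terminal device effectively inverts this relation by
# rendering the two virtual-camera views with the disparity corresponding
# to the desired virtual depth. The focal length here is assumed.
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    return focal_px * baseline_m / disparity_px

def disparity_from_depth(focal_px, baseline_m, depth_m):
    return focal_px * baseline_m / depth_m

f_px = 800.0    # focal length in pixels (illustrative assumption)
B = 0.065       # 65 mm interpupillary distance
d = disparity_from_depth(f_px, B, depth_m=2.0)  # place a virtual object at 2 m
# d is 26 px of horizontal offset between the left- and right-eye images
```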
According to the Inside-Out spatial positioning AR stereoscopic display device provided by the embodiment of the invention, visual feature information is extracted from the video captured by the camera and combined, through a fusion algorithm, with the motion and attitude information acquired by the integrated inertial sensor to obtain real-time spatial coordinates and attitude, realizing the positioning function. All data acquisition and fusion calculation are completed by the embedded computer, so the positioning function requires no additional computing equipment or environment setup. According to the application, the invention can output two video streams with parallax, providing a stereoscopic image of the virtual object superimposed on the real scene and objects, making the augmented reality effect more vivid.
The Inside-Out space positioning AR stereo display device provided by the embodiment of the invention has the following advantages:
1) the real-time positioning function can be realized by a single device without the help of additional equipment or setting of a marker in the environment, so that the device has higher flexibility and stronger adaptability;
2) the monocular camera and the integrated IMU sensor are adopted, so that the cost is low;
3) the multi-sensor fusion can accurately track and position the high-speed and low-speed movement without being influenced by the free walking and head rotation of a user, so that a more accurate superposition effect of a virtual object and a real space is realized;
4) the real-time video can be flexibly superposed aiming at different AR applications, and the stereoscopic display function is realized;
5) the head-mounted design is convenient to wear.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.