CN117218320B - Space labeling method based on mixed reality - Google Patents

Space labeling method based on mixed reality

Info

Publication number
CN117218320B
Authority
CN
China
Prior art keywords
labeling
camera
mixed reality
point
coordinates
Prior art date
Legal status
Active
Application number
CN202311473370.2A
Other languages
Chinese (zh)
Other versions
CN117218320A (en)
Inventor
张永峰
李名森
丁冬睿
陈月辉
吕宪龙
陈星�
林泉
韩子涵
刘志豪
徐航
Current Assignee
University of Jinan
Original Assignee
University of Jinan
Priority date
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN202311473370.2A priority Critical patent/CN117218320B/en
Publication of CN117218320A publication Critical patent/CN117218320A/en
Application granted granted Critical
Publication of CN117218320B publication Critical patent/CN117218320B/en


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a space labeling method based on mixed reality, belonging to the technical field of mixed reality and image processing, which comprises the following steps: starting a remote labeling function and initializing the positionable camera of the mixed reality head display; correcting distortion of the video frames and calculating a projection matrix and a view matrix for each video frame to form annotation data; the user equipment receives the data, labels the frozen picture and solves the three-dimensional coordinates using a spatial labeling algorithm; the mixed reality head display receives the labeling coordinates and corrects them to obtain the final position of the labeling information; and the mixed reality head display completes positioning and rendering of the virtual information according to the final position of the labeling information. The method improves the accuracy and precision of labeling in the mixed reality space. The wearer of the mixed reality head display sees the labeling graphics at the labeled position, and the labeling result is shared with the background annotator in real time, so that the back-end annotator and the front-end wearer share the same screen picture.

Description

Space labeling method based on mixed reality
Technical Field
The invention relates to a space labeling method based on mixed reality, and belongs to the technical field of mixed reality and image processing.
Background
Traditional video chat communication can only provide pictures captured by a camera; the communication mode is limited by the video picture on the screen, physical objects cannot be shared in time, virtual information cannot be shared, and the communication and experience effects are relatively limited. With the rapid development of augmented reality and mixed reality technologies in recent years, combining the physical world with virtual information has become possible. In particular, mixed reality technology achieves seamless fusion of the physical world and the virtual world by sensing the user's head movement and the surrounding space, and this communication mode of fusing the physical world with virtual information provides a more natural, interactive and immersive experience. People are no longer limited to two-dimensional information on a screen; the possibilities of communication in real space are expanded and the efficiency of information exchange is improved.
When a user needs remote help, the user wears the mixed reality device and uses mixed reality remote collaboration software; the remote party or the wearer sends a remote collaboration request for remote communication and guidance, and a three-dimensional spatial labeling algorithm is used to position and render the virtual information. Physical objects on the user's side are effectively shared, communication efficiency is improved, human resources are saved, and sudden problems are responded to in time.
In current remote labeling methods, the spatial position of the labeled picture or virtual information is determined based on the intrinsic parameters of the mixed reality camera at the labeling moment. This leads to problems such as the labeled picture blocking the field of view and being inaccurate, a limited viewing angle, ghosting of the corresponding real-world object, and deviation between the ideal and actual labeling positions when virtual information is labeled. Such labeling effects affect the user's field of view, reduce communication efficiency, and mislabeling can cause wrong operations and unavoidable losses.
Disclosure of Invention
In order to solve the problems, the invention provides a space labeling method based on mixed reality, which can improve the labeling accuracy and precision in the mixed reality space.
The technical scheme adopted for solving the technical problems is as follows:
the embodiment of the invention provides a space labeling method based on mixed reality, which comprises the following steps:
starting a remote labeling function, and initializing a mixed reality head display positionable camera;
correcting the distortion of the video frames, and calculating a projection matrix and a view matrix of each video frame to form annotation data;
the user equipment receives the label, labels the frozen picture and solves the three-dimensional coordinate by using a space labeling algorithm;
the mixed reality head display receives the labeling coordinates, corrects the labeling coordinates and obtains the final position of labeling information;
and the mixed reality head display finishes positioning rendering of the virtual information according to the final position of the labeling information.
As a possible implementation manner of this embodiment, the starting the remote labeling function, initializing the positionable camera of the mixed reality head display, includes:
starting a remote labeling function of a mixed reality head display, and checking a function of a first visual angle of the mixed reality head display;
initializing a mixed reality head display positionable camera, wherein the front end of the mixed reality head display positionable camera is provided with a camera facing to the external environment, and the mixed reality head display is provided with a remote labeling application program which can access and control the camera;
and checking whether the mixed reality device is connected to the background server; if so, the positionable camera of the mixed reality device is initialized successfully, the current mixed reality head display camera is started, and the current video frame is acquired.
As a possible implementation manner of this embodiment, the video frame distortion correction, calculating a projection matrix and a view matrix of each video frame, to form labeling data, includes:
and (3) returning camera internal reference information of each video frame according to the locatable camera of the mixed reality head display, and correcting distortion in the image by using a distortion correction algorithm.
As a possible implementation manner of this embodiment, the correcting distortion in the image using the distortion correction algorithm includes:
establishing a mathematical model of the mixed reality head display locatable camera:
u = f_x · X / Z + c_x, v = f_y · Y / Z + c_y (1)
where (u, v) are the two-dimensional coordinates on the image plane, (X, Y, Z) are the three-dimensional coordinates in the camera coordinate system, (f_x, f_y) is the focal length of the camera, and (c_x, c_y) are the camera optical center coordinates;
introducing distortion into a mathematical model to obtain a distortion correction model:
x′ = x(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + 2 p_1 x y + p_2 (r² + 2 x²), y′ = y(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + p_1 (r² + 2 y²) + 2 p_2 x y (2)
u′ = f_x · x′ + c_x, v′ = f_y · y′ + c_y (3)
where (x, y) = (X/Z, Y/Z) are the normalized image coordinates, (u′, v′) are the two-dimensional coordinates of the image after distortion correction, k_1, k_2, k_3 are the radial distortion coefficients of the camera, p_1, p_2 are the tangential distortion coefficients of the camera, and r² = x² + y²;
Inputting the video frame into a distortion correction model to obtain a video frame image after distortion correction, and performing marking;
calculating a view matrix V of the mixed reality head-display positionable camera under a left-hand coordinate system in a labeling procedure:
(4)
where T_1 is the camera coordinate system view matrix of the mixed reality head display locatable camera, and T_2 is the transformation matrix of the mixed reality head display locatable camera from the camera coordinate system to the world coordinate system;
calculating a projection matrix P of the mixed reality head display positionable camera under a left-hand coordinate system in a labeling procedure:
P = T_3^T (5)
where T_3 is the projection matrix returned by the mixed reality head display locatable camera.
As a possible implementation manner of this embodiment, the labeling data includes: the video resolution of the current frame, the video frame rate, the projection and view matrices of the camera, and camera intrinsic parameters including the optical center, focal length, radial distortion and tangential distortion.
As a possible implementation manner of this embodiment, the marking the frozen screen includes:
freezing a current video frame to draw a rectangular shape, a circular shape or a free line preset by the system;
labeling rectangular, circular or free line pictures to obtain coordinates of video frame pictures in a screen coordinate system;
according to the projection matrix and the view matrix of the current virtual camera, obtaining the position coordinate p at which the current user interface label is rendered:
(10)
where V (ViewMatrix) and P (ProjectionMatrix) are the view matrix and projection matrix of the current virtual camera, x, y are the coordinate point to be calculated on the screen, and width and height are the width and height of the current camera resolution;
solving the position of the current user labeling content in the user interface through the above conversion, fixing it to a position at a depth of 1 m along the camera z-axis, and rendering and displaying the current user interface labeling content.
As a possible implementation manner of this embodiment, the labeling the rectangular, circular or free line picture includes:
labeling a rectangular picture:
with the current left mouse button press point (x_1, y_1) as the top-left vertex of the rectangle and the left mouse button release point (x_4, y_4) as the bottom-right vertex of the rectangle;
under the screen coordinate system, the four corner points of the currently labeled rectangle are obtained from the geometric relationship of the rectangle:
(x_2, y_2) = (x_4, y_1), (x_3, y_3) = (x_1, y_4) (6)
where (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4) are the top-left, top-right, bottom-left and bottom-right vertices of the rectangle;
labeling a circular picture:
with the current left mouse button press point (x_1, y_1) as the starting point and the left mouse button release point (x_2, y_2) as the end point;
calculate the circle center o(x_o, y_o) and radius r:
x_o = (x_1 + x_2) / 2, y_o = (y_1 + y_2) / 2 (7)
r = ½ · sqrt((x_2 − x_1)² + (y_2 − y_1)²) (8)
calculate the point p(x, y) on the circle:
x = x_o + r·cos d, y = y_o + r·sin d (9)
where d = 0, 1, …, 360 represents the sampled point on the circle at each degree;
labeling the free line picture:
when labeling a free line picture, all the two-dimensional screen coordinate points passed through while the left mouse button is pressed are saved.
As a possible implementation manner of this embodiment, the solving the three-dimensional coordinate using the spatial labeling algorithm includes:
(1) According to the position of the labeled picture coordinates in the screen coordinate system, solve the corresponding position (x_p, y_p) in the pixel coordinate system:
x_p = x · w / f_w, y_p = y · h / f_h (11)
where f_w and f_h are the width and height of the current frozen frame resolution in the screen coordinate system, x and y are the labeling point coordinates in the screen coordinate system, and w and h are the width and height of the frozen frame resolution in the corresponding pixel coordinate system;
(2) Perform horizontal and vertical flipping of the image:
S′ = (λ_1·S_x, λ_2·S_y, S_z) (12)
where S = (S_x, S_y, S_z) is the scale three-dimensional vector of the rendering component, S′ is the flipped rendering component, and λ_1 and λ_2 are the horizontal and vertical scaling factors respectively; when λ_1 and λ_2 are negative, the coordinate system is flipped in the horizontal and vertical directions respectively;
(3) Perform mirroring on the projection matrix:
(13)
where P is the projection matrix to be processed, P′ is the mirrored projection matrix, and θ_1, θ_2 take the value 1 or −1, indicating flipping in the horizontal or vertical direction;
(4) Converting the marked pixel coordinate system into a clipping space;
converting the pixel coordinates to a coordinate system in clipping space:
(14)
then (x′, y′) are the coordinates in the obtained clipping space, (x, y) is the labeled pixel point, and w and h are the width and height of the frozen frame resolution;
(5) Extracting matrix coefficients and carrying out coordinate normalization processing;
the projection matrix P is expressed as:
(15)
where n is the distance from the near clipping plane to the camera position; f is the distance from the far clipping plane to the camera position; l is the left boundary of the near clipping plane; r is the right boundary of the near clipping plane; t is the top boundary of the near clipping plane; b is the bottom boundary of the near clipping plane;
for the focal lengths f_x and f_y and the image center point c_x and c_y in the projection matrix, extract the normalized center point c_x and c_y:
(16)
(17)
(18)
where α is a normalization factor, determined by P_33 in the projection matrix P;
(6) Calculating a direction vector of the labeling position and converting the direction vector into world coordinates;
for the coordinates (x, y) of the annotation point in clipping space, calculate the direction vector d(x_d, y_d, z_d):
(19)
where f_x and f_y are the focal lengths, and α is the normalization factor determined by P_33 in the projection matrix P;
(7) Converting the labeling coordinates in the camera space coordinate system into coordinates corresponding to the world space coordinate system;
for the corresponding direction vector d(x_d, y_d, z_d) and the view matrix V (ViewMatrix) of the currently corresponding frozen frame, calculate the world coordinates p(x, y, z) of the labeling point:
(20)
(8) And returning the world coordinates of all the current annotation points to the mixed reality head display, and performing rendering processing of the annotation points.
As a possible implementation manner of this embodiment, the mixed reality head display receives the labeling coordinates and corrects them to obtain the final position of the labeling information, which includes:
obtaining three-dimensional coordinates of the marked points, and scanning the space geometric grid;
determining the depth information of the marked point through ray detection: for labeling a rectangular frame, adopting a center point of the rectangular frame as a ray detection center, and acquiring collision points of rays and an object to be labeled; for circle labeling, adopting a circle center as a ray detection center, and acquiring collision points of rays and an object to be labeled; for straight line marking, detecting rays of each marking point on the straight line, and acquiring collision points of the rays and an object to be marked;
after the information of the collision points is obtained, the reasonable marking points are found through correcting the marking positions, and rendering and displaying are carried out.
As a possible implementation manner of this embodiment, the finding a reasonable labeling point through correcting the labeling position includes:
after the collision point P_h is acquired, different correction strategies are used according to the labeling graphic;
(1) For rectangle labeling, the rectangle position is corrected by adopting the following method:
calculate the length L from the collision point P_h(x_h, y_h, z_h) to a vertex of the corrected labeling rectangle:
(21)
(22)
where P_c(x_c, y_c, z_c) is the rectangle center point to be corrected, P_f(x_f, y_f, z_f) is the position of the mixed reality helmet in the world coordinate system at the moment of freezing the frame, P_a(x_a, y_a, z_a) is the rectangle vertex whose coordinates need to be corrected, and P_1(x_1, y_1, z_1) and P_2(x_2, y_2, z_2) are two diagonal vertices of the rectangle to be corrected;
correcting the depth information of the rectangle labeling point: for the rectangle center point P_c(x_c, y_c, z_c), the collision point P_h(x_h, y_h, z_h) and the rectangle vertex coordinates P_a(x_a, y_a, z_a) to be corrected, calculate the corrected rectangle labeling point P(x, y, z):
(23)
for the collision point P_h(x_h, y_h, z_h) and the corrected-position rectangle vertex coordinates P(x, y, z), calculate, using the line segment length L, the vertex coordinates P_r(x_r, y_r, z_r) of the rectangle frame with the final corrected size and position:
(24)
then P_r(x_r, y_r, z_r) is the final coordinate point of the rectangle frame to be solved;
(2) For the circular label, the correction label position is calculated by adopting the following method:
the three-dimensional coordinates of the circle center to be corrected are determined by P_c(x_c, y_c, z_c), and the radius of the circle is determined by the radius r;
calculate the length L from the collision point P_h(x_h, y_h, z_h) to the corrected circle, where P_f(x_f, y_f, z_f) is the position of the mixed reality helmet in the world coordinate system at the moment of freezing the frame, and P_a(x_a, y_a, z_a) is a point on the circle whose coordinates need to be corrected;
the obtained length L determines the circle center of the corrected circle label, and the three-dimensional coordinates P(x, y, z) of a point on the corrected circle are:
(25)
(3) For the labeling of an irregular free curve, each point on the line segment is sampled, a ray is cast toward it with the frozen-frame position as the starting point, and the collision point is the point to be rendered.
The technical scheme of the embodiment of the invention has the following beneficial effects:
The spatial labeling method based on mixed reality in the technical scheme of the embodiment of the invention comprises the following steps: starting a remote labeling function and initializing the positionable camera of the mixed reality head display; correcting distortion of the video frames and calculating a projection matrix and a view matrix for each video frame to form annotation data; the user equipment receives the data, labels the frozen picture and solves the three-dimensional coordinates using a spatial labeling algorithm; the mixed reality head display receives the labeling coordinates and corrects them to obtain the final position of the labeling information; and the mixed reality head display (namely the mixed reality head-mounted display device) completes positioning and rendering of the virtual information according to the final position of the labeling information. The method improves the accuracy and precision of labeling in the mixed reality space. After rendering is completed, the wearer of the mixed reality head display sees the labeling graphics at the labeled position; the positionable camera of the mixed reality head display provides virtual-and-real picture transmission, the labeling result is shared with the background annotator in real time, and the effect that the back-end annotator and the front-end wearer share the same screen picture is achieved.
Drawings
FIG. 1 is a flow chart illustrating a mixed reality based spatial annotation method according to an example embodiment;
FIG. 2 is a schematic diagram illustrating barrel distortion according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a pincushion distortion, according to an exemplary embodiment;
FIG. 4 is a diagram of a camera mathematical model of a mixed reality head display, shown according to an exemplary embodiment;
FIG. 5 is a schematic diagram of a solution process for a view matrix of a mixed reality head-mounted positionable camera, according to an example embodiment;
FIG. 6 is a schematic diagram illustrating an environment in which the world can be perceived for display using a spatially aware mixed reality head display, according to an example embodiment.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
in order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
As shown in fig. 1, the embodiment of the invention provides a spatial labeling method based on mixed reality, which comprises the following steps:
1. Starting the remote labeling function and initializing the mixed reality head display positionable camera.
When the mixed reality head display wearer or other equipment users need to perform the space labeling function, the function of checking the first visual angle can be started to perform the space labeling.
Firstly, a mixed reality head display positionable camera is initialized, and the front end of the mixed reality head display positionable camera is provided with a camera facing to the external environment, so that a remote marking application program can see the content seen by a wearer. The remote labeling application program can access and control the camera, and can adapt to the application program when the parameter configuration of the camera is modified to achieve the optimal user experience effect.
Considering the scene used by the user and the performance and endurance of the mixed reality head display, when the program is started, the program automatically searches the configuration file available for the current camera, and as shown in table 1, the configuration file of the camera contains information such as camera resolution, frame rate, horizontal view and the like. The configuration files are used for initializing the current camera, different configuration files are selected according to the current specific use scene, the adaptation capability of the mixed reality head displayed in different environments can be improved, and the requirements of the mixed reality device on performance, endurance and the direct quality of the algorithm display picture are balanced to the greatest extent.
Table 1: configuration file capable of being used by camera
Configuration file | Resolution | Frame rate (fps) | Horizontal field of view (H-FOV) | Usage scenario
Legacy,0 BalancedVideoAndPhoto,100 | 2272x1278 | 15, 30 | 64.69 | High quality video recording
Legacy,0 BalancedVideoAndPhoto,100 | 896x504 | 15, 30 | 64.69 | High quality photo capture preview stream
BalancedVideoAndPhoto,120 | 1952x1100 | 15, 30 | 64.69 | Long duration scene
BalancedVideoAndPhoto,120 | 1504x846 | 15, 30 | 64.69 | Long duration scene
VideoConferencing,100 | 1952x1100 | 15, 30, 60 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 | 1504x846 | 5, 15, 30, 60 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 1920x1080 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 1280x720 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 1128x636 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 960x540 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 760x428 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 640x360 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 500x282 | 15, 30 | 64.69 | Video conferencing, long duration scene
VideoConferencing,100 BalancedVideoAndPhoto,120 | 424x240 | 15, 30 | 64.69 | Video conferencing, long duration scene
Before use, the configuration file of the camera is automatically selected according to the scene required by the user, and the camera can be manually switched when the user needs a use scene with higher resolution. When the camera is used by the subsequent algorithm, the captured video frame comprises information such as the position of the camera and internal parameters of the lens, and the information is used as algorithm input and is further processed by using a labeling algorithm.
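As an illustration of this selection step, the short Python sketch below chooses a profile from Table 1 by usage scenario and minimum resolution; the profile entries, field names and the chooser function are illustrative assumptions, not part of the patented method.

PROFILES = [
    {"profile": "Legacy,0 BalancedVideoAndPhoto,100", "resolution": (2272, 1278),
     "fps": (15, 30), "h_fov": 64.69, "scenario": "high quality video recording"},
    {"profile": "BalancedVideoAndPhoto,120", "resolution": (1504, 846),
     "fps": (15, 30), "h_fov": 64.69, "scenario": "long duration scene"},
    {"profile": "VideoConferencing,100 BalancedVideoAndPhoto,120", "resolution": (1280, 720),
     "fps": (15, 30), "h_fov": 64.69, "scenario": "video conferencing, long duration scene"},
]

def pick_profile(scenario_keyword, min_width=0):
    # Return the first profile whose scenario matches and whose width is sufficient;
    # fall back to the last (lowest-cost) entry when nothing matches.
    for p in PROFILES:
        if scenario_keyword in p["scenario"] and p["resolution"][0] >= min_width:
            return p
    return PROFILES[-1]

print(pick_profile("video conferencing", min_width=1280)["profile"])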
Then, it is checked whether the mixed reality device is connected to the background server. If the background server is connected, the locatable camera of the mixed reality device is initialized successfully, and the current mixed reality head display camera can be started to acquire the current video frame for labeling data transmission. The annotation data includes: the video resolution of the current frame, the video frame rate, the projection matrix and view matrix of the camera, and the camera intrinsic parameters (optical center, focal length, radial distortion and tangential distortion). If the device is not connected to the server, the program waits to reconnect; once the server is connected successfully, the locatable camera of the mixed reality device is initialized and the labeling function can be performed.
2. Correcting the distortion of the video frames and calculating a projection matrix and a view matrix for each video frame to form annotation data.
In order to accurately superimpose virtual annotation content on the real world and ensure the accuracy of the annotation algorithm, distortion correction is required for the video frame pictures. Distortion correction is a technique for repairing the distortion introduced into the camera picture. Lens distortion can cause image deformation, which in mixed reality applications affects the alignment and matching of virtual content with the real world. Generally, camera distortion is classified into radial distortion and tangential distortion, and radial distortion is further divided into barrel distortion and pincushion distortion, as shown in fig. 2 and fig. 3.
These distortion types affect the accuracy and precision of the annotation and cause annotation bias. The locatable camera of the mixed reality head display returns the camera intrinsic information for each video frame, and a distortion correction algorithm is then used to correct the distortion in the image.
As shown in fig. 4, there is the following mathematical model for a mixed reality head-mounted camera:
u = f_x · X / Z + c_x, v = f_y · Y / Z + c_y (1)
where (u, v) are the two-dimensional coordinates on the image plane, (X, Y, Z) are the three-dimensional coordinates in the camera coordinate system, (f_x, f_y) is the focal length of the camera, and (c_x, c_y) are the camera optical center coordinates.
In a real situation camera imaging will be distorted, introducing distortion into the camera model, which is formulated as follows:
x′ = x(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + 2 p_1 x y + p_2 (r² + 2 x²), y′ = y(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + p_1 (r² + 2 y²) + 2 p_2 x y (2)
the two-dimensional coordinates of the image after distortion correction are calculated as follows:
u′ = f_x · x′ + c_x, v′ = f_y · y′ + c_y (3)
where (x, y) = (X/Z, Y/Z) are the normalized image coordinates, (u′, v′) are the two-dimensional coordinates of the image after distortion correction, k_1, k_2, k_3 are the radial distortion coefficients of the camera, p_1, p_2 are the tangential distortion coefficients of the camera, and r² = x² + y².
The distortion correction model of the mixed reality head display is obtained through the above formulas. Each acquired video frame is brought into the model to obtain the distortion-corrected image, which is then forwarded by the server to the other device terminals for labeling.
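A minimal Python sketch of this correction step using OpenCV is given below, assuming the locatable camera returns the focal length (f_x, f_y), optical center (c_x, c_y) and the distortion coefficients k_1, k_2, p_1, p_2, k_3 with each frame; the function name and parameter order are illustrative.

import numpy as np
import cv2

def undistort_frame(frame, fx, fy, cx, cy, k1, k2, p1, p2, k3):
    # Intrinsic matrix of the pinhole model in formula (1)
    K = np.array([[fx, 0, cx],
                  [0, fy, cy],
                  [0,  0,  1]], dtype=np.float64)
    # OpenCV expects the distortion coefficients in the order (k1, k2, p1, p2, k3)
    dist = np.array([k1, k2, p1, p2, k3], dtype=np.float64)
    # Remap every pixel through the radial/tangential model of formulas (2)-(3)
    return cv2.undistort(frame, K, dist)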
The projection matrix and view matrix for each video frame are then calculated.
As shown in fig. 5, there is a calculation method of a view matrix V (ViewMatrix) for each video frame as follows:
For the view matrix T_1 of the camera coordinate system acquired from the mixed reality head display and the transformation matrix T_2 from the camera coordinate system to the world coordinate system, the following calculation formula is provided:
(4)
the view matrix of the mixed reality head-display positionable camera under the left-hand coordinate system in the labeling program is solved by acquiring and operating the two transformation matrices and finally converting the transformation matrices into the left-hand coordinate system.
The projection matrix P (ProjectionMatrix) for each video frame is calculated as follows:
For the projection matrix T_3 obtained from the mixed reality head-display positionable camera, there is the following calculation formula:
P = T_3^T (5)
That is, the projection matrix of the mixed reality head-display positionable camera under the left-hand coordinate system in the labeling procedure is obtained by transposing T_3.
3. The user equipment receives the label, labels the frozen picture and solves the three-dimensional coordinate by using a space labeling algorithm;
after the server forwards the distortion-corrected video frame to the other device terminals, the operator at the other terminal determines whether the current picture is the one to be labeled. If so, the current video frame is frozen for drawing and labeling; if not, the operator waits for the picture to be labeled before performing the next labeling function.
After freezing the video frame, the user can draw the preset shape (rectangle, circle) or free line of the system and label the current picture. And the marking data of the current frozen frame can be automatically saved and used for solving in the next marking algorithm.
For rectangle labeling, the current left mouse button press point (x_1, y_1) is taken as the top-left vertex of the rectangle, and the left mouse button release point (x_4, y_4) as the bottom-right vertex of the rectangle. Under the screen coordinate system, the four corner points of the currently labeled rectangle can be obtained from the geometric relationship of the rectangle, with the formula as follows:
(x_2, y_2) = (x_4, y_1), (x_3, y_3) = (x_1, y_4) (6)
where (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4) are the top-left, top-right, bottom-left and bottom-right vertices of the rectangle;
For circle labeling, the current left mouse button press point (x_1, y_1) is taken as the starting point and the left mouse button release point (x_2, y_2) as the end point. The circle center o(x_o, y_o) and radius r are given by:
x_o = (x_1 + x_2) / 2, y_o = (y_1 + y_2) / 2 (7)
r = ½ · sqrt((x_2 − x_1)² + (y_2 − y_1)²) (8)
and a point p(x_p, y_p) on the circle is given by:
x_p = x_o + r·cos d, y_p = y_o + r·sin d (9)
where d = 0, 1, …, 360 represents the sampled point on the circle at each degree.
Similarly, for the free line, all the two-dimensional screen coordinate points passed through while the left mouse button is pressed are saved.
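The screen-space geometry of these labeling shapes can be sketched in a few lines of Python; the helper names below are illustrative and implement formulas (6)-(9) directly.

import math

def rectangle_corners(x1, y1, x4, y4):
    # Mouse-down point (x1, y1) and mouse-up point (x4, y4) give the four corners of formula (6)
    return [(x1, y1),   # top-left
            (x4, y1),   # top-right
            (x1, y4),   # bottom-left
            (x4, y4)]   # bottom-right

def circle_points(x1, y1, x2, y2):
    # Centre and radius from formulas (7)-(8), one sampled point per degree from formula (9)
    xo, yo = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    r = 0.5 * math.hypot(x2 - x1, y2 - y1)
    pts = [(xo + r * math.cos(math.radians(d)),
            yo + r * math.sin(math.radians(d))) for d in range(361)]
    return (xo, yo), r, pts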
After the coordinates of the screen points to be labeled are obtained, the position coordinate p at which the current user interface label is rendered can be obtained from the projection matrix and the view matrix of the current virtual camera. The formula is as follows, where V (ViewMatrix) and P (ProjectionMatrix) are the view matrix and projection matrix of the current virtual camera respectively, x, y are the coordinate point to be calculated on the screen, and width and height are the width and height of the current camera resolution:
(10)
Through the above conversion, the position of the current user annotation in the user interface is found and fixed at a depth of 1 m along the camera z-axis, and the content of the current user interface label is rendered.
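Formula (10) itself is not reproduced above, so the following Python sketch shows one conventional way to place a screen point 1 m in front of the virtual camera: screen pixel to normalized device coordinates, back through the projection matrix to a camera-space ray, scaled to the target depth, then through the inverse view matrix to world space. The column-vector convention and the helper name are assumptions, not the patent's exact formula.

import numpy as np

def ui_anchor_point(x, y, width, height, P, V, depth=1.0):
    # Screen pixel to normalized device coordinates in [-1, 1]
    ndc = np.array([2.0 * x / width - 1.0, 1.0 - 2.0 * y / height, 0.0, 1.0])
    # Back through the projection matrix to a point in camera space
    cam = np.linalg.inv(P) @ ndc
    cam = cam / cam[3]
    direction = cam[:3] / abs(cam[2])            # scale the ray so it reaches unit depth
    cam_point = np.append(direction * depth, 1.0)
    # Camera space to world space through the inverse view matrix
    world = np.linalg.inv(V) @ cam_point
    return world[:3]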
The specific process of solving the three-dimensional coordinates by using the spatial annotation algorithm is as follows:
(1) Obtain the position of the labeled information in the screen coordinate system, and solve the corresponding position in the pixel coordinate system of the frozen frame.
According to the function interface of the existing development tool, the position coordinates of each labeled shape under the screen coordinate system of the current frozen picture can easily be obtained. These position coordinates are stored until the user finishes labeling, ready for the next processing step.
According to the ratio between the screen coordinate system and the pixel coordinate system of the frozen frame, the position (x_p, y_p) of the labeling coordinates in the pixel coordinate system is obtained by the following formula:
x_p = x · w / f_w, y_p = y · h / f_h (11)
where f_w (frame width) and f_h (frame height) are the width and height of the current frozen frame in the screen coordinate system, w and h are the width and height of the frozen frame resolution in the pixel coordinate system, and x, y are the coordinates of the annotation point in the screen coordinate system.
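A one-function Python sketch of formula (11) follows, with a small usage example; the variable names mirror the text and the concrete numbers are only illustrative.

def screen_to_pixel(x, y, f_w, f_h, w, h):
    # Rescale a point from the on-screen frozen-frame size (f_w x f_h) to its pixel resolution (w x h)
    return x * w / f_w, y * h / f_h

# e.g. a point drawn at (320, 180) on a 640x360 preview of a 1920x1080 frozen frame
# maps to (960.0, 540.0) in pixel coordinates
print(screen_to_pixel(320, 180, 640, 360, 1920, 1080))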
(2) Frozen frame annotation data preprocessing
Since the image coordinate system of the video frames in the annotation data transmitted by the mixed reality head display uses the upper-left corner as the origin (0, 0), while the texture coordinate system in the development environment used (Unity) uses the lower-left corner as the origin (0, 0), the two image coordinate systems do not match; if the video frame were rendered directly on the user interface, the image would appear flipped. It is therefore necessary to flip the image horizontally and vertically. The conventional flipping process exchanges positions in the original one-dimensional byte array of the video frame.
The horizontal inversion is processed as follows (pseudo code):
Procedure ImageHorizontalMirror(imageBytes: Array of Byte)
PixelSize := 4
width := resolution.width
height := resolution.height
Line := width * PixelSize
For i := 0 to height - 1
For j := 0 to (Line / 2) - 1 Step PixelSize
Swap(imageBytes[Line * i + j], imageBytes[Line * i + Line - j - PixelSize])
Swap(imageBytes[Line * i + j + 1], imageBytes[Line * i + Line - j - PixelSize + 1])
Swap(imageBytes[Line * i + j + 2], imageBytes[Line * i + Line - j - PixelSize + 2])
Swap(imageBytes[Line * i + j + 3], imageBytes[Line * i + Line - j - PixelSize + 3])
End For
End For
End Procedure.
the following process (pseudo code) is applied to the vertical inversion:
Procedure ImageVerticalMirror(imageBytes: Array of Byte)
PixelSize := 4
width := resolution.width
height := resolution.height
Line := width * PixelSize
For i := 0 to width - 1
For j := 0 to height / 2 - 1
Swap(imageBytes[Line * j + i * PixelSize], imageBytes[Line * (height - j - 1) + i * PixelSize])
Swap(imageBytes[Line * j + i * PixelSize + 1], imageBytes[Line * (height - j - 1) + i * PixelSize + 1])
Swap(imageBytes[Line * j + i * PixelSize + 2], imageBytes[Line * (height - j - 1) + i * PixelSize + 2])
Swap(imageBytes[Line * j + i * PixelSize + 3], imageBytes[Line * (height - j - 1) + i * PixelSize + 3])
End For
End For
End Procedure.
wherein the pseudo code of the swap function is:
Procedure Swap(Var lhs, Var rhs)
temp := lhs
lhs := rhs
rhs := temp
End Procedure.
Tests show that this method has low performance consumption on the mixed reality head display during spatial labeling, and the overhead is limited in daily use scenarios.
With the help of a mature development tool (Unity), modifying the three-dimensional scale vector of the rendering component of the video frame achieves the effect of flipping the coordinate system and renders the video frame correctly and efficiently. Let the scale three-dimensional vector of the rendering component be S (scale); the corrected rendering component S′ is then given by the following formula:
S′ = (λ_1·S_x, λ_2·S_y, S_z) (12)
where λ_1 and λ_2 are the horizontal and vertical scaling factors respectively; negative values of λ_1 and λ_2 indicate flipping of the coordinate system in the horizontal and vertical directions respectively. Through this operation, the problem of mismatched image coordinate systems is solved.
In addition to the mismatched image coordinate system of the video frames in the annotation data, the previously calculated projection matrix also needs to be mirrored. For the projection matrix P to be processed and the mirrored projection matrix P′, the following formula is provided:
(13)
where θ_1, θ_2 take the value 1 or −1, indicating flipping in the horizontal or vertical direction.
The correct projection matrix is obtained through the formula 13 and is used for the calculation of the next labeling algorithm.
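The two corrections of formulas (12) and (13) can be sketched as follows in Python; the diagonal-matrix form used for the projection mirroring is an assumption (formula (13) is not reproduced above), and the function names are illustrative.

import numpy as np

def flip_scale(S, lam1=-1.0, lam2=-1.0):
    # S is the (sx, sy, sz) scale vector of the video-frame rendering component, formula (12)
    sx, sy, sz = S
    return (lam1 * sx, lam2 * sy, sz)

def mirror_projection(P, theta1=-1.0, theta2=1.0):
    # Scale the projected x (and optionally y) axis by +/-1 to mirror the image
    M = np.diag([theta1, theta2, 1.0, 1.0])
    return M @ np.asarray(P, dtype=np.float64)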
(3) The annotated pixel coordinate system is converted into cropping space.
The pixel coordinates are converted to a coordinate system in the clipping space for subsequent solving of the corresponding coordinates in the mixed real world coordinate system.
For a pixel point (x, y) labeled on the frozen frame, with the frozen frame resolution width w and height h, the following conversion formula applies:
(14)
then (x ', y') is the coordinates of the coordinate system in the determined cropping space.
(4) Extracting matrix coefficients and normalizing coordinates.
In perspective projection, the projection matrix P is generally expressed as follows:
(15)
where n is the distance from the near clipping plane to the camera position; f is the distance from the far clipping plane to the camera position; l is the left boundary of the near clipping plane; r is the right boundary of the near clipping plane; t is the top boundary of the near clipping plane; b is the bottom boundary of the near clipping plane.
The projection matrix contains information such as the focal length and the image center point. In this step, this information needs to be extracted and used as calculation data to solve the coordinates of the labeling position in the HoloLens 2 coordinate system.
For the focal lengths f_x and f_y and the image center point c_x and c_y, the normalized center point c_x and c_y can be extracted from the projection matrix P using the following formulas:
(16)
(17)
In mixed reality head displays, the image center point coordinates typically need to be normalized. For c_x and c_y, the following normalization applies in the projection matrix P.
(18)
where α is a normalization factor, determined by P_33 in the projection matrix P.
(5) The direction vector of the labeling position is calculated and then converted into world coordinates.
According to the extracted parameters, the direction vector of the labeling point needs to be calculated. In camera space, the direction vector points from the optical center towards the pixel coordinates. For the coordinates (x, y) of the annotation point in clipping space, the direction vector d(x_d, y_d, z_d) and the projection matrix P satisfy the following calculation formula.
(19)
where f_x and f_y are the focal lengths, and α is a normalization factor determined by P_33 in the projection matrix P.
The direction vector from the optical center to the point of interest in the camera space is solved by equation 19.
This step is to convert the labeling coordinates in the camera space coordinate system to coordinates corresponding to the world space coordinate system.
For the corresponding direction vector d(x_d, y_d, z_d) and the view matrix V (ViewMatrix) of the currently corresponding frozen frame, the world coordinates p(x, y, z) of the annotation point can be calculated by the following formula.
(20)
The coordinate position of the labeling point in world coordinates is obtained by formula 20. Note that the coordinates of the annotation point lack depth information at this stage of the algorithm, and additional means are needed later to determine the depth information of the current annotation point.
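Since formulas (14)-(20) are not reproduced above, the following Python sketch only illustrates one conventional reading of steps (3)-(7): pixel to clip-space coordinates, intrinsic parameters read back from the projection matrix, a viewing direction through the pixel, and the transformation of that direction into world space. The conversions and the handling of the normalization factor α are assumptions, not the patent's exact equations; the result still lacks depth information, as noted above.

import numpy as np

def annotation_world_point(x_p, y_p, w, h, P, V, alpha=1.0):
    # Pixel coordinates to clip-space style coordinates in [-1, 1]
    x_c = 2.0 * x_p / w - 1.0
    y_c = 1.0 - 2.0 * y_p / h
    # Focal lengths and centre point read back from the projection matrix (layout assumed)
    f_x, f_y = P[0, 0], P[1, 1]
    c_x, c_y = P[0, 2], P[1, 2]
    # Direction vector from the optical centre through the pixel, in camera space
    d = alpha * np.array([(x_c - c_x) / f_x, (y_c - c_y) / f_y, 1.0, 0.0])
    # Camera space to world space through the inverse view matrix
    world = np.linalg.inv(V) @ d
    return world[:3]   # a depth-less point/direction to be refined by ray detection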
After the world coordinates of all the current annotation points are calculated, the calculation results are returned to the mixed reality head display through the server, and the rendering processing of the annotation points is carried out.
4. The mixed reality head display receives the labeling coordinates, corrects them and obtains the final position of the labeling information.
(1) And obtaining the three-dimensional coordinates of the marked points, and scanning the space geometric grid.
The mixed reality device has spatial perception capability, which enables the head display to perceive the real-world environment, as shown in fig. 6.
Through spatial perception, the mixed reality device constructs a finer three-dimensional geometric model of the current labeling scene, in particular a geometric three-dimensional reconstruction of the object to be labeled. This is critical for the subsequent determination of the depth information of the annotation point using ray detection: with the three-dimensional geometric mesh of the current scene, the rays have something to collide with.
(2) And (5) detecting rays to determine the depth information of the marked point.
Because the three-dimensional coordinates of the labeling points lack depth information, rendering them directly would produce a near-large/far-small visual effect, and the labeling positions could not be accurately placed near the labeled objects; the labeling effect would be extremely poor, so the labeling points need further refinement.
After the geometric model of the current scene is constructed, the ray detection technology is carried out on the graph to be marked currently according to the marking coordinates and the marking graph.
The ray detection technology is a common 3D program development technology for detecting collisions, interactions, object capturing, etc. in the game world. Ray detection simulates interactions between rays and objects by projecting virtual rays to determine if and where the rays intersect with the object.
Thanks to the existing mature 3D engine (Unity), the ray detection implementation only needs to invoke its relevant interfaces. Different ray detection strategies are adopted for different labeling patterns.
For labeling rectangular frames, in order to ensure that the depth information of an object to be labeled can be correctly acquired, a center point of the rectangular frame is taken as a ray detection center, and collision points of rays and the object to be labeled are acquired.
For circular labeling, in order to ensure that the depth information of a labeled object is correctly acquired, a circle center is used as a center of ray detection, and collision points of rays and an object to be labeled are acquired.
For straight line marking, detecting rays of each marking point on the straight line, and acquiring collision points of the rays and the object to be marked.
Through the operation, after the information of the collision point is obtained, a reasonable marking point position can be found through correcting the marking position.
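The shape-dependent choice of ray origin and target described above can be sketched as follows in Python; raycast(origin, direction) stands in for the engine's physics ray query against the scanned spatial mesh and is a hypothetical callback, not an API named in the patent.

import numpy as np

def hit_points(shape, points, head_position, raycast):
    # points: depth-less world-space annotation points; head_position: headset position at the frozen frame
    head = np.asarray(head_position, dtype=np.float64)
    if shape == "rectangle":
        targets = [np.mean(np.asarray(points, dtype=np.float64), axis=0)]   # centre of the rectangle
    elif shape == "circle":
        targets = [np.asarray(points[0], dtype=np.float64)]                 # circle centre passed first
    else:                                                                    # free line: one ray per point
        targets = [np.asarray(p, dtype=np.float64) for p in points]
    hits = []
    for t in targets:
        direction = t - head
        direction = direction / np.linalg.norm(direction)
        hits.append(raycast(head, direction))    # collision point with the object to be labeled
    return hits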
(3) And correcting the labeling position, and rendering and displaying.
After the collision point P_h is acquired, different correction strategies are used according to the labeling graphic.
For the rectangular labeling, the rectangular position is corrected as follows.
Calculate the length L from the collision point P_h(x_h, y_h, z_h) to a vertex of the corrected labeling rectangle, with the formula as follows:
(21)
where P_c(x_c, y_c, z_c) is the rectangle center point obtained by formula 22, P_f(x_f, y_f, z_f) is the position of the mixed reality helmet in the world coordinate system at the moment of freezing the frame, and P_a(x_a, y_a, z_a) is the rectangle vertex whose coordinates need to be corrected.
In formula 22, P_1(x_1, y_1, z_1) and P_2(x_2, y_2, z_2) are two diagonal vertices of the rectangle to be corrected, and P_c(x_c, y_c, z_c) is the rectangle center point to be corrected.
(22)
To correct the depth information of the rectangle labeling point: for the rectangle center point P_c(x_c, y_c, z_c), the collision point P_h(x_h, y_h, z_h) and the rectangle vertex coordinates P_a(x_a, y_a, z_a) to be corrected, the corrected rectangle labeling point P(x, y, z) is required, with the following formula:
(23)
the current correction only ensures that the current rectangular frame position is matched with the marked object, but the marked frame is bigger or smaller due to the visual effect of near-far-small caused by the fact that the marked algorithm lacks depth information when solving, and the marked frame is required to be scaled.
For the collision point P h (x h , y h , z h ) And the coordinates P (x, y, z) of the vertex of the rectangle of the corrected position, the coordinates P of the vertex of a certain rectangular frame of the final corrected size and position are required by the line segment length L of the formula 21 r (x r , y r , z r ) The formula is as follows:
(24)
then P r (x r , y r , z r ) And (5) obtaining a final coordinate point of the rectangular frame to be solved.
For the circle marking, the correction marking position is obtained by adopting the following method.
The P_c(x_c, y_c, z_c) transmitted by the server determines the three-dimensional coordinates of the circle center to be corrected, and the radius r determines the radius of the circle.
Calculate the length L from the collision point P_h(x_h, y_h, z_h) to the corrected circle; L can be calculated by referring to formula 21, where P_f(x_f, y_f, z_f) is the position of the mixed reality helmet in the world coordinate system at the moment of freezing the frame, and P_a(x_a, y_a, z_a) is a point on the circle whose coordinates need to be corrected.
The obtained length L determines the circle center of the corrected circle label. The three-dimensional coordinates P(x, y, z) of a point on the corrected circle can be found by the following formula.
(25)
For the labeling of an irregular free curve, each point on the line segment is sampled, a ray is cast toward it with the frozen-frame position as the starting point, and the collision point is the point to be rendered.
5. The mixed reality head display finishes positioning rendering of the virtual information according to the final position of the labeling information.
Rendering of the various annotation shapes is done with the LineRenderer component in Unity, and the labeling effect is finally displayed.
After rendering is completed, the wearer of the mixed reality head display can see the labeling graphics at the labeled position. The positionable camera of the mixed reality head display provides virtual-and-real picture transmission, so the labeling result is shared with the background annotator in real time, achieving the effect that the back-end annotator and the front-end wearer share the same screen picture.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (4)

1. The space labeling method based on mixed reality is characterized by comprising the following steps of:
starting a remote labeling function, and initializing a mixed reality head display positionable camera;
correcting the distortion of the video frames, and calculating a projection matrix and a view matrix of each video frame to form annotation data;
the user equipment receives the label, labels the frozen picture and solves the three-dimensional coordinate by using a space labeling algorithm;
the mixed reality head display receives the labeling coordinates, corrects the labeling coordinates and obtains the final position of labeling information;
the mixed reality head display finishes positioning rendering of virtual information according to the final position of the labeling information;
the video frame distortion correction, calculate projection matrix and view matrix of each video frame, form the annotation data, include:
according to the locatable camera of the mixed reality head display, camera internal reference information of each video frame is returned, and distortion in the image is corrected by using a distortion correction algorithm;
the correcting distortion in an image using a distortion correction algorithm includes:
establishing a mathematical model of the mixed reality head display locatable camera:
u = f_x · x / z + c_x, v = f_y · y / z + c_y (1)
where (u, v) are the two-dimensional coordinates on the image plane, (x, y, z) are the three-dimensional coordinates in the camera coordinate system, (f_x, f_y) is the focal length of the camera, and (c_x, c_y) are the camera optical center coordinates;
introducing distortion into a mathematical model to obtain a distortion correction model:
x_n′ = x_n(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + 2 p_1 x_n y_n + p_2 (r² + 2 x_n²), y_n′ = y_n(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + p_1 (r² + 2 y_n²) + 2 p_2 x_n y_n (2)
u′ = f_x · x_n′ + c_x, v′ = f_y · y_n′ + c_y (3)
where (x_n, y_n) = (x/z, y/z) are the normalized image coordinates, (u′, v′) are the two-dimensional coordinates of the image after distortion correction, k_1, k_2, k_3 are the radial distortion coefficients of the camera, p_1, p_2 are the tangential distortion coefficients of the camera, and r² = x_n² + y_n²;
Inputting the video frame into a distortion correction model to obtain a video frame image after distortion correction, and performing marking;
calculating a view matrix V of the mixed reality head-display positionable camera under a left-hand coordinate system in a labeling procedure:
(4)
where T_1 is the camera coordinate system view matrix of the mixed reality head display locatable camera, and T_2 is the transformation matrix of the mixed reality head display locatable camera from the camera coordinate system to the world coordinate system;
calculating a projection matrix P of the mixed reality head display positionable camera under a left-hand coordinate system in a labeling procedure:
P = T_3^T (5)
where T_3 is the projection matrix of the mixed reality head display positionable camera;
the marking of the frozen picture comprises the following steps:
freezing a current video frame to draw a rectangular shape, a circular shape or a free line preset by the system;
labeling rectangular, circular or free line pictures to obtain coordinates of video frame pictures in a screen coordinate system;
according to the projection matrix and the view matrix of the current virtual camera, obtaining the position coordinate P_b at which the current user interface label is rendered:
(10)
where V_n and P_n are the view matrix and the projection matrix of the current virtual camera respectively, x, y are the coordinate point to be calculated on the screen, and width and height are the width and height of the current camera resolution;
solving the position of the current user labeling content in the user interface, fixing the current user labeling content to a position with a depth of 1m from the z-axis of the camera, and rendering and displaying the current user interface labeling content;
the solving the three-dimensional coordinates by using a space labeling algorithm comprises the following steps:
(1) According to the position of the labeled picture coordinates in the screen coordinate system, solve the corresponding position (x_p, y_p) in the pixel coordinates of the frozen frame:
x_p = x_t · w / f_w, y_p = y_t · h / f_h (11)
where f_w and f_h are the width and height of the current frozen frame resolution in the screen coordinate system, x_t, y_t are the labeling point coordinates in the screen coordinate system, and w and h are the width and height of the frozen frame resolution in the corresponding pixel coordinates;
(2) Perform horizontal and vertical flipping of the image:
S′ = (λ_1·S_x, λ_2·S_y, S_z) (12)
where S = (S_x, S_y, S_z) is the scale three-dimensional vector of the rendering component, S′ is the flipped rendering component, and λ_1 and λ_2 are the horizontal and vertical scaling factors respectively; when λ_1 and λ_2 are negative, the coordinate system is flipped in the horizontal and vertical directions respectively;
(3) Mirror image processing is carried out on the projection matrix:
(13)
where P_m is the projection matrix to be processed, P′ is the mirrored projection matrix, and θ_1, θ_2 take the value 1 or −1, indicating flipping in the horizontal or vertical direction;
(4) Converting the marked pixel coordinate system into a clipping space;
converting the pixel coordinates to a coordinate system in clipping space:
(14)
then (x′, y′) are the coordinates in the obtained clipping space, (x_q, y_q) is the labeled pixel point of the frozen frame, and w_f and h_f are the width and height of the frozen frame resolution;
(5) Extracting matrix coefficients and carrying out coordinate normalization processing;
the projection matrix P is expressed as:
(15)
where n is the distance from the near clipping plane to the camera position; f is the distance from the far clipping plane to the camera position; l is the left boundary of the near clipping plane; r is the right boundary of the near clipping plane; t is the top boundary of the near clipping plane; b is the bottom boundary of the near clipping plane;
for the focal lengths f_x and f_y and the image center point c_x and c_y in the projection matrix, extract the normalized center point c_x and c_y:
(16)
(17)
(18)
where α is a normalization factor, determined by P_33 in the projection matrix P;
(6) Calculating a direction vector of the labeling position and converting the direction vector into world coordinates;
for the coordinates (x_e, y_e) of the annotation point in clipping space, calculate the direction vector d(x_d, y_d, z_d):
(19)
where f_x and f_y are the focal lengths, and α is the normalization factor determined by P_33 in the projection matrix P;
(7) Converting the labeling coordinates in the camera space coordinate system into coordinates corresponding to the world space coordinate system;
for the corresponding direction vector d(x_d, y_d, z_d) and the view matrix V_{i×j} of the currently corresponding frozen frame, calculate the world coordinates p_w(x_w, y_w, z_w) of the labeling point:
(20)
where V_ij denotes the view matrix V_{i×j}, with i = 3, j = 4;
(8) Returning the world coordinates of all the current annotation points to the mixed reality head display, and performing rendering treatment of the annotation points;
the mixed reality head display receives the labeling coordinates, corrects the labeling coordinates, and obtains the final position of labeling information, and the method comprises the following steps:
obtaining three-dimensional coordinates of the marked points, and scanning the space geometric grid;
determining the depth information of the marked point through ray detection: for labeling a rectangular frame, adopting a center point of the rectangular frame as a ray detection center, and acquiring collision points of rays and an object to be labeled; for circle labeling, adopting a circle center as a ray detection center, and acquiring collision points of rays and an object to be labeled; for straight line marking, detecting rays of each marking point on the straight line, and acquiring collision points of the rays and an object to be marked;
after the information of the collision points is obtained, a reasonable marking point position is found through correcting the marking position, and rendering and displaying are carried out;
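As an illustration of the ray detection described above, the helper below intersects a ray with a single plane; an actual implementation would query the scanned space geometric grid of the head display, so this stand-in and its signature are assumptions.

```python
import numpy as np

def ray_plane_hit(origin, direction, plane_point, plane_normal):
    """Return the collision point of a ray with one plane of the scanned
    geometry, or None if the ray misses it (parallel or behind the origin)."""
    origin, direction = np.asarray(origin, float), np.asarray(direction, float)
    plane_point, plane_normal = np.asarray(plane_point, float), np.asarray(plane_normal, float)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-6:
        return None
    t = np.dot(plane_normal, plane_point - origin) / denom
    return origin + t * direction if t > 0 else None
```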
finding a reasonable marking point position by correcting the marking position comprises the following steps:
after the collision point P_h is acquired, different correction strategies are used according to the labeling graphic;
(1) For rectangle labeling, the rectangle position is corrected by adopting the following method:
calculate the length L of the line segment from the collision point P_h(x_h, y_h, z_h) to a certain vertex of the corrected labeling rectangle:
(21)
(22)
wherein P_f(x_f, y_f, z_f) is the position of the mixed reality head display in the world coordinate system at the moment the frame is frozen, P_a(x_a, y_a, z_a) is a rectangle vertex whose coordinates need correction, P_1(x_1, y_1, z_1) and P_2(x_2, y_2, z_2) are two diagonal vertices of the rectangle to be corrected, and P_c(x_c, y_c, z_c) is the center point of the rectangle to be corrected;
correcting the depth information of the marked point of the rectangle: for the rectangle center point P_c(x_c, y_c, z_c), the collision point P_h(x_h, y_h, z_h) and the rectangle vertex coordinates P_a(x_a, y_a, z_a) to be corrected, calculating the corrected rectangle vertex coordinates P_t(x_t, y_t, z_t):
(23)
for the collision point P_h(x_h, y_h, z_h) and the position-corrected rectangle vertex coordinates P_t(x_t, y_t, z_t), calculating, according to the segment length L, the coordinates P_r(x_r, y_r, z_r) of a vertex of the rectangular frame with the final corrected size and position:
(24)
then P_r(x_r, y_r, z_r) is the final coordinate point of the rectangular frame to be solved;
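Formulas (21)-(24) are not reproduced. The sketch below encodes one presumed reading of the rectangle correction: the rectangle is re-anchored at the collision point, each vertex keeps its viewing direction from the head display position at the freezing moment, and its distance from the center is rescaled to the collision depth. This is an interpretation, not the claimed formulas themselves.

```python
import numpy as np

def correct_rect_vertex(P_f, P_a, P_1, P_2, P_h):
    """Presumed rectangle correction: P_f is the head display position at the
    freezing moment, P_a the vertex to correct, P_1/P_2 diagonal vertices of the
    rectangle to correct, P_h the collision point on the scanned geometry."""
    P_f, P_a, P_1, P_2, P_h = (np.asarray(p, dtype=float) for p in (P_f, P_a, P_1, P_2, P_h))
    P_c = (P_1 + P_2) / 2.0                                    # rectangle center (assumed)
    scale = np.linalg.norm(P_h - P_f) / np.linalg.norm(P_c - P_f)
    L = np.linalg.norm(P_a - P_c) * scale                      # presumed (21)-(22): vertex-to-center length at collision depth
    P_t = P_f + (P_a - P_f) * scale                            # presumed (23): depth-corrected vertex
    P_r = P_h + (P_t - P_h) / np.linalg.norm(P_t - P_h) * L    # presumed (24): final vertex
    return P_r
```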
(2) For the circular label, the corrected labeling position is calculated by the following method:
the three-dimensional coordinates of the circle center to be corrected are determined from the circle center, and the radius of the circle is determined from the radius r;
calculate the length L from the collision point P_h(x_h, y_h, z_h) to the corrected circle, wherein P_f(x_f, y_f, z_f) is the position of the mixed reality head display in the world coordinate system at the moment the frame is frozen, and P_a(x_a, y_a, z_a) is a point on the circle whose coordinates need correction;
from the obtained length L, the circle center of the corrected circle label is determined, and the three-dimensional coordinates P_k(x_k, y_k, z_k) of a point on the corrected circle are:
(25)
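Formula (25) is not reproduced either; the circle correction sketched below assumes a point P_a on the original circle is pushed along its viewing ray from the head display position P_f out to the depth of the collision point P_h.

```python
import numpy as np

def correct_circle_point(P_f, P_a, P_h):
    """Presumed circle correction: move the circle point P_a along the ray from
    the freeze-moment head display position P_f to the collision depth |P_h - P_f|."""
    P_f, P_a, P_h = (np.asarray(p, dtype=float) for p in (P_f, P_a, P_h))
    L = np.linalg.norm(P_h - P_f)                        # presumed length to the corrected circle
    direction = (P_a - P_f) / np.linalg.norm(P_a - P_f)
    return P_f + direction * L                           # corrected point P_k
```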
(3) For the labeling of an irregular free curve, each point on the line segment is sampled; taking the position at the freezing moment as the starting point, a ray is emitted toward each sampled point, and the collision point is the point to be rendered.
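For the free-curve case, a minimal sketch of the per-point correction loop follows; `raycast(origin, target)` is an assumed helper that queries the scanned spatial geometry and returns the collision point or None.

```python
def correct_free_curve(points_3d, head_position, raycast):
    """Cast a ray from the freeze-moment head position toward every sampled
    point of the free curve and keep the collision point as the point to render."""
    corrected = []
    for p in points_3d:
        hit = raycast(head_position, p)
        corrected.append(hit if hit is not None else p)  # fall back to the original point
    return corrected
```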
2. The mixed reality based spatial annotation method of claim 1, wherein the initiating the remote annotation function, initializing the mixed reality head-mounted positionable camera, comprises:
starting the remote labeling function of the mixed reality head display, and checking the first-person-view function of the mixed reality head display;
initializing the mixed reality head display positionable camera, wherein the front end of the mixed reality head display is provided with a camera facing the external environment, and the mixed reality head display is provided with a remote labeling application program for accessing and controlling the camera;
and checking whether the mixed reality device is connected to the upstream background server; if so, the mixed reality device positionable camera is initialized successfully, the current mixed reality head display camera is started, and the current video frame is acquired.
3. The mixed reality based spatial annotation method of claim 1, wherein the annotation data comprises: the video resolution of the current frame, the video frame rate, the projection and view matrices of the camera, and camera intrinsic parameters including the optical center, focal length, radial distortion, and tangential distortion.
4. The mixed reality based spatial labeling method of claim 1, wherein labeling rectangular, circular or free line pictures comprises:
labeling a rectangular picture:
taking the point (x_1, y_1) where the left mouse button is currently pressed as the upper-left vertex of the rectangle, and the point (x_4, y_4) where the left mouse button is released as the lower-right vertex of the rectangle;
under a screen coordinate system, the four corner points of the current marked rectangle are obtained according to the geometric relationship of the rectangle:
(6)
wherein (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4) are the four vertices of the rectangle, namely the upper-left, upper-right, lower-left and lower-right;
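Formula (6) is not reproduced; the presumed corner construction, with the press point as the upper-left vertex and the release point as the lower-right vertex, is sketched below.

```python
def rect_corners(x1, y1, x4, y4):
    """Presumed formula (6): derive the four screen-space corners of the marked
    rectangle from the press point (x1, y1) and the release point (x4, y4)."""
    return [(x1, y1),   # upper-left  (x1, y1)
            (x4, y1),   # upper-right (x2, y2)
            (x1, y4),   # lower-left  (x3, y3)
            (x4, y4)]   # lower-right (x4, y4)
```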
labeling a circular picture:
taking the point (x_5, y_5) where the left mouse button is currently pressed as the starting point, and the point (x_6, y_6) where the left mouse button is released as the end point;
calculate the circle center o(x_o, y_o) and radius r:
(7)
(8)
calculate the points P_s(x_s, y_s) on the circle:
(9)
where d = 0, 1, …, 360 represents sampling one point on the circle for each degree;
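Formulas (7)-(9) are not reproduced; the sketch below assumes the press and release points are diametrically opposite, giving the center as their midpoint and the radius as half their distance, and samples one point per degree.

```python
import math

def circle_from_drag(x5, y5, x6, y6):
    """Presumed formulas (7)-(8): circle center and radius from the drag endpoints."""
    x_o, y_o = (x5 + x6) / 2.0, (y5 + y6) / 2.0
    r = math.hypot(x6 - x5, y6 - y5) / 2.0
    return (x_o, y_o), r

def sample_circle(x_o, y_o, r):
    """Presumed formula (9): one sampled point P_s on the circle per degree d."""
    return [(x_o + r * math.cos(math.radians(d)),
             y_o + r * math.sin(math.radians(d))) for d in range(0, 361)]
```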
labeling the free line picture:
when labeling a free-line picture, only the two-dimensional screen coordinate points recorded while the current left mouse button is pressed are saved.
CN202311473370.2A 2023-11-08 2023-11-08 Space labeling method based on mixed reality Active CN117218320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311473370.2A CN117218320B (en) 2023-11-08 2023-11-08 Space labeling method based on mixed reality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311473370.2A CN117218320B (en) 2023-11-08 2023-11-08 Space labeling method based on mixed reality

Publications (2)

Publication Number Publication Date
CN117218320A CN117218320A (en) 2023-12-12
CN117218320B true CN117218320B (en) 2024-02-27

Family

ID=89039283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311473370.2A Active CN117218320B (en) 2023-11-08 2023-11-08 Space labeling method based on mixed reality

Country Status (1)

Country Link
CN (1) CN117218320B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830587A (en) * 2024-03-05 2024-04-05 腾讯科技(深圳)有限公司 Map annotation drawing method and device, computer equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327587A (en) * 2016-11-16 2017-01-11 北京航空航天大学 Laparoscope video precision fusion method for enhancing real surgical navigation
CN106845068A (en) * 2016-12-13 2017-06-13 海纳医信(北京)软件科技有限责任公司 Picture mask method and device
CN111260084A (en) * 2020-01-09 2020-06-09 长安大学 Remote system and method based on augmented reality collaborative assembly maintenance
CN111880649A (en) * 2020-06-24 2020-11-03 合肥安达创展科技股份有限公司 Demonstration method and system of AR viewing instrument and computer readable storage medium
CN112241201A (en) * 2020-09-09 2021-01-19 中国电子科技集团公司第三十八研究所 Remote labeling method and system for augmented/mixed reality
CN113596517A (en) * 2021-07-13 2021-11-02 北京远舢智能科技有限公司 Image freezing and labeling method and system based on mixed reality
WO2021227360A1 (en) * 2020-05-14 2021-11-18 佳都新太科技股份有限公司 Interactive video projection method and apparatus, device, and storage medium
CN113849112A (en) * 2021-09-30 2021-12-28 西安交通大学 Augmented reality interaction method and device suitable for power grid regulation and control and storage medium
CN114119739A (en) * 2021-10-22 2022-03-01 北京航空航天大学杭州创新研究院 Binocular vision-based hand key point space coordinate acquisition method
CN114742933A (en) * 2022-03-16 2022-07-12 济南大学 Remote labeling method, terminal and system based on mixed reality
CN115756256A (en) * 2022-11-07 2023-03-07 北京沃东天骏信息技术有限公司 Information labeling method, system, electronic equipment and storage medium
CN116074485A (en) * 2021-11-03 2023-05-05 中国移动通信有限公司研究院 Conversation method and terminal based on augmented reality

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10453213B2 (en) * 2016-08-29 2019-10-22 Trifo, Inc. Mapping optimization in autonomous and non-autonomous platforms
US10699481B2 (en) * 2017-05-17 2020-06-30 DotProduct LLC Augmentation of captured 3D scenes with contextual information
GB201709199D0 (en) * 2017-06-09 2017-07-26 Delamont Dean Lindsay IR mixed reality and augmented reality gaming system
US11436765B2 (en) * 2018-11-15 2022-09-06 InstaRecon Method and system for fast reprojection
US11250641B2 (en) * 2019-02-08 2022-02-15 Dassault Systemes Solidworks Corporation System and methods for mating virtual objects to real-world environments

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327587A (en) * 2016-11-16 2017-01-11 北京航空航天大学 Laparoscope video precision fusion method for enhancing real surgical navigation
CN106845068A (en) * 2016-12-13 2017-06-13 海纳医信(北京)软件科技有限责任公司 Picture mask method and device
CN111260084A (en) * 2020-01-09 2020-06-09 长安大学 Remote system and method based on augmented reality collaborative assembly maintenance
WO2021227360A1 (en) * 2020-05-14 2021-11-18 佳都新太科技股份有限公司 Interactive video projection method and apparatus, device, and storage medium
CN111880649A (en) * 2020-06-24 2020-11-03 合肥安达创展科技股份有限公司 Demonstration method and system of AR viewing instrument and computer readable storage medium
CN112241201A (en) * 2020-09-09 2021-01-19 中国电子科技集团公司第三十八研究所 Remote labeling method and system for augmented/mixed reality
CN113596517A (en) * 2021-07-13 2021-11-02 北京远舢智能科技有限公司 Image freezing and labeling method and system based on mixed reality
CN113849112A (en) * 2021-09-30 2021-12-28 西安交通大学 Augmented reality interaction method and device suitable for power grid regulation and control and storage medium
CN114119739A (en) * 2021-10-22 2022-03-01 北京航空航天大学杭州创新研究院 Binocular vision-based hand key point space coordinate acquisition method
CN116074485A (en) * 2021-11-03 2023-05-05 中国移动通信有限公司研究院 Conversation method and terminal based on augmented reality
CN114742933A (en) * 2022-03-16 2022-07-12 济南大学 Remote labeling method, terminal and system based on mixed reality
CN115756256A (en) * 2022-11-07 2023-03-07 北京沃东天骏信息技术有限公司 Information labeling method, system, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Situation-dependent remote AR collaborations: Image-based collaboration using a 3D perspective map and live video-based collaboration with a synchronized VR mode; Sung Ho Choi et al.; Computers in Industry; Vol. 101; full text *
Design of a projection interactive control system based on Kinect; Zhang Kai et al.; Automation Application (No. 07); full text *

Also Published As

Publication number Publication date
CN117218320A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
US11025889B2 (en) Systems and methods for determining three dimensional measurements in telemedicine application
CN110809786B (en) Calibration device, calibration chart, chart pattern generation device, and calibration method
TWI619091B (en) Panorama image compression method and device
CN117218320B (en) Space labeling method based on mixed reality
EP2328125A1 (en) Image splicing method and device
JP4010754B2 (en) Image processing apparatus, image processing method, and computer-readable recording medium
WO2022105415A1 (en) Method, apparatus and system for acquiring key frame image, and three-dimensional reconstruction method
WO2020235110A1 (en) Calibration device, chart for calibration, and calibration method
CN108257089A (en) A kind of method of the big visual field video panorama splicing based on iteration closest approach
CN111880649A (en) Demonstration method and system of AR viewing instrument and computer readable storage medium
WO2018113339A1 (en) Projection image construction method and device
CN114820814A (en) Camera pose calculation method, device, equipment and storage medium
CN114913308A (en) Camera tracking method, device, equipment and storage medium
CN112017242B (en) Display method and device, equipment and storage medium
CN116524022B (en) Offset data calculation method, image fusion device and electronic equipment
JP2005332177A (en) Three-dimensional information processor, three-dimensional information processing method, and three-dimensional information processing program
CN110060349B (en) Method for expanding field angle of augmented reality head-mounted display equipment
da Silveira et al. Omnidirectional visual computing: Foundations, challenges, and applications
CN112995641B (en) 3D module imaging device and method and electronic equipment
Nam et al. An efficient algorithm for generating harmonized stereoscopic 360° VR images
CN113920011A (en) Multi-view panoramic video splicing method and system
CN113786229A (en) AR augmented reality-based auxiliary puncture navigation method
US20240119676A1 (en) Image generation method, apparatus, and system, and computer-readable storage medium
JP2002135807A (en) Method and device for calibration for three-dimensional entry
Cutter et al. Real time camera phone guidance for compliant document image acquisition without sight

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant