CN114143530A - Augmented reality remote guidance method, device and storage medium - Google Patents

Augmented reality remote guidance method, device and storage medium Download PDF

Info

Publication number
CN114143530A
CN114143530A (application CN202111507489.8A)
Authority
CN
China
Prior art keywords
image
assisted
remote guidance
source image
augmented reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111507489.8A
Other languages
Chinese (zh)
Inventor
齐越
张瑞韩
王君义
高连生
李弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Original Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Beihang Emerging Industrial Technology Research Institute, Beihang University filed Critical Shenzhen Beihang Emerging Industrial Technology Research Institute
Priority to CN202111507489.8A priority Critical patent/CN114143530A/en
Publication of CN114143530A publication Critical patent/CN114143530A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 67/025: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP], for remote control or remote monitoring of applications
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20: Image signal generators
    • H04N 13/275: Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N 13/296: Synchronisation thereof; Control thereof
    • H04N 13/30: Image reproducers
    • H04N 13/366: Image reproducers using viewer tracking
    • H04N 13/398: Synchronisation thereof; Control thereof
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136: Incoming video signal characteristics or properties
    • H04N 19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/436: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides an augmented reality remote guidance method, device, and storage medium. The method includes: acquiring a source image of an object to be assisted and sending the source image to the assisting-end device; performing pose estimation on the source image and sending the pose estimation result to the assisting-end device; receiving remote guidance data sent by the assisting-end device; and generating an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data. Because the remote guidance data is displayed in the generated AR image, the assisted-end user obtains intuitive and effective guidance, which improves the efficiency of remote guidance.

Description

Augmented reality remote guidance method, device and storage medium
Technical Field
The present application relates to the field of augmented reality technologies, and in particular, to a method, device, and storage medium for augmented reality remote guidance.
Background
The core of augmented reality remote guidance technology is that an expert can view the workspace of a remote user; communication between the remote expert and on-site workers is enhanced, and environmental information is shared to coordinate and assist the user, enabling more intuitive information exchange between the two parties.
In existing augmented reality remote guidance technology, the assisted party transmits a synthesized AR image to the remote assistant. When the assistant sends guidance information back, that information cannot be displayed on the assisted party's AR image, so the assisted party must obtain it from a display device separate from the task at hand.
In actual operation, the assisted party's attention therefore switches constantly between the task and the external display device, which makes remote guidance inefficient.
Disclosure of Invention
The application provides an augmented reality remote guidance method, device, and storage medium, which are intended to solve the problem of low remote guidance efficiency.
In a first aspect, the present application provides an augmented reality remote guidance method, including:
acquiring a source image of an object to be assisted, and sending the source image to an assisting-end device;
performing pose estimation on the source image, and sending the pose estimation result to the assisting-end device;
receiving remote guidance data sent by the assisting-end device;
and generating an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data.
In a second aspect, the present application provides an augmented reality remote guidance method, including:
receiving a source image of an object to be assisted and a pose estimation result sent by an assisted-end device;
generating an AR image of the object to be assisted according to the source image and the pose estimation result;
and receiving remote guidance data added to the AR image by the user, and sending the remote guidance data to the assisted-end device.
In a third aspect, the present application provides an augmented reality remote guidance apparatus, comprising: a processor, a memory, the memory storing code therein, the processor executing the code stored in the memory to perform the augmented reality remote guidance method of the first aspect.
In a fourth aspect, the present application provides an augmented reality remote guidance apparatus, comprising: a processor, a memory, the memory storing code therein, the processor executing the code stored in the memory to perform the augmented reality remote guidance method of the second aspect.
In a fifth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the augmented reality remote guidance method according to any one of the first aspect when executed by a processor.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the augmented reality remote guidance method according to any one of the second aspect when executed by a processor.
According to the augmented reality remote guidance method of the application, the assisted-end device acquires a source image of the object to be assisted and sends it to the assisting-end device. Based on the source image, the assisted-end device performs pose estimation on the object to be assisted and sends the pose estimation result to the assisting-end device. After the assisting-end user adds remote guidance data, the assisted-end device receives that data from the assisting-end device and generates an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data. Because the remote guidance data can be displayed in the generated AR image, the user is guided intuitively and effectively through the assisted-end device, which improves the efficiency of remote guidance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a scene schematic diagram of augmented reality remote guidance provided in an embodiment of the present application;
fig. 2 is a flowchart of a first augmented reality remote guidance method provided in an embodiment of the present application;
fig. 3 is a flowchart of a second augmented reality remote guidance method provided in an embodiment of the present application;
fig. 4 is a flowchart of a third augmented reality remote guidance method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of an augmented reality remote guidance method provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a quadtree structure of coding blocks and transform blocks according to an embodiment of the present application;
fig. 7 is a schematic diagram of filter processing at slice boundaries according to an embodiment of the present application;
fig. 8 is a schematic diagram of a reprojection error according to an embodiment of the present application;
fig. 9 is a schematic diagram of a principle of a reliable UDP protocol according to an embodiment of the present application;
fig. 10 is a schematic diagram of the assisted-end and assisting-end images according to an embodiment of the present application;
fig. 11 is a first schematic diagram of an augmented reality remote guidance device provided in an embodiment of the present application;
fig. 12 is a second schematic diagram of an augmented reality remote guidance device provided in an embodiment of the present application;
fig. 13 is a third schematic diagram of an augmented reality remote guidance device provided in an embodiment of the present application;
fig. 14 is a fourth schematic diagram of an augmented reality remote guidance device provided in an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terms referred to in this application are explained first:
Augmented Reality (AR) technology: a virtual image generated by a computer is fused with image information of the physical environment to produce an augmented reality environment, giving the user a lively, vivid, and controllably interactive experience. AR rests on three basic capabilities: virtual-real fusion, real-time interaction, and three-dimensional registration. Virtual-real fusion makes the virtual content appear to be part of the real environment, through techniques such as illumination estimation, to strengthen the user's sense of the scene; real-time interaction means that the AR system provides natural and intuitive feedback to the user's actions in real time, meeting the user's needs without breaking immersion; the aim of three-dimensional registration is to align the virtual environment to the real environment, so that when the user's viewpoint changes the rendered virtual image still matches the real image and the displayed augmentation does not mislead the user.
Video coding technology: information is compressed by exploiting the similarity within and between consecutive image frames so as to fit within network bandwidth limits. H.264/MPEG-4 AVC followed H.262/MPEG-2 Video as the dominant digital video technology in almost all areas the older standard did not cover, and has largely replaced it in its existing application areas. It is widely used in many applications, including high-definition television delivered over satellite, cable, and terrestrial transmission systems, video content acquisition and editing systems, networked and mobile video, and real-time conversational applications. However, the demand for higher quality and resolution cannot be fully met under current network conditions, and applications such as VR/AR have a larger visual range than conventional video; 8K, 16K, and even higher resolutions can substantially improve the user experience. High Efficiency Video Coding (HEVC) mainly addresses the limited video quality and insufficient compression ratio of H.264/MPEG-4 AVC, with particular attention to higher video resolutions and better use of parallel processing architectures.
An augmented reality remote guidance system belongs to the field of distributed multi-user augmented reality collaboration (AR Collaboration); its main characteristic is that the assisting-end user can view the workspace of the remote user and share environmental information to coordinate and help the assisted-end user. In fields such as industrial maintenance, remote consultation, and interactive education, users increasingly need remote guidance technology. In conventional remote guidance technology, the assisted-end user has to obtain the assistant's information from a display separate from the object to be assisted and constantly shifts attention between the object and the external display device, so remote guidance is inefficient.
The application provides an augmented reality remote guidance method in which the assisted-end device acquires a source image of the object to be assisted, performs pose estimation on the object based on the source image, and then sends the source image and the pose estimation result to the assisting-end device. After the assisting-end device generates an AR image and receives the assisting-end user's remote guidance data for that image, the assisted-end device receives the remote guidance data from the assisting-end device and generates an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data. Because the remote guidance data is displayed in the AR image, the assisted-end user can see both the object to be assisted and the guidance data through the assisted-end device and obtains intuitive and effective guidance, improving the efficiency of remote guidance.
Fig. 1 is a scene schematic diagram of augmented reality remote guidance provided in an embodiment of the present application. As shown in fig. 1, augmented reality remote guidance involves an assisted end and an assisting end: the assisted-end user can communicate with the assisting-end user from a convenient mobile device, while the assisting end can enter precise guidance data with the mouse, keyboard, and other hardware of a PC and send it to the assisted-end device, so that the assisted-end user is guided intuitively and effectively.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a first augmented reality remote guidance method provided in an embodiment of the present application. The method of this embodiment may be executed by an assisted-end device, and proceeds as follows:
S201: Acquiring a source image of the object to be assisted, and sending the source image to the assisting-end device.
The assisted-end device may be a mobile device with a camera, including but not limited to a mobile phone or a tablet computer; the camera is used to obtain the source image of the object to be assisted. The object to be assisted may be equipment the user needs to learn to operate, or equipment with a fault to be resolved.
The assisting-end device may be a PC and can mark the source image. Because a PC is operated with hardware such as a keyboard and a mouse, the accuracy of marking the source image is improved.
S202: and carrying out pose estimation on the source image, and sending a pose estimation result to the assistant end equipment.
Based on the source image, the assisted-end device can estimate the pose of the object to be assisted, where the pose refers to the position and orientation of the object.
S203: and receiving the remote guidance data sent by the helper device.
The remote guidance data is the data with which the assisting-end user marks the source image in order to guide the assisted-end user, and may include indication information, annotation information, and the like.
Optionally, the remote guidance data may be transmitted over a reliable UDP protocol to guarantee communication latency and quality.
S204: and generating an AR image of the object to be assisted according to the source image, the pose estimation result and the remote guidance data.
After the assisted-end device receives the remote guidance data, it can generate an AR image from the acquired source image of the object to be assisted, the pose estimation result, and the remote guidance data, so that the assisted-end user is guided intuitively and effectively by viewing the AR image.
According to the augmented reality remote guidance method provided in this embodiment, the assisted-end device acquires a source image of the object to be assisted and performs pose estimation on the object based on that image. To enable the assisting-end device to generate an AR image, the assisted-end device sends the source image and the pose estimation result to the assisting-end device. The assisted-end device then receives the remote guidance data sent by the assisting-end device and generates an AR image of the object to be assisted from the source image, the pose estimation result, and the remote guidance data. Because the AR image contains both the object to be assisted and the remote guidance data, the assisted-end user obtains intuitive and effective guidance by viewing the AR image, which improves the efficiency of remote guidance.
On the basis of the above embodiments, a specific embodiment is provided below, which describes in detail the process of augmented reality remote guidance of the assisted device.
Fig. 3 is a flowchart of a second augmented reality remote guidance method provided in an embodiment of the present application. The method of this embodiment may be executed by an assisted-end device and specifically includes the following steps:
S301: Obtaining a source image of the object to be assisted, encoding the source image, and sending the encoded source image data to the assisting-end device.
Because transmitting raw image sequence frames would require considerable network bandwidth, the acquired source image can be encoded and sent to the assisting-end device as a video stream.
Optionally, the video stream may be sent to the server using the RTMP protocol, and the server forwards it to the assisting-end device using the HTTP protocol.
S302: and carrying out pose estimation on the source image, and sending a pose estimation result to the assistant end equipment.
Based on the source image of the object to be assisted, the assisted-end device can estimate the pose of the object. The pose can be estimated with ORB-SLAM2 and comprises the position and orientation of the object to be assisted.
The assisted-end device sends the pose estimation result to the assisting-end device so that the assisting-end device can generate an AR image. A minimal sketch of such a tracking front end is given below.
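The following sketch is not the patent's implementation; it is a minimal ORB-feature-based tracking front end in the spirit of ORB-SLAM2, written with OpenCV. The camera intrinsics `K` and the association of each reference-frame keypoint with a 3D map point (`ref_points_3d`) are assumptions of this example.

```python
import cv2
import numpy as np

def estimate_pose(ref_img, cur_img, ref_points_3d, K):
    """Estimate the current camera pose from ORB matches against a reference frame.

    ref_points_3d[i] is assumed to be the 3D map point associated with the i-th
    ORB keypoint of the reference image (ORB-SLAM2 keeps this association in
    its map; here it is simply passed in).
    """
    orb = cv2.ORB_create(nfeatures=1000)
    kp_ref, des_ref = orb.detectAndCompute(ref_img, None)
    kp_cur, des_cur = orb.detectAndCompute(cur_img, None)

    # Brute-force Hamming matching of the binary ORB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_ref, des_cur)

    # Build 3D-2D correspondences: map point of the reference keypoint versus
    # the matched pixel position in the current frame.
    obj_pts = np.float32([ref_points_3d[m.queryIdx] for m in matches])
    img_pts = np.float32([kp_cur[m.trainIdx].pt for m in matches])

    # PnP with RANSAC yields the rotation and translation of the current frame.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return ok, R, tvec
```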
S303: and receiving the remote guidance data sent by the helper device.
The remote guidance data may include indication information, annotation information, and the like. Optionally, the remote guidance data may be transmitted over the reliable UDP protocol.
S304: and generating a virtual image according to the pose estimation result.
Before generating the AR image, the assisted-end device generates a virtual image according to the pose estimation result of the object to be assisted.
S305: and generating an AR image of the object to be assisted according to the source image, the virtual image and the remote guidance data.
The assisted-end device generates the AR image from the acquired source image, the virtual image, and the received remote guidance data. Because the AR image contains the remote guidance data, the assisted-end user is guided intuitively and effectively. A compositing sketch is given below.
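As an illustration of steps S304 and S305, the sketch below overlays remote guidance annotations onto the source image using the estimated pose. It assumes the annotations are given as (3D anchor point, label) pairs in the coordinate system of the pose estimate, which is not specified by the patent; OpenCV's `projectPoints` performs the virtual-to-real projection.

```python
import cv2
import numpy as np

def compose_ar_image(source_img, rvec, tvec, K, annotations):
    """Render remote guidance data on top of the source image.

    `annotations` is assumed to be a list of (3D point, text) pairs expressed
    in the coordinate system used by the pose estimate (rvec, tvec).
    """
    ar_img = source_img.copy()
    pts_3d = np.float32([p for p, _ in annotations]).reshape(-1, 1, 3)
    pts_2d, _ = cv2.projectPoints(pts_3d, rvec, tvec, K, None)

    for pt, (_, label) in zip(pts_2d.reshape(-1, 2), annotations):
        x, y = int(pt[0]), int(pt[1])
        cv2.circle(ar_img, (x, y), 6, (0, 0, 255), 2)          # annotation marker
        cv2.putText(ar_img, label, (x + 8, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return ar_img
```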
According to this augmented reality remote guidance method, after the assisted-end device acquires a source image of the object to be assisted, it can encode the source image to reduce the required network bandwidth and send the encoded data to the assisting-end device as a video stream. Based on the source image, the assisted-end device also performs pose estimation on the object to be assisted and sends the pose estimation result to the assisting-end device. After the assisting-end user provides remote guidance, the assisted-end device receives the remote guidance data sent by the assisting-end device, generates a virtual image according to the pose estimation result, and finally generates the AR image of the object to be assisted from the source image, the virtual image, and the remote guidance data. Because the generated AR image contains both the remote guidance data and the object to be assisted, the assisted-end user is guided intuitively and effectively through the assisted-end device, which improves the efficiency of remote guidance.
The above embodiments describe the remote guidance process when it is executed by the assisted-end device. The following embodiment describes the remote guidance process executed by the assisting-end device.
Fig. 4 is a flowchart of a third augmented reality remote guidance method provided in an embodiment of the present application; the method of this embodiment is executed by an assisting-end device. As shown in the figure, the method of this embodiment may include:
s401: and receiving a source image and a pose estimation result of the object to be assisted, which are sent by the assisted end equipment.
After receiving the source image of the object to be assisted sent by the assisted-end device, the assisting-end device can decode it.
The assisting-end device may be a PC and can mark the source image. Because a PC can operate on the source image with hardware such as a keyboard and a mouse, the accuracy of marking the source image is improved.
S402: and generating an AR image of the object to be assisted according to the source image and the pose estimation result.
Optionally, the assisting-end device may generate a virtual image according to the pose estimation result and then generate the AR image of the object to be assisted from the source image and the virtual image.
It should be noted that, compared with receiving an AR image already generated by the assisted-end device, having the assisting-end device receive the source image and the pose estimation result and generate the AR image locally allows richer operations on the AR image and therefore more effective guidance.
S403: and receiving remote guidance data of the AR image from the user, and sending the remote guidance data to the assisted end equipment.
After the assisting-end device generates the AR image, the assisting-end user can add remote guidance data to it. After receiving the remote guidance data, the assisting-end device sends it to the assisted end to guide the assisted-end user.
Optionally, the remote guidance data may be transmitted over a reliable UDP protocol to guarantee communication latency and quality.
According to the augmented reality remote guidance method provided in this embodiment, after the assisting-end device receives the source image of the object to be assisted and the pose estimation result sent by the assisted-end device, it generates an AR image of the object to be assisted from the source image and the pose estimation result. The assisting-end user can then add remote guidance data to the AR image, and after receiving that data the assisting-end device sends it to the assisted-end device, so that the assisted-end user is guided intuitively and effectively and the efficiency of remote guidance is improved.
On the basis of the above embodiments, a specific embodiment is provided below, which describes in detail the interaction process between the assisted end device and the assisting end device.
Fig. 5 is a schematic diagram of an augmented reality remote guidance method provided in an embodiment of the present application, and as shown in the drawing, the method of the embodiment is specifically as follows:
s501: and the assisted terminal equipment acquires a source image of the object to be assisted, encodes the source image into a video stream and pushes the video stream to the server.
The assisted end device can be a mobile device with a camera, including but not limited to a mobile phone, a tablet computer and the like. The portability of mobile devices allows assisted end devices to be applied in a variety of scenarios.
The source image is the image obtained by the assisted-end device through its camera. Transmitting raw image sequence frames exceeds the bandwidth available in most current network environments: for example, a raw captured frame in 640 × 480 YUV422 format is about 300 KB, so at 30 frames per second the required bandwidth is about 8.8 MB/s, whereas with video streaming the bandwidth is about 400 KB/s with a delay of about 0.5 s. The source image frames are therefore encoded into a video stream with High Efficiency Video Coding (HEVC, H.265) for transmission. The video coding pipeline mainly comprises five parts, described below: block division, intra-frame prediction, inter-frame prediction, transform, and filtering.
Block division: HEVC retains the basic hybrid coding architecture of earlier video coding standards such as H.264/AVC; a significant difference is that a more adaptive quadtree structure based on Coding Tree Units (CTUs) replaces macroblocks. In principle, the quadtree coding structure is described in terms of blocks and units: a block defines an array of samples and its size, while a unit encapsulates a luminance block, the corresponding chrominance blocks, and the syntax needed to code them. A CTU therefore comprises Coding Tree Blocks (CTBs) and the syntax specifying their coded data and further subdivision. This subdivision yields Coding Units (CUs) with their Coding Blocks (CBs). Each CU in turn contains entities used for prediction, the Prediction Units (PUs), and for transformation, the Transform Units (TUs); correspondingly, each CB is divided into Prediction Blocks (PBs) and Transform Blocks (TBs), as shown in fig. 6, a schematic diagram of the quadtree structure of coding blocks and transform blocks. This variable-size, adaptive approach suits larger resolutions, e.g. 4K × 2K, which is a target resolution for some HEVC applications.
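As a toy illustration of the CTU quadtree idea just described (not HEVC reference code), the sketch below recursively splits a 64 × 64 block into four square children whenever a simple homogeneity test fails, down to a minimum CU size; the variance threshold stands in for the rate-distortion decision a real encoder would make.

```python
import numpy as np

def split_ctu(block, top, left, min_size=8, threshold=100.0):
    """Recursively split a square coding tree block into coding units.

    Returns a list of (top, left, size) leaf coding units. The variance test
    only illustrates the quadtree structure, not a real encoder decision.
    """
    size = block.shape[0]
    if size <= min_size or np.var(block) < threshold:
        return [(top, left, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            cus += split_ctu(sub, top + dy, left + dx, min_size, threshold)
    return cus

# Example: partition one 64x64 luma coding tree block of random content.
ctb = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(split_ctu(ctb, 0, 0))
```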
Intra prediction predicts samples from the reconstructed samples of neighbouring blocks. The mode classes remain DC, planar, horizontal/vertical, and angular. One significant change is the introduction of larger block sizes: intra-picture prediction with one of 35 modes can be performed on blocks of up to 32 × 32 samples. The minimum block size is 4 × 4, the same as in H.264/AVC. For the DC, horizontal, and vertical modes, additional post-processing is defined in HEVC, in which rows and/or columns are filtered to maintain continuity across block boundaries. The angular modes of HEVC are more elaborate than the directional modes of H.264/AVC. Each prediction sample is calculated as
((32 - ω) · x_i + ω · x_(i+1) + 16) >> 5
where x_i and x_(i+1) are reference samples and ω is a weighting factor.
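The weighted interpolation above can be checked with a few lines of integer arithmetic; the reference samples and the weight in this sketch are made-up values, not data from the patent.

```python
def angular_pred(x_i, x_i1, w):
    """HEVC-style angular intra prediction of one sample:
    ((32 - w) * x_i + w * x_(i+1) + 16) >> 5."""
    return ((32 - w) * x_i + w * x_i1 + 16) >> 5

# Example with hypothetical reference samples 100 and 140 and weight 8:
print(angular_pred(100, 140, 8))   # -> 110
```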
Inter prediction in HEVC uses a separable 8-tap filter for luma sub-pixel positions, which increases the memory bandwidth and the number of multiply-accumulate operations required for motion compensation. The filter coefficients are limited to a 7-bit signed range to minimize hardware cost. In this application, motion compensation of an N × N block requires 8 + 56/N 8-bit multiply-accumulate operations per sample and 8 16-bit multiply-accumulate operations per sample. For chroma sub-pixel positions, a separable 4-tap filter with the same constraints on the filter coefficients as for luma may be applied.
HEVC defines the transforms (of sizes 4 × 4, 8 × 8, 16 × 16, and 32 × 32) as simple fixed-point matrix multiplications. The matrix multiplications for the vertical and horizontal components of the inverse transform are as follows:
Y = s(C^T · T)
R = Y^T · T
where s () is a scaling and saturation function, the value of Y can be represented using 16 bits. Each factor in the transformation matrix T is represented using a signed 8-bit number. The operation is defined as multiplying the 16-bit signed coefficient C by a factor, thus requiring an accumulation of more than 16 bits. Since these transforms are integer approximations of the discrete cosine transform, their symmetric properties are preserved. For the 4-point transform, an alternative transform that approximates a discrete sine transform is also defined.
HEVC restricts deblocking filtering to edges on an 8 × 8 grid, which reduces the number of filter decisions to compute and the number of samples that may be filtered. The order of edge processing is also modified to enable parallel processing: a picture can be split into 8 × 8 blocks that can all be processed in parallel, since only the edges inside each block need to be filtered. The positions of these blocks are shown in fig. 7. When there are multiple slices, some of these blocks overlap CTB boundaries and slice boundaries; the filter can then process slice boundaries in any order without affecting the reconstructed picture.
After the video stream is obtained, the assisted-end device needs to push it to a server in the cloud. Pushing the video stream requires choosing a video transport protocol, including but not limited to UDP/TS, RTP, MMS, HLS, and RTMP. UDP/TS and RTP run over the UDP transport protocol; because UDP is connectionless, packet order cannot be guaranteed and there is no packet-loss retransmission mechanism, so QoS (Quality of Service) cannot be guaranteed on the internet, where bandwidth changes dynamically and packet loss rates are high. The MMS and HLS protocols are typically used to pull data to a player and cannot push data to a server.
The video stream is therefore pushed to the server using the RTMP protocol. RTMP is based on TCP (and can also be carried over HTTP) and addresses the multiplexing and packetization of multimedia transport streams. It is an application-layer protocol that relies on a transport-layer protocol (usually TCP) to guarantee reliable delivery. On top of the established transport-layer link, RTMP sets up its connection through a client-server handshake and achieves safe transmission through control messages carried on the link. Because TCP supports packet-loss retransmission and related mechanisms, transmission reliability is assured; choosing RTMP for the video push therefore satisfies the requirement of safe and reliable transmission. A capture-and-push sketch follows.
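One minimal way to realize the capture-encode-push pipeline on the assisted end is to pipe raw camera frames into FFmpeg and let it encode and push over RTMP. The server address below is a placeholder, and the codec choice is an assumption of this sketch: the patent describes HEVC (H.265), but common FLV/RTMP tooling supports H.264 (libx264) most reliably, so that is what is used here.

```python
import subprocess
import cv2

RTMP_URL = "rtmp://example-server/live/stream"   # placeholder server address

cap = cv2.VideoCapture(0)                        # assisted-end camera
width, height, fps = 640, 480, 30

# FFmpeg reads raw BGR frames from stdin, encodes them, and pushes over RTMP.
ffmpeg = subprocess.Popen([
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "bgr24",
    "-s", f"{width}x{height}", "-r", str(fps),
    "-i", "-",                                   # raw frames arrive on stdin
    "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
    "-pix_fmt", "yuv420p",
    "-f", "flv", RTMP_URL,
], stdin=subprocess.PIPE)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (width, height))
    ffmpeg.stdin.write(frame.tobytes())          # feed one raw frame to the encoder

ffmpeg.stdin.close()
ffmpeg.wait()
```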
S502: and the assisted end equipment carries out pose estimation on the source image and synchronizes a pose estimation result to the server.
Based on the source image, the assisted-end device can estimate the pose of the object to be assisted with ORB-SLAM2. ORB-SLAM2 consists of three subsystems: tracking, local mapping, and loop closing. Pose estimation proceeds as follows:
ORB-SLAM2 extracts ORB descriptors from the input source image frame and matches them against the ORB descriptors of a reference frame, as shown by the line p1p2 in fig. 8, where Rf denotes the reference frame (which may be the last key frame Kf inserted in ORB-SLAM2), Cf denotes the current frame, p1 denotes a pixel in the reference frame Rf, and p2 denotes the position of the ORB feature point in the current frame Cf matched with the reference frame Rf. ORB-SLAM2 looks up the three-dimensional coordinate P of p1 in the map, as shown by the line Pp1 in fig. 8, and after the lookup projects P into the current frame Cf using the extrinsic parameters T = [R, t], as shown by the line Pp2' in fig. 8. As shown by the line Pp2 in fig. 8, p2 is the true position of P in the current frame Cf and p2' is its projected position, and there is an error between the two. Camera tracking is achieved by optimizing the extrinsic parameters T = [R, t] so as to minimize the reprojection error over all feature points. The formula for the optimized extrinsic parameters T = [R, t] is as follows:
{R, t} = argmin over (R, t) of Σ_i ρ( || p_i - π(R·P_i + t) ||^2_Σ )
where the sum is the reprojection error over all matched feature-point pairs of the current frame, R is the rotation and t the translation being optimized, p_i is the matched feature position in the current frame image, P_i is the corresponding three-dimensional map point, and π(·) is the camera projection. Besides the reprojection error itself, a robust kernel function ρ is used to prevent outliers from having an excessive influence on the nonlinear optimization, and the information matrix Σ is used to weight the different feature points. A numerical illustration of this objective follows.
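The objective can be written out numerically as below. This is a plain-NumPy illustration only: the Huber kernel, its threshold, and the identity information matrix are assumptions of the sketch, and a real system such as ORB-SLAM2 minimizes the term with a nonlinear solver (g2o) rather than merely evaluating it.

```python
import numpy as np

def huber(e2, delta=5.991):
    """Huber robust kernel applied to a squared error (the threshold is illustrative)."""
    return e2 if e2 <= delta else 2.0 * np.sqrt(delta * e2) - delta

def reprojection_cost(R, t, K, pts_3d, pts_2d):
    """Sum of robustified squared reprojection errors for matched 3D-2D pairs.

    pts_3d: Nx3 map points; pts_2d: Nx2 observed feature positions in the
    current frame. This is the quantity minimized over (R, t).
    """
    cost = 0.0
    for P, p in zip(pts_3d, pts_2d):
        Pc = R @ P + t                       # transform the map point into the camera frame
        proj = K @ Pc
        proj = proj[:2] / proj[2]            # pinhole projection pi(.)
        e2 = float(np.sum((p - proj) ** 2))  # squared reprojection error
        cost += huber(e2)
    return cost
```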
To obtain an initial value of the current frame's pose, it can be computed from the previous frame's pose and a motion model. When there is no motion model, ORB-SLAM2 can estimate the relative camera motion by tracking the reference frame. In ORB-SLAM2, most feature points of the reference key frame have corresponding three-dimensional points in the map. After the current frame is input, the two-dimensional feature correspondence between the current frame and the reference key frame is determined through feature matching. Since the map points corresponding to the reference key frame's feature points are known, the correspondence between those three-dimensional points and the current frame's feature points is obtained, i.e. there are many matched 3D-2D pairs, and the pose can be optimized with a PnP (Perspective-n-Point) method. After ORB-SLAM2 successfully tracks the reference frame, the relative motion between the two frames can be recorded as a motion model. When estimating the motion of the next frame, the previous frame's pose is combined with the motion model: multiplying the previous frame's pose by the motion model gives the initial pose of the current frame. Let the camera pose of the previous frame be T_lw, the relative transformation from the world coordinate system to the previous frame's camera coordinate system, and let the motion model be T_cl, the relative transformation from the previous frame to the current frame. The initial value of the current frame's camera pose can then be written as:
T_cw = T_cl · T_lw
Thus, whenever a motion model already exists, combining the previous frame's pose with the motion model yields the initial pose of the current frame.
It should be noted that, to let the assisting end interact better with the virtual objects, the assisted end sends the source image frames and the device pose estimation results to the assisting end separately.
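Returning to the motion-model initialization T_cw = T_cl · T_lw above: in homogeneous 4 × 4 form it is a single matrix product. The poses in the sketch below are illustrative values, not data from the patent.

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

T_lw = se3(np.eye(3), np.array([0.0, 0.0, 1.0]))   # previous frame: world -> camera
T_cl = se3(np.eye(3), np.array([0.01, 0.0, 0.0]))  # motion model: previous -> current

T_cw = T_cl @ T_lw    # initial guess of the current frame pose, T_cw = T_cl * T_lw
print(T_cw)
```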
S503: and the server receives the video stream for data caching, and synchronizes the pose estimation result to the assistant end equipment.
The server receives the video stream from the assisted-end device, caches the data, encapsulates it in FLV format, and delivers it to the assisting end over HTTP. Optionally, the server may be an Nginx server. Note that data may be transmitted in the form of packets, and a sequence of packets constitutes the video stream.
S504: and the assistant end equipment receives the video stream sent by the server and decodes the video stream into images.
The assisting-end device may be a PC; it decodes the received video stream to obtain the images used to generate the AR image.
S505: The assisting-end device generates an AR image using the pose estimation result and the decoded image.
The assisting-end device can generate a virtual image from the pose estimation result and compose the AR image from the virtual image and the decoded image.
S506: Receiving the remote guidance data added to the AR image by the assisting-end user, and sending the remote guidance data to the assisted-end device.
The remote guidance data is the data with which the assisting-end user marks the source image in order to guide the assisted-end user, and may include indication information, annotation information, and the like.
When the assisting-end device is a PC equipped with hardware such as a keyboard and a mouse, operations on the AR image can be performed more precisely, so the remote guidance data is more accurate.
S507: and the assisted end equipment receives the remote guidance data sent by the assisting end equipment.
Once the data has been synchronized, the assisted-end device obtains the remote guidance data. The information that needs to be synchronized between the assisted-end device and the assisting-end device can be transmitted with a reliable UDP protocol. TCP is a reliable communication protocol built around fairness, but under harsh network conditions it either cannot guarantee normal communication quality or does so at excessive cost. A reliable UDP protocol can keep that cost as low as possible while still ensuring communication latency and quality. The fairness constraints of TCP make high speed and low latency difficult to achieve; in audio and video push streaming, the available bandwidth should be used as fully as possible and TCP-style disconnection and reconnection should be avoided. In addition, avoiding TCP's three-way handshake and four-way teardown reduces system resource occupancy and response time.
The key to making UDP reliable is retransmission. As shown in fig. 9, a reliable UDP scheme has a sending end and a receiving end, and each design makes its own choices and simplifications. Retransmission in reliable UDP means that the sending end resends data based on the packet-loss feedback (ACKs) from the receiving end. The sender can design its own retransmission strategy for the scenario at hand; retransmission strategies fall into three types: timed retransmission, requested retransmission, and FEC (Forward Error Correction) selective retransmission.
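The retransmission idea can be sketched with a stop-and-wait sender over Python's standard UDP sockets. The sequence numbering, timeout, and packet format are all assumptions of this toy example (production reliable-UDP stacks such as KCP or QUIC are far more elaborate).

```python
import socket
import struct

def reliable_send(sock, addr, payload, seq, timeout=0.2, max_retries=10):
    """Stop-and-wait reliable delivery over UDP.

    Each datagram carries a 4-byte sequence number followed by the payload;
    the receiver is expected to echo the sequence number back as an ACK.
    """
    packet = struct.pack("!I", seq) + payload
    sock.settimeout(timeout)
    for _ in range(max_retries):
        sock.sendto(packet, addr)                   # (re)transmit the datagram
        try:
            ack, _ = sock.recvfrom(4)
            if struct.unpack("!I", ack)[0] == seq:  # correct ACK received
                return True
        except socket.timeout:
            continue                                # timed out -> retransmit
    return False

# Usage (assuming a cooperating receiver that echoes sequence numbers):
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# reliable_send(sock, ("127.0.0.1", 9000), b'{"type": "annotation"}', seq=1)
```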
The synchronized information is mainly the remote guidance data added by the assisting-end user on the AR image generated at the assisting end, as shown in fig. 10.
S508: The assisted-end device generates an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data.
Optionally, the assisted-end device generates a virtual image according to the pose estimation result and then generates the AR image of the object to be assisted from the source image, the virtual image, and the remote guidance data.
The generated AR image of the object to be assisted displays the remote guidance data, so the assisted-end user sees both the object to be assisted and the guidance data through the assisted-end device and is guided intuitively and effectively.
The application provides an augmented reality remote guidance method in which the assisted-end device acquires a source image, encodes it into a video stream, and pushes the video stream to the server. Based on the source image, the assisted-end device performs pose estimation on the object to be assisted and synchronizes the pose estimation result to the server. The server receives and caches the video stream and synchronizes the pose estimation result to the assisting-end device. The assisting-end device decodes the video stream into images and, from the pose estimation result sent by the assisted-end device, locally builds the virtual content to form an AR image. The assisting-end user can annotate this AR image to add remote guidance data, and once the assisting-end device receives the remote guidance data it synchronizes it to the assisted-end device. The assisted-end device then generates a virtual image from the pose estimation result and composes an AR image from the source image, the virtual image, and the remote guidance data. Because the assisted end's AR image displays both the remote guidance data and the object to be assisted, the assisted-end user is guided intuitively and effectively, which improves task-handling efficiency.
Fig. 11 is a schematic view of an augmented reality remote guidance device provided in an embodiment of the present application, as shown in the figure, an augmented reality remote guidance device 1100 provided in this embodiment may include: the system comprises an acquisition module 1101, a sending module 1102, a pose estimation module 1103, a receiving module 1104 and a generating module 1105.
The acquiring module 1101 is configured to acquire a source image of an object to be assisted.
A sending module 1102, configured to send the source image to the assistant end device.
And a pose estimation module 1103, configured to perform pose estimation on the source image.
The sending module 1102 is further configured to send the pose estimation result to the assisting end device.
A receiving module 1104, configured to receive the remote guidance data sent by the assisting-end device.
A generating module 1105, configured to generate an AR image of the object to be assisted according to the source image, the pose estimation result, and the remote guidance data.
The apparatus of this embodiment may be used to implement the method embodiment shown in fig. 2, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 12 is a schematic view of an augmented reality remote guidance device according to an embodiment of the present application, as shown in the drawing, an augmented reality remote guidance device 1200 according to the embodiment may include: a receiving module 1201, a generating module 1202 and a sending module 1203.
The receiving module 1201 is configured to receive a source image and a pose estimation result of an object to be assisted, which are sent by the assisted end device.
And the generating module 1202 is configured to generate an AR image of the object to be assisted according to the source image and the pose estimation result.
The receiving module 1201 is further configured to receive the remote guidance data added to the AR image by the user.
A sending module 1203, configured to send the remote guidance data to the assisted end device.
The apparatus of this embodiment may be used to implement the method embodiment shown in fig. 4; the implementation principle and technical effect are similar and are not repeated here.
Fig. 13 is a third schematic view of an augmented reality remote guidance device provided in an embodiment of the present application. As shown in fig. 13, an embodiment of the present application provides an augmented reality remote guidance apparatus 1300 including: the processor 1301 and the memory 1302 are connected by a bus 1303.
In a specific implementation process, the code is stored in the memory 1302, and the processor 1301 runs the code stored in the memory 1302 to execute the augmented reality remote guidance method of the above method embodiment.
For a specific implementation process of the processor 1301, reference may be made to the above method embodiments, which have similar implementation principles and technical effects, and details are not described herein again.
In the embodiment shown in fig. 13, it is understood that the Processor 1301 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory 1302 may include a random-access memory (RAM) and may also include a non-volatile memory (NVM), such as at least one disk memory. The memory 1302 may store instructions for performing various processing functions and implementing the method steps of the present application.
The bus 1303 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus 1303 may be divided into an address bus, a data bus, a control bus, and the like. For convenience of illustration, the bus 1303 in the figures of the present application is not limited to only one bus or one type of bus.
The apparatus of this embodiment may be used to implement the method embodiment shown in fig. 2, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 14 is a fourth schematic view of an augmented reality remote guidance device provided in an embodiment of the present application. As shown in fig. 14, an embodiment of the present application provides an augmented reality remote guidance apparatus 1400, including: a processor 1401 and a memory 1402, wherein the processor 1401 and the memory 1402 are connected via a bus 1403.
The apparatus of this embodiment may be used to implement the method embodiment shown in fig. 4, and the implementation principle and technical effect are similar, which are not described herein again.
The embodiment of the application provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used for implementing the augmented reality remote guidance method of the above method embodiment.
The computer-readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
The present application provides a computer program product comprising a computer program which, when executed by a processor, can implement the augmented reality remote guidance method provided in any of the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. An augmented reality remote guidance method, comprising:
acquiring a source image of an object to be assisted, and sending the source image to an assisting end device;
performing pose estimation on the source image, and sending the pose estimation result to the assisting end device;
receiving remote guidance data sent by the assisting end device;
and generating an AR image of the object to be assisted according to the source image, the pose estimation result and the remote guidance data.
2. The method of claim 1, wherein generating an AR image from the source image, pose estimation results, and the remote guidance data comprises:
generating a virtual image according to the pose estimation result;
and generating an AR image of the object to be assisted according to the source image, the virtual image and the remote guidance data.
3. The method of claim 1, wherein the sending the source image to an assisting end device comprises:
and coding the source image and sending the coded source image data to the assistant end equipment.
4. An augmented reality remote guidance method, comprising:
receiving a source image and a pose estimation result of an object to be assisted, which are sent by assisted end equipment;
generating an AR image of the object to be assisted according to the source image and the pose estimation result;
and receiving remote guidance data of the AR image from the user, and sending the remote guidance data to the assisted end equipment.
5. The method according to claim 4, wherein after receiving the source image and the pose estimation result of the object to be assisted sent by the assisted end device, the method further comprises:
decoding the source image.
6. The method according to claim 4, wherein the generating an AR image of the object to be assisted according to the source image and the pose estimation result comprises:
generating a virtual image according to the pose estimation result; and
generating the AR image of the object to be assisted according to the source image and the virtual image.
7. An augmented reality remote guidance apparatus, comprising: a processor and a memory having code stored therein, the processor executing the code stored in the memory to perform the augmented reality remote guidance method of any one of claims 1 to 3.
8. An augmented reality remote guidance apparatus, comprising: a processor and a memory having code stored therein, the processor executing the code stored in the memory to perform the augmented reality remote guidance method of any one of claims 4 to 6.
9. A computer-readable storage medium having stored thereon computer-executable instructions for implementing the augmented reality remote guidance method of any one of claims 1 to 3 when executed by a processor.
10. A computer-readable storage medium having stored thereon computer-executable instructions for implementing the augmented reality remote guidance method of any one of claims 4 to 6 when executed by a processor.
CN202111507489.8A 2021-12-10 2021-12-10 Augmented reality remote guidance method, device and storage medium Pending CN114143530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111507489.8A CN114143530A (en) 2021-12-10 2021-12-10 Augmented reality remote guidance method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111507489.8A CN114143530A (en) 2021-12-10 2021-12-10 Augmented reality remote guidance method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114143530A true CN114143530A (en) 2022-03-04

Family

ID=80385747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111507489.8A Pending CN114143530A (en) 2021-12-10 2021-12-10 Augmented reality remote guidance method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114143530A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107645651A (en) * 2017-10-12 2018-01-30 北京临近空间飞艇技术开发有限公司 A kind of remote guide method and system of augmented reality
CN113885700A (en) * 2021-09-03 2022-01-04 广东虚拟现实科技有限公司 Remote assistance method and device


Similar Documents

Publication Publication Date Title
US20190273929A1 (en) De-Blocking Filtering Method and Terminal
US10560678B2 (en) Method and apparatus having video encoding function with syntax element signaling of rotation information of content-oriented rotation applied to 360-degree image content or 360-degree video content represented in projection format and associated method and apparatus having video decoding function
JP2019534606A (en) Method and apparatus for reconstructing a point cloud representing a scene using light field data
CN109040792B (en) Processing method for video redirection, cloud terminal and cloud desktop server
CN116248875A (en) Spherical projection motion estimation/compensation and mode decision
US20240098298A1 (en) Segmentation-based parameterized motion models
CN112449140B (en) Video super-resolution processing method and device
WO2020103800A1 (en) Video decoding method and video decoder
CN112423110A (en) Live video data generation method and device and live video playing method and device
EP3434021B1 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
WO2013127126A1 (en) Video image sending method, device and system
Gül et al. Cloud rendering-based volumetric video streaming system for mixed reality services
CN112492231B (en) Remote interaction method, device, electronic equipment and computer readable storage medium
CN112927273A (en) Three-dimensional video processing method, equipment and storage medium
WO2024037137A1 (en) Data processing method and apparatus for immersive media, and device, medium and product
CN111479162A (en) Live data transmission method and device and computer readable storage medium
CN114786040B (en) Data communication method, system, electronic device and storage medium
CN111083450A (en) Vehicle-mounted-end image remote output method, device and system
US10225573B1 (en) Video coding using parameterized motion models
WO2021147464A1 (en) Video processing method and apparatus, and electronic device
WO2021147463A1 (en) Video processing method and device, and electronic apparatus
Hu et al. Mobile edge assisted live streaming system for omnidirectional video
CN114143530A (en) Augmented reality remote guidance method, device and storage medium
WO2021169817A1 (en) Video processing method and electronic device
CN114513658B (en) Video loading method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2022-03-04)