CN108629800A - Plane determination method, augmented reality display information display method, and corresponding device - Google Patents

Plane determination method, augmented reality display information display method, and corresponding device

Info

Publication number: CN108629800A
Application number: CN201710853701.3A
Authority: CN (China)
Prior art keywords: information, plane, three-dimensional plane, determining, augmented reality
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 罗振波, 王树, 朱翔宇, 姜映映
Current assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd (the listed assignees may be inaccurate)
Original assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd and Samsung Electronics Co Ltd
Publication of CN108629800A

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06T: Image Data Processing or Generation, in General
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a plane determination method, a display method of augmented reality display information, and corresponding devices. The method includes: performing region segmentation and depth estimation on multimedia information; determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result; and displaying augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information. The provided method and devices can superimpose virtual display information onto a three-dimensional plane, which improves the realism of the augmented reality display result and enhances the user experience.

Description

Plane determination method, augmented reality display information display method and corresponding device
Technical Field
The invention relates to the technical field of multimedia, in particular to a plane determining method, a display method of augmented reality display information and a corresponding device.
Background
Augmented Reality (AR) technology superimposes virtual content onto a real scene, so that a user obtains a sensory experience beyond reality; that is, the user perceives a scene in which real objects and virtual content coexist. AR technology can be applied to fields such as home furnishing, travel translation, shopping, games, navigation, and education.
In prior-art AR implementation methods, virtual content is usually placed directly into the multimedia information corresponding to the real scene to obtain the AR display result. However, the realism of AR display results obtained this way is poor, and the user experience needs to be improved. As shown in fig. 1, after a virtual object (such as an animal) is placed in the multimedia information corresponding to a real scene, it is suspended in mid-air in the AR display result, which does not accord with reality; as shown in fig. 2, the virtual object is attached to an unreasonable plane (such as a vertical wall); as shown in fig. 3, the displayed virtual navigation route extends directly into the air; and as shown in fig. 4, the virtual navigation route passes directly through an obstacle.
In summary, prior-art AR implementation methods lead to unrealistic AR display results and poor user experience.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:
the embodiment of the invention provides a plane determination method, which comprises the following steps:
carrying out region segmentation and depth estimation on the multimedia information;
and determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result.
An embodiment of the present invention provides a plane determining apparatus, including:
the processing module is used for carrying out region segmentation and depth estimation on the multimedia information;
and the first determining module is used for determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result which are obtained by the processing module.
In an embodiment of the present invention, a method for displaying augmented reality display information is provided, including:
determining three-dimensional plane information corresponding to the multimedia information;
and displaying the augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information.
An embodiment of the present invention provides a display device for displaying information in augmented reality, including:
the second determining module is used for determining three-dimensional plane information corresponding to the multimedia information;
and the display module is used for displaying the augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information determined by the second determination module.
The invention provides a plane determination method, a display method of augmented reality display information, and corresponding devices, with which virtual display information can be superimposed onto a three-dimensional plane, improving the realism of the augmented reality display result and the user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a diagram illustrating an augmented reality display information display result according to the prior art;
FIG. 2 is a diagram illustrating an augmented reality display information display result according to another prior art;
FIG. 3 is a diagram illustrating an augmented reality display information display result according to yet another prior art;
FIG. 4 is a diagram illustrating an augmented reality display information display result according to yet another prior art;
FIG. 5 is a schematic flowchart of a plane determination method according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a segmentation result combining semantic annotation and plane annotation according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating a FCN and conditional random field based segmentation process according to an embodiment of the present invention;
FIG. 8 is a block diagram illustrating an integrated area segmentation framework for obtaining plane information and semantic information simultaneously according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating a split network architecture according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a depth estimation network architecture according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a comparison between a prediction result during training and a true depth of a training sample according to an embodiment of the present invention;
FIG. 12 is a diagram of an overall framework of a depth estimation network according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating region segmentation and depth estimation in a multitasking manner according to an embodiment of the present invention;
FIG. 14 is a diagram illustrating region segmentation and depth estimation in a multitasking manner according to an embodiment of the present invention;
FIG. 15 is a comparison graph of the advantages of using three-dimensional spatial information compared to two-dimensional spatial information in planar understanding, according to an embodiment of the present invention;
FIG. 16 is a schematic diagram illustrating adjustment of the determined three-dimensional plane information according to an embodiment of the present invention;
FIG. 17 is a flow chart illustrating the hybrid plane determination method in an embodiment of the present invention;
FIG. 18 is a diagram illustrating processing results of the hybrid plane determination method in an embodiment of the present invention;
FIG. 19 is a flowchart illustrating a method for displaying augmented reality display information according to an embodiment of the present invention;
FIG. 20 is a diagram illustrating an exemplary display position for automatically recommending virtual display information according to an embodiment of the present invention;
FIG. 21 is a schematic diagram of an embodiment of the present invention, in which automatic recommendation is implemented by a prior-knowledge-based filtering system and a long short-term memory neural network;
FIG. 22 is a comparison of augmented reality display information generated by the prior art and the present invention;
FIG. 23 is a flowchart illustrating a specific process of automatically recommending a display position of virtual display information according to an embodiment of the present invention;
FIG. 24 is a diagram illustrating an exemplary display position for automatically recommending virtual display information according to an embodiment of the present invention;
FIG. 25 is a flowchart illustrating a method for adjusting and recommending a plane position according to an embodiment of the present invention;
FIG. 26 is a diagram illustrating a first preferred method of planar position adjustment according to an embodiment of the present invention;
FIG. 27 is a diagram illustrating a second preferred method of planar position adjustment according to an embodiment of the present invention;
FIG. 28 is a schematic flow chart illustrating a method for adjusting and recommending the attitude of the collection device according to the embodiment of the present invention;
FIG. 29 is a schematic diagram illustrating a method for adjusting and recommending the attitude of the collection device according to the embodiment of the present invention;
FIG. 30 is a schematic flow chart illustrating a process of displaying a driving assistance guidance message according to an embodiment of the present invention;
FIG. 31 is a schematic diagram illustrating a driving assistance notification message according to an embodiment of the present invention;
FIG. 32 is a flowchart illustrating a method for estimating a road surface condition in a driving assistance system according to an embodiment of the present invention;
FIG. 33 is a schematic diagram illustrating a method for estimating a road surface condition in a driving assistance system according to an embodiment of the present invention;
FIG. 34 is a flowchart of a method for implementing an AR keyboard according to an embodiment of the present invention;
FIG. 35 is a diagram illustrating a multilingual AR keyboard, in accordance with an embodiment of the present invention;
FIG. 36 is a diagram of an encryption keypad in accordance with an embodiment of the present invention;
FIG. 37 is a schematic view of an apparatus for plane determination according to an embodiment of the present invention;
FIG. 38 is a schematic structural diagram of a display device for displaying augmented reality display information in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Example one
An embodiment of the present invention provides a method for determining a plane, as shown in fig. 5, including the following steps:
step 501, performing region segmentation and depth estimation on the multimedia information.
The multimedia information in the embodiment of the invention comprises images and/or videos and the like.
In addition, the multimedia information in the embodiment of the present invention may be, but is not limited to, monocular multimedia information, such as multimedia information collected by a multimedia information collecting device (e.g., a camera).
In the embodiment of the invention, the region segmentation result may contain two-dimensional plane information. The multimedia information may be segmented into regions through, but not limited to, a deep learning network to obtain the two-dimensional plane information. Specifically, this includes the following step 5011 (not shown in the figure).
In step 5011, the multimedia information is subjected to region segmentation through a deep learning network obtained by combining with plane labeling training, and a region segmentation result (two-dimensional plane information) is obtained.
Wherein, the deep learning network is obtained by training in the following way: carrying out plane marking on the training sample; and training according to the marked training sample to obtain a deep learning network.
The embodiment of the invention provides that the region segmentation result can further comprise semantic information corresponding to the two-dimensional plane. The multimedia information may be segmented by a deep learning network to obtain two-dimensional plane information and semantic information corresponding to the two-dimensional plane. The two-dimensional plane recognition and the semantic recognition can be carried out through different deep learning networks, and the two-dimensional plane information and the semantic information corresponding to the two-dimensional plane are obtained respectively.
In addition, the multimedia information may also be subjected to region segmentation through the same deep learning network, that is, the two-dimensional plane recognition and the semantic recognition are simultaneously performed through the same deep learning network, and the two-dimensional plane information and the semantic information corresponding to the two-dimensional plane are simultaneously obtained, specifically including the following step 5012 (not labeled in the figure).
Step 5012, performing region segmentation on the multimedia information through a deep learning network obtained by combining semantic annotation and plane annotation training to obtain a region segmentation result (two-dimensional plane information and semantic information corresponding to the two-dimensional plane).
Wherein, the deep learning network is obtained by training in the following way: carrying out semantic annotation and plane annotation on the training samples; and training according to the marked training sample to obtain a deep learning network.
In step 5011 and step 5012 above, when the deep learning network is trained from the labeled training samples, the objective function and the network structure of the deep learning network may be determined first; the deep learning network is then trained according to the labeled training samples, the objective function, and the network structure.
For the embodiment of the present invention, in step 5011 and step 5012 above, pixel-level labeling, such as semantic labeling or plane labeling, may be performed on the training samples when training the deep learning network. Semantic annotation takes a semantic object as a unit: the semantic object is annotated at a specific position in the multimedia information (hereinafter explained taking an image as an example), and pixel-level semantic annotation makes that position accurate to the pixel level, assigning semantic information to each pixel in the training image. For example, if a vehicle is selected as a semantic object, every pixel of the vehicle in the image is labeled with the same semantic information (e.g., a semantic attribute identifier). In the embodiment of the invention, plane annotation takes a plane as a unit, and the pixels corresponding to each plane in an image are labeled with the same plane information. For example, if an image contains N planes, each plane corresponds to a set of pixels, and every pixel in the same plane carries the same plane information.
Because semantic labeling and plane labeling have already been performed on each pixel when training the deep learning network, region segmentation through the network yields a plane recognition result and a semantic recognition result for each pixel, from which the two-dimensional plane information and the semantic information corresponding to the two-dimensional plane can be determined.
A deep learning network trained in the above manner may also be referred to as an end-to-end deep learning network.
When the pixel points are subjected to plane marking, the plane information corresponding to the pixel points comprises a classification identifier and/or a plane identifier, and the classification identifier comprises at least one of the following items: the classification mark corresponding to the plane, the classification mark corresponding to the plane edge and the classification mark corresponding to the non-plane. When a pixel point belongs to a certain plane, the plane information of the pixel point comprises a classification identifier corresponding to the plane and a plane identifier of the plane to which the pixel point belongs; when the pixel does not belong to any plane, the plane information of the pixel comprises a classification identifier corresponding to the non-plane; when the pixel point is located at the edge of the plane, the plane information of the pixel point includes the classification identifier corresponding to the edge of the plane and the plane identifier of the plane corresponding to the edge of the plane.
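To make the annotation scheme above concrete, the following minimal Python sketch shows one possible per-pixel encoding of the plane information; it is illustrative only (the patent does not specify an encoding), and the class names and identifier values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed classification identifiers: non-plane, plane, plane edge.
NON_PLANE, PLANE, PLANE_EDGE = 0, 1, 2

@dataclass
class PixelPlaneLabel:
    class_id: int                   # NON_PLANE, PLANE, or PLANE_EDGE
    plane_id: Optional[int] = None  # identifier of the plane the pixel (or its edge) belongs to

    def __post_init__(self):
        # A non-plane pixel carries no plane identifier; plane and
        # plane-edge pixels must reference the plane they belong to.
        if self.class_id == NON_PLANE:
            assert self.plane_id is None
        else:
            assert self.plane_id is not None

# Example: a pixel lying on the edge of plane #3.
edge_pixel = PixelPlaneLabel(class_id=PLANE_EDGE, plane_id=3)
```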
Unlike simple region segmentation, the region segmentation in the embodiment of the present invention is defined as a plane classification problem rather than the segmentation problem of the prior art. Because a classification identifier is labeled on each pixel when training the deep learning network, the trained network can classify every pixel, and the points whose final classification result is "plane" form a connected two-dimensional plane, from which the two-dimensional plane information is obtained.
In the embodiment of the invention, by combining semantic annotation and plane annotation, the training samples are given dual attributes: each pixel in a training sample carries both a semantic label and a plane label. For example, as shown in fig. 6: fig. 6(a) corresponds to a general region segmentation method, which can only segment the planes in an image and cannot determine the semantic information corresponding to each plane; fig. 6(b) shows a general semantic segmentation method, which can only distinguish the image content into a cube and a background and cannot distinguish the individual faces of the cube; fig. 6(c) shows the region segmentation method adopted in the embodiment of the present invention, which obtains plane information and the corresponding semantic information at the same time. Specifically: a cube has six faces that all correspond to the same semantic object; through semantic annotation, every pixel of the cube is labeled with the same semantic attribute, and by further combining plane annotation, the six faces of the cube can be distinguished from one another. In the embodiment of the invention, these dual attributes help refine the multimedia information, so that more meaningful results are obtained than with simple semantic segmentation or simple region segmentation alone.
The semantic segmentation architecture mentioned above is described in detail below. The deep learning network may be a Fully Convolutional Network (FCN). The framework of the whole segmentation procedure is based on an FCN and Conditional Random Fields (CRF). As shown in fig. 7, an input image and a semantic segmentation target are determined; the semantic segmentation target in the image is defined as a person, and a preliminary semantic segmentation result, corresponding to the part of the input image containing the person, is obtained through dense prediction by the fully convolutional network. The FCN adopts an atrous structure in both its convolutional layers and its pyramid pooling layer, which reduces the degree of downsampling and allows multi-scale feature extraction, making the semantic segmentation result more reliable and the network easier to train. Because the preliminary semantic segmentation result is partially adhered and blurred, a more refined final segmentation result is obtained by applying a conditional random field with image information such as illumination, color, and contrast as judgment conditions.
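As an illustration of the atrous structure mentioned above, the following PyTorch sketch shows parallel dilated convolutions that extract multi-scale features without further downsampling; the channel counts and dilation rates are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class AtrousBlock(nn.Module):
    """Parallel dilated 3x3 convolutions; padding=dilation preserves resolution."""
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        # 1x1 convolution projects the concatenated multi-scale features back down.
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 256, 64, 64)
assert AtrousBlock()(x).shape == x.shape  # no downsampling
```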
The integrated region segmentation framework (also called the multi-task region segmentation framework) provided by the embodiment of the invention, which obtains plane information and semantic information simultaneously, is built on this semantic segmentation framework. It converts the acquisition of plane information from a traditional image-processing problem into a pixel-level classification problem in deep learning, so that plane information can be acquired through the semantic segmentation framework. As shown in fig. 8, an image is input and passes through a feature extraction layer composed of a fully convolutional network; each pixel then receives two pixel-level classifications: a plane classification, i.e., whether the pixel belongs to a plane, a plane edge, or a non-plane region, and a semantic classification, i.e., which semantic attribute the pixel belongs to, such as sky, wall, or floor. The final region segmentation result contains the two-dimensional plane information and the corresponding semantic information simultaneously.
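A minimal PyTorch sketch of the two pixel-level classification heads described above follows; the shared feature extractor is omitted, and the channel count and number of semantic classes are assumptions.

```python
import torch
import torch.nn as nn

class TwoHeadSegmenter(nn.Module):
    """Two pixel-level classifiers over one shared feature map."""
    def __init__(self, feat_ch=256, num_semantic=21):  # 21 semantic classes is an assumption
        super().__init__()
        self.plane_head = nn.Conv2d(feat_ch, 3, kernel_size=1)                 # plane / plane edge / non-plane
        self.semantic_head = nn.Conv2d(feat_ch, num_semantic, kernel_size=1)   # e.g. sky, wall, floor, ...

    def forward(self, features):
        # features: output of the shared fully convolutional feature extractor
        return self.plane_head(features), self.semantic_head(features)

feats = torch.randn(1, 256, 64, 64)
plane_logits, semantic_logits = TwoHeadSegmenter()(feats)
```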
The embodiment of the invention provides a depth estimation step for multimedia information that improves the precision of depth estimation by using the difference information between temporally adjacent frames in monocular multimedia information (multimedia information collected by a monocular camera). Specifically, depth estimation using a single frame yields only spatial correlation information, whereas depth estimation using temporally adjacent frames additionally yields temporal correlation information for each position in the multimedia information, and this temporal correlation information can be used to calibrate the depth estimation result. For example, when the depth estimates of the previous and current frames at the same position (e.g., the same pixel) differ too much (more than a set threshold), the depth estimate of the current frame can be calibrated using that of the previous frame.
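The following sketch illustrates the temporal calibration just described under simplifying assumptions: the threshold value is arbitrary, and camera motion between frames is ignored (a real system would first align the previous frame to the current one).

```python
import numpy as np

def calibrate_depth(depth_prev: np.ndarray, depth_curr: np.ndarray,
                    threshold: float = 0.5) -> np.ndarray:
    """Where current and previous per-pixel depth differ by more than the
    threshold, fall back to the previous frame's estimate."""
    unstable = np.abs(depth_curr - depth_prev) > threshold
    calibrated = depth_curr.copy()
    calibrated[unstable] = depth_prev[unstable]
    return calibrated
```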
In the embodiment of the invention, region segmentation and depth estimation can be carried out in a single-task manner. Specifically: the region segmentation result is obtained through the deep learning network corresponding to region segmentation (which may be called the segmentation network), the depth estimation result is obtained through the deep learning network corresponding to depth estimation (which may be called the depth estimation network), and three-dimensional plane fitting is then performed on the two results to obtain the three-dimensional plane information.
For the embodiment of the present invention, the segmentation network and the depth estimation network are shown in fig. 9 and fig. 10. Referring to fig. 9, for the segmentation network: an image is input, feature information is extracted by several feature extraction layers, the features are classified by a softmax classification layer, and the result is restored to the original size by a deconvolution layer, yielding a pixel-level segmentation result.
Referring to fig. 10, the first half of the depth estimation network is similar to the segmentation network: a multi-layer residual network serves as the feature extraction layer, after which a series of deconvolution layers gradually restores the output to half the size of the original image; the final result is a continuous distribution that can be rendered as a heat map.
When the depth estimation network is trained, a triple loss composed of an absolute loss function, a relative loss function, and a fusion loss function may be used, trained against the ground-truth depth of the training samples. Fig. 11 compares, for an input image, the prediction during training (the predicted depth information map) with the ground-truth depth of the training sample.
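The patent does not define the three loss terms, so the following sketch is only one plausible composition, assuming an L1 absolute error, a depth-relative error, and a weighted fusion of the two.

```python
import torch

def absolute_loss(pred, gt):
    # plain L1 error against the ground-truth depth
    return torch.mean(torch.abs(pred - gt))

def relative_loss(pred, gt, eps=1e-6):
    # error relative to the true depth, so near and far regions
    # contribute comparably
    return torch.mean(torch.abs(pred - gt) / (gt + eps))

def fused_loss(pred, gt, w_abs=1.0, w_rel=1.0):
    # weighted fusion of the two terms
    return w_abs * absolute_loss(pred, gt) + w_rel * relative_loss(pred, gt)
```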
Specifically, the method comprises the following steps: for the embodiment of the present invention, the overall framework of the depth estimation Network may be based on FCN, as shown in fig. 12, where the depth estimation Network mainly consists of 50 layers of Residual error networks (full english: ResNet-50) in the first half of FCN and a deconvolution Network (full english: deconvolution Network) in the second half of FCN, and in the training process, the precision of depth estimation is improved by using a triplet loss function, and the training result is optimized. Firstly inputting image information, then extracting the features of the image by ResNet-50, wherein the feature extraction process utilizes jump connection to connect a plurality of layers in a residual error network together, and then deconvoluting the extracted features by a deconvolution network until the size of the extracted features is half of the size of an original image, wherein the jump connection is also used in the deconvolution process, then the output result is up-sampled to obtain depth estimation results corresponding to the original image pixels one by one, and finally the depth information corresponding to each pixel point in the original image pixels is obtained. Wherein the depth estimation results may present a continuous distribution in the form of a thermal map.
The embodiment of the invention provides that, after the region segmentation result and the depth estimation result are obtained, the depth estimation result can be further corrected according to the region segmentation result, and/or the region segmentation result can be corrected according to the depth estimation result.
The region segmentation result and the depth estimation result can be obtained in a single task mode.
In addition, the embodiment of the present invention further provides that region segmentation and depth estimation may be performed in a multi-task manner: the same deep learning network performs region segmentation and depth estimation on the multimedia information, the depth estimation result is corrected according to the region segmentation result, and the region segmentation result is corrected according to the depth estimation result, yielding the corrected region segmentation result and the corrected depth estimation result.
Specifically, the method comprises the following steps: referring to fig. 13, a feature extraction layer performs feature extraction on an input image (multimedia information) to obtain multi-level feature information, and since the subsequent depth estimation subnetwork and region segmentation subnetwork only need one feature extraction and do not need to perform feature extraction respectively, shared computation of the depth estimation subnetwork and the region segmentation subnetwork is realized; according to the extracted multi-level feature information, obtaining a region segmentation result (the result is a preliminary result) corresponding to the region segmentation sub-network through a region segmentation sub-network, and according to the multi-level feature information, obtaining a depth estimation result (the result is a preliminary result) corresponding to the depth estimation sub-network through a depth estimation sub-network; and fusing the region segmentation result corresponding to the region segmentation sub-network and the depth estimation result corresponding to the depth estimation sub-network through the fusion layer (correcting the depth estimation result according to the region segmentation result and correcting the region segmentation result according to the depth estimation result) to obtain a corrected region segmentation result and a corrected depth estimation result.
The two sub-networks (the area segmentation sub-network and the depth estimation sub-network) may be logical sub-networks, and in an actual network architecture, the two sub-networks may be integrated, i.e., one network; or as two separate networks.
Further, the depth estimation sub-network and the region segmentation sub-network are trained by: training a depth estimation sub-network by taking a deep learning network (which can be but is not limited to a residual error network) as a pre-training model; training the region segmentation sub-network by taking the trained depth estimation sub-network as a pre-training model; training a fusion layer of the deep learning network by taking the trained region segmentation sub-network as a pre-training model; and training the depth estimation sub-network and the region segmentation sub-network by taking the fusion layer of the trained deep learning network as a pre-training model.
The depth estimation sub-network and the region segmentation sub-network can be trained by the following method:
training a region segmentation sub-network by taking a deep learning network (which can be but is not limited to a residual error network) as a pre-training model; training a depth estimation sub-network by taking the trained region segmentation sub-network as a pre-training model; training a fusion layer of the deep learning network by taking the trained deep estimation sub-network as a pre-training model; and training the region segmentation sub-network and the depth estimation sub-network by taking the fusion layer of the trained deep learning network as a pre-training model.
Preferably, when training the region segmentation sub-network, it may also be trained in combination with the plane labels, or with the plane identifiers and semantic labels, in the manner described above.
For example, a residual network is used as the pre-training model, the learning rate of the region segmentation sub-network is set to 0, and only the depth estimation sub-network is back-propagated, optimizing its network parameters by this one-sided training (its initial learning rate is set to 0.01 and training runs for about 100000 rounds). Then, with the depth estimation sub-network obtained in the previous step as the pre-training model, the learning rate of the depth estimation sub-network is set to 0 and the parameters of the segmentation sub-network are optimized one-sidedly, with the same learning rate and number of rounds. Next, with the result of the previous step as the pre-training model, the learning rates of both the segmentation sub-network and the depth estimation sub-network are set to 0, and the parameters of the final fusion layer are trained. Finally, with the previous step as the pre-training model, all learning rates are restored and the whole network is trained for 100000 rounds to obtain the final result.
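The staged schedule above can be expressed by freezing parameter groups, since a learning rate of 0 is equivalent to excluding those parameters from optimization. The sketch below assumes the MultiTaskNet-style module names from the earlier sketch; the optimizer choice and the omitted training loop are placeholders.

```python
import torch

def train_stage(model, trainable_prefixes, lr=0.01):
    # Freeze everything except the named sub-networks; this has the same
    # effect as setting the other learning rates to 0.
    for name, p in model.named_parameters():
        p.requires_grad = any(name.startswith(t) for t in trainable_prefixes)
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(params, lr=lr)  # run ~100000 forward/backward/step rounds with this

# Stage 1: depth sub-network only; stage 2: segmentation only;
# stage 3: fusion layer only; stage 4: joint fine-tuning of everything.
# train_stage(model, ["depth_head"])
# train_stage(model, ["seg_head"])
# train_stage(model, ["fusion"])
# train_stage(model, ["backbone", "depth_head", "seg_head", "fusion"])
```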
The following introduces the advantages of being able to perform depth estimation and region segmentation by means of multitasking:
1. Shared computation: in the single-task processing mode, the depth estimation network and the region segmentation network are both based on dense prediction with fully convolutional networks; their network structures are extremely similar, and only the learning targets differ significantly. The embodiment of the invention exploits this structural similarity and uses multi-task processing so that depth estimation and region segmentation share computation, improving both the running speed and the precision of the processing results.
2. Mutual dependence: the depth estimation result and the region segmentation result are complementary and mutually constraining. For example, if a region shows no obvious change in depth information, it is likely to be a plane; conversely, if a region is a plane, its depth information should maintain a continuous, smooth change. Combining these two points, unifying region segmentation and depth estimation in the same deep learning network for multi-task prediction yields the depth estimation result and the region segmentation result simultaneously: computation is shared to improve speed, and the mutual constraints improve the reliability of the final result.
If the region segmentation result of the region segmentation sub-network includes both the two-dimensional plane information and its corresponding semantic information, then when correcting the depth estimation result according to the region segmentation result, the correction can use both the two-dimensional plane information and the semantic information, yielding a more accurate depth estimation result. For example, when the image contains a window region, depth estimation over that region may return the depth of objects outside the window rather than the depth of the window itself, because the glass is transparent; if the depth estimation result can be corrected according to the semantic information of the region, a more accurate result is obtained. Likewise, for a wall and a picture hung on it, the semantic information distinguishes the two, so the depth difference between the wall and the picture can be handled correctly, improving the accuracy of the depth estimation result.
As shown in fig. 14, an image is input, and a fully convolutional network serves as the feature extraction layer to obtain multi-level feature information. From the extracted multi-level feature information, the region segmentation sub-network produces a preliminary region segmentation result, comprising two-dimensional plane information and the corresponding semantic information, and the depth estimation sub-network produces a preliminary depth estimation result. The region segmentation result can then be corrected according to the depth estimation result through a cross-domain conditional random field, and the depth estimation result can be corrected according to the region segmentation result (two-dimensional plane information and corresponding semantic information) in the same way, yielding a corrected region segmentation result and a corrected depth estimation result and thus a more accurate outcome.
For the embodiment of the invention, depth information and the region segmentation result can be predicted simultaneously by a novel shared-computation network structure. Specifically: image information is input; higher-level features are extracted layer by layer through a deep residual network; features from multiple levels are fused by addition or similar operations to obtain feature map information containing multi-level information; this multi-level feature map information is then shared by depth estimation and region segmentation; the depth estimation sub-network and the segmentation sub-network learn their respective depth information and plane information (which may also contain semantic information); and finally a fusion layer fuses the two together and predicts the depth result and the region segmentation result simultaneously, achieving a multi-task learning/prediction effect.
Step 502, determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result.
When the three-dimensional plane information is determined, three-dimensional plane fitting can be performed according to the region segmentation result and the depth estimation result, so that the three-dimensional plane information of the multimedia information is obtained.
The embodiment of the present invention provides that, after step 502, the method may further include: and adjusting the determined three-dimensional plane information according to the semantic information and the spatial relationship information corresponding to the determined three-dimensional plane information.
Specifically, the association relationship between the three-dimensional planes is determined according to the semantic information and spatial relationship information corresponding to the determined three-dimensional plane information, and the determined three-dimensional plane information is adjusted through this association relationship to correct erroneous three-dimensional plane information.
This association relationship uses the spatial relationship information and semantic information of three-dimensional planes. Unlike the prior art, which uses only two-dimensional plane information, the spatial relationship in the embodiment of the invention is extended to three dimensions using the depth estimation result obtained in step 501. Compared with the spatial relationship of two-dimensional planes, the spatial relationship of three-dimensional planes reflects the true relative positions of the planes: fig. 15(a) shows a two-dimensional spatial relationship in which plane A appears above plane B; fig. 15(b) shows the corresponding three-dimensional spatial relationship, in which plane A is actually perpendicular to plane B.
The association relationship between three-dimensional planes can be realized with a conditional random field. Specifically, each three-dimensional plane is treated as a vertex, all vertices are connected into a graph, the conditional random field serves as the basic framework, and the conditional relationships are set to the three-dimensional spatial relationships and semantic relationships, correcting the plane information and semantic information of each plane. As shown in fig. 16, a left wall surface is misclassified as floor before correction; since the conditional random field detects that the surrounding planes are all walls, it infers that this plane's information is likely wrong, and the semantic information of the plane is correctly revised to wall.
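As a greatly simplified stand-in for the conditional random field, the sketch below relabels a plane whose semantic label disagrees with a strong majority of its neighbors, as in the floor/wall example of fig. 16; a real CRF would optimize all labels jointly rather than greedily.

```python
from collections import Counter

def correct_labels(plane_semantics, neighbors, ratio=0.75):
    """Relabel a plane when a strong majority of its neighbors disagree."""
    corrected = dict(plane_semantics)
    for plane, label in plane_semantics.items():
        nbr_labels = [plane_semantics[n] for n in neighbors.get(plane, [])]
        if not nbr_labels:
            continue
        majority, count = Counter(nbr_labels).most_common(1)[0]
        if majority != label and count / len(nbr_labels) >= ratio:
            corrected[plane] = majority
    return corrected

# A plane mislabeled "floor" whose neighbors are all walls gets corrected.
print(correct_labels({1: "floor", 2: "wall", 3: "wall", 4: "wall"},
                     {1: [2, 3, 4]}))  # {1: 'wall', ...}
```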
The association between the three-dimensional planes may also be implemented in other manners, for example, by a Markov random field.
The embodiment of the invention provides a hybrid plane determination method: when determining planes in multimedia information, an appropriate plane determination method is selected adaptively according to the texture information of each region. The existing plane determination method based on simultaneous localization and mapping (SLAM) works well on texture-rich regions, but cannot obtain accurate results on regions lacking texture information (little or no texture), such as smooth desktops, glass surfaces, and walls, which are unsuited to SLAM-based plane determination precisely because of the scarce texture. For texture-missing regions, the deep-learning-based plane determination method can further determine information such as the normal vector and orientation of a plane from the obtained three-dimensional plane information; this information greatly benefits the subsequent rendering of virtual display information, making the generated AR display information more realistic and improving the user experience.
Specifically, the method comprises the following steps: before performing the region segmentation and the depth estimation on the multimedia information, the method may further include: determining texture information of the multimedia information; determining a texture missing region according to the texture information;
Subsequently, for the determined texture-missing region, region segmentation and depth estimation are performed in the manner provided by the embodiment of the invention.
When determining the texture information of the multimedia information, the number of feature points in a region (the region can be determined according to a user operation) can be counted first to judge whether the texture information of the region is rich: a threshold T is set, and when the number of feature points exceeds T, the region is judged to be texture-rich; otherwise, it is judged to be texture-missing.
Fig. 17 shows the flow of the hybrid plane determination method. For an input image, it is judged whether the texture information is rich. When the texture is rich, the SLAM plane determination method is selected to determine the planes in the image; when there is too little or no texture, the deep-learning-based plane determination method of the embodiment of the invention is selected: region segmentation and depth estimation are performed, and three-dimensional plane fitting is carried out according to the region segmentation result and the depth estimation result to finally obtain the three-dimensional plane information.
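The selection logic of fig. 17 might look like the following sketch; the ORB feature detector and the threshold T=100 are illustrative choices (the patent only requires counting feature points against a threshold T), and the two detection functions are placeholders for existing pipelines.

```python
import cv2

def is_texture_rich(region_bgr, T=100):
    # Count local feature points in the region; more than T means texture-rich.
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    keypoints = cv2.ORB_create().detect(gray, None)
    return len(keypoints) > T

def slam_plane_detection(region_bgr):
    raise NotImplementedError  # placeholder for an existing SLAM pipeline

def deep_learning_plane_detection(region_bgr):
    raise NotImplementedError  # placeholder: segmentation + depth estimation + 3D plane fitting

def determine_plane(region_bgr):
    if is_texture_rich(region_bgr):
        return slam_plane_detection(region_bgr)       # texture-rich region
    return deep_learning_plane_detection(region_bgr)  # texture-missing region
```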
The hybrid plane determination method may be applied to the whole multimedia information or only to certain regions; for example, it may be applied to the region where the user wants to place virtual display information (the region where the virtual display information needs to be rendered).
Fig. 18 shows the processing result of the hybrid plane determination method; bright spots in the image are texture feature points, and the denser they are, the richer the texture of the region. The two cubes in the figure are virtual display information. The user chooses to place a cube on a desktop with no texture information; following the steps in fig. 17, the region is determined to have no texture information, so the deep-learning-based plane determination method is selected instead of the traditional SLAM method. Through the deep-learning-based method, information such as the normal vector, orientation, and size of the desktop is obtained, and according to this information and the virtual display information, the rendered AR display information is more realistic, improving the user experience.
The hybrid plane determination method enhances applicability across scenes: the SLAM method is applied to texture-rich regions, while the deep-learning-based method (region segmentation, depth estimation, and plane fitting) is applied to texture-missing regions, compensating for the shortcomings of SLAM. The hybrid method thus allows planes in any scene to be estimated, overcoming the deficiencies of the traditional method and greatly widening the application range.
The embodiment of the invention determines the three-dimensional planes of multimedia information by performing region segmentation and depth estimation on it. Compared with prior-art two-dimensional plane determination methods, the determined three-dimensional planes are more accurate; displaying augmented reality display information according to the determined three-dimensional plane information then improves the realism of the augmented reality display result and further improves the user experience.
Example two
An embodiment of the present invention provides a method for displaying augmented reality display information, as shown in fig. 19, the method includes the following steps:
Step 1001, determining three-dimensional plane information corresponding to the multimedia information; and step 1002, displaying augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information.
The embodiment of the present invention provides that, but not limited to, the three-dimensional plane information corresponding to the multimedia information may be determined according to the plane determining method in the first embodiment.
Further, step 1002 includes step 10021 (not labeled in the figure) and step 10022 (not labeled in the figure). In step 10021, attribute information corresponding to the three-dimensional plane information and/or attribute information corresponding to the virtual display information is acquired; in step 10022, the augmented reality display information is displayed according to the acquired attribute information corresponding to the three-dimensional plane information and/or the virtual display information.
Wherein, the attribute information corresponding to the three-dimensional plane comprises: at least one item of semantic information corresponding to the three-dimensional plane, associated attribute information corresponding to the semantic information, and physical attribute information of the three-dimensional plane.
In the embodiment of the invention, the attribute information corresponding to the three-dimensional plane information and the attribute information corresponding to the virtual display information may be acquired simultaneously to generate the augmented reality display information, or only the attribute information of the virtual display information may be acquired. For example, if the virtual display information is determined, from its attribute information, to be an animal capable of flying, it may be displayed at any position; that is, only the augmented reality display information corresponding to the virtual display information needs to be determined, without determining the three-dimensional plane information corresponding to the multimedia information.
The associated attribute information is related to the semantic information and extends from it; the semantic information may contain the associated attribute information. For example, if the semantic information is "sea surface," then "swimmable" is associated attribute information for "sea surface."
Further, the physical property information may include at least one of: area, color, contrast, texture, etc.
Further, the step of obtaining semantic information corresponding to the three-dimensional plane information includes any one of step a (not labeled in the figure), step B (not labeled in the figure), and step C (not labeled in the figure):
step A, semantic information corresponding to two-dimensional plane information of multimedia information is used as semantic information of corresponding three-dimensional plane information; b, determining semantic information of the three-dimensional plane information according to semantic information corresponding to the two-dimensional plane information of the multimedia information and a depth estimation result of the multimedia information; and step C, performing semantic analysis on the three-dimensional plane information to obtain semantic information corresponding to the three-dimensional plane.
The embodiment of the invention also provides that context semantic information of the three-dimensional plane information can be determined, and then the semantic information corresponding to the three-dimensional plane is adjusted through the context semantic information, so that the precision of the determined semantic information of the three-dimensional plane is improved.
When displaying the AR display information, how to display the virtual display information needs to be determined. Step 10022 comprises: determining the positional relationship between the virtual display information and the three-dimensional plane and/or the positional relationship among multiple pieces of virtual display information, according to the acquired attribute information corresponding to the three-dimensional plane information and/or the virtual display information; and displaying the augmented reality display information according to the determined positional relationship.
A schematic diagram of automatically recommending the display position of virtual display information is shown in fig. 20. In fig. 20(a), the three-dimensional plane information obtained from the multimedia information includes a desktop and the ground, and the virtual display information is a teacup; according to the attribute information of the teacup and the planes, the teacup can be placed on the desktop or on the ground, but generally not on the side wall of the desk, so based on this relationship the desktop placement is recommended. In fig. 20(b), when there are multiple pieces of virtual display information, the relative positions among them can also be recommended automatically; for example, a display and a keyboard are both virtual display information placed on the desktop, and since a keyboard is rarely placed behind a display and more reasonably in front of it, placing the keyboard in front of the display is recommended automatically.
The automatic recommendation method can be realized with a prior-knowledge-based filtering system and a long short-term memory network (LSTM). As shown in fig. 21, the multimedia information and virtual display information are determined; the multimedia information is divided into regions on a grid; the virtual display information is tentatively placed at positions in the grid, randomly constructing multiple combinations of the display information; combinations that violate the rules are removed by the prior-knowledge filter; and finally the remaining combinations are scored by a vertical LSTM and a horizontal LSTM, with the highest-scoring combination being the automatically recommended placement.
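Schematically, the pipeline might be sketched as follows, with simple rules standing in for the prior-knowledge filter and a pluggable score_fn standing in for the vertical and horizontal LSTM scorers; every name and field here is illustrative.

```python
import itertools

def recommend_placement(planes, item, grid_cells, score_fn):
    candidates = []
    for plane, cell in itertools.product(planes, grid_cells):
        # Prior-knowledge filter: e.g. a teacup belongs on a desktop or the
        # ground, never on the side wall of a desk.
        semantic_ok = plane["semantic"] in item["allowed_surfaces"]
        size_ok = plane["area"] >= item["footprint"]
        if semantic_ok and size_ok:
            candidates.append((plane, cell))
    if not candidates:
        return None
    # score_fn plays the role of the LSTM scorers; highest score wins.
    return max(candidates, key=lambda c: score_fn(item, *c))

teacup = {"allowed_surfaces": {"desktop", "ground"}, "footprint": 0.01}
planes = [{"semantic": "desktop", "area": 0.5}, {"semantic": "wall", "area": 6.0}]
best = recommend_placement(planes, teacup, grid_cells=[(0, 0), (0, 1)],
                           score_fn=lambda item, plane, cell: plane["area"])
```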
Referring to fig. 22, the method of the embodiment of the invention performs region segmentation and depth estimation on the multimedia information and fits three-dimensional planes from the results. From the attribute information corresponding to the three-dimensional planes and/or the virtual display information, it can determine that a land animal used as virtual display information cannot fly in the air or walk on a wall and can only walk on the ground; a plane belonging to the ground is therefore selected as the fusion target, and the virtual display information is placed upright on the ground instead of being suspended in mid-air as in the prior art. Furthermore, the display size and exact position of the virtual display information in the multimedia information can be determined from its actual size, the image distance, and so on, avoiding inconsistencies between the virtual display information in the generated augmented reality display information and the real situation.
In the display method for augmented reality display information provided by the embodiment of the invention, the display mode (including position, size, and so on) of the virtual display information is determined comprehensively from the attribute information of the virtual display information and/or of the three-dimensional plane information, so that the resulting augmented reality display information fits the real situation better and improves the user experience.
The following describes a scheme for automatically recommending a display position of virtual display information, taking furniture placement in daily life of a user as an example.
The virtual display information may in this example specifically comprise furniture.
Through the scheme of the embodiment of the invention, the display position of furniture can be previewed and automatically recommended, including previewing the furniture placement effect, inferring the reasonable positions where a given piece of furniture can be placed, and finding the best position for placing it.

Arranging furniture is a basic living requirement. When furniture needs to be purchased or the overall layout needs to be changed, the user must conceive the arrangement in advance and then move the furniture to the intended positions. However, only after the layout is finished can the user judge whether the arrangement is reasonable or attractive, which makes a satisfactory layout hard to achieve and greatly increases the cost of changing it: every change requires moving the furniture to new positions again, at considerable physical and mental effort. In addition, much as with trying on clothes, some users are dissatisfied with the actual arrangement effect after buying furniture, or have no suitable place to put it.

This embodiment provides a furniture layout preview: the user can preview the layout effect before changing the arrangement; when selecting and purchasing furniture, the user can place it virtually in advance to judge whether it is suitable and whether there is a suitable place for it at home; and the user can place the furniture at a reasonable or optimal position under the guidance or recommendation of this example.
A specific flow of automatically recommending the display position of the virtual display information is shown in fig. 23, wherein,
Step 11: perform region segmentation and depth estimation on the input multimedia information (an image), determine the three-dimensional plane information corresponding to the multimedia information from the region segmentation result and the depth estimation result, and screen out the three-dimensional planes relevant to furniture placement, such as wall planes and floor planes, according to the semantic information of the three-dimensional planes.

The three-dimensional plane information corresponding to the multimedia information includes semantic information (wall, floor, etc.) and physical information (size, shape).

This embodiment may, but is not limited to, determine the three-dimensional plane information corresponding to the multimedia information according to the plane determination method of the first embodiment.
Step 12: acquire the information of the furniture to be placed (including image information, size information and the like) and match the three-dimensional plane information against the furniture information under certain screening rules, including semantic matching rules (for example, a table can be placed on a floor plane but not on a wall surface) and size matching rules (for example, the plane must be larger than the furniture); the matched regions are the reasonable regions for placing the furniture, as sketched below.
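A minimal sketch of this matching step, under assumptions: plane and furniture records are plain dictionaries, and the SUPPORTS table encoding which plane semantics admit which furniture types is illustrative rather than taken from the patent.

```python
SUPPORTS = {                      # which plane semantics admit which furniture
    "table": {"floor"},           # a table rests on the floor, not on a wall
    "painting": {"wall"},
}

def reasonable_regions(planes, furniture):
    """planes: dicts with 'semantic', 'width', 'height' (metres).
    furniture: dict with 'type' and a 'width' x 'depth' footprint."""
    matches = []
    for plane in planes:
        semantic_ok = plane["semantic"] in SUPPORTS.get(furniture["type"], set())
        size_ok = (plane["width"] >= furniture["width"]
                   and plane["height"] >= furniture["depth"])
        if semantic_ok and size_ok:
            matches.append(plane)
    return matches

planes = [{"semantic": "floor", "width": 4.0, "height": 3.0},
          {"semantic": "wall", "width": 4.0, "height": 2.6}]
table = {"type": "table", "width": 1.2, "depth": 0.8}
print(reasonable_regions(planes, table))  # only the floor plane matches
```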
There are two ways to obtain the information of the furniture to be placed. The first is to acquire multimedia information containing the actual furniture, determine its three-dimensional planes, extract the planes corresponding to the furniture separately, and determine the related information (image, size, etc.). The second is for the furniture vendor to provide the electronic information of the furniture directly, including images and dimensions.

The reasonable regions can be displayed on the screen (for example, highlighted in different colors) so that the user can select a plane for placing the furniture. Once the placement plane is determined, it can be fused with the furniture information to achieve the preview effect.

Step 13: if the user selects the mode of automatically recommending the display position of the virtual display information, the optimal placement position within the reasonable regions is recommended automatically in combination with prior knowledge.

After the placement plane is determined, it is fused with the furniture information to achieve the preview effect: the furniture information, as virtual display information, is displayed at the recommended placement position according to the positional relationship.
Referring to fig. 24, which illustrates automatically recommending a display position for virtual display information: a user wants to preview the effect of placing a TV table (the virtual display information) in the living room. The multimedia information (a picture) of the living room is obtained by shooting with a mobile phone, and the TV table information can be provided by the furniture vendor. Through step 11, the planes in the room, including the floor, the window, the plane of the television screen, the walls and so on, are determined by geometric estimation (region segmentation and depth estimation, with the corresponding three-dimensional plane information determined from the region segmentation result and the depth estimation result). Among these planes, steps 11 and 12 find a reasonable placement position: for example, the floor is determined to be a plane on which a table can stand and is large enough to accommodate it, i.e. a reasonable placement position. Through step 13, prior knowledge establishes that a TV table is usually placed beside the television, so the recommended best placement position is the floor area near the television. The plane corresponding to the best placement position is then fused with the furniture information to achieve the preview effect.
If the user chooses to place according to the recommended position, the recommended position is used as the placement position. The user can also choose a custom placement option and manually select a plane as the final placement plane.
Embodiment Three
In this embodiment of the present invention, step 1002 further includes: determining a target plane in the three-dimensional plane information corresponding to the multimedia information; determining adjustment information corresponding to the target plane; and displaying augmented reality display information corresponding to the adjustment information.
The adjustment information may be, but is not limited to, adjustment direction information and/or adjustment angle information.
First example of the third embodiment
This example provides a plane position adjustment recommendation method, which senses the positional relationship between planes and gives adjustment information so that the position of a plane meets the requirement.

Applicable scenario: when placing furniture, clocks, picture frames and similar articles, the article often needs to be kept horizontal or vertical. For example, a tea table in a living room should be parallel to the sofa, the overall indoor layout follows strict perpendicular and parallel relationships, and clocks and picture frames hang level on the wall. However, human perception of vertical and/or horizontal relationships is poor and cannot distinguish small angle differences, especially at close viewing angles. When hanging a picture frame, people therefore often step back to observe it from a distance, come back to adjust it, or ask someone else whether it is level; adjusting back and forth many times costs the user extra effort and time and may require extra helpers. The plane position adjustment recommendation method provided by this example assists the user in accurately determining the angular relationship between planes, compensating for the human inability to distinguish small angle differences. By accurately distinguishing small angle differences, the horizontal and/or vertical relationships between planes can be determined precisely, which facilitates furniture placement, article placement, overall indoor layout and so on, reduces the effort spent on repeated adjustment, and improves the polish of the layout.
This example proposes that, before determining the adjustment information corresponding to the target plane, the method further includes: determining, in the three-dimensional plane information corresponding to the multimedia information, a reference plane and the positional relationship between the target plane and the reference plane;
determining adjustment information corresponding to the target plane, including: and determining position adjustment information of the target plane according to the determined position relation, wherein the position adjustment information is used as adjustment information corresponding to the target plane.
Wherein the adjustment information is a position adjustment suggestion of the target plane. The target plane is a plane whose position needs to be adjusted. The positional relationship between the target plane and the reference plane may include the current positional relationship or may include the target positional relationship. The positional relationship between the object plane and the reference plane may be, but is not limited to, the angle between the edge lines of the planes.
In this example, the target plane and/or the reference plane and/or the target position relationship may be determined by user selection, for example, when the user hangs the frame on the wall surface, the user may select the frame as the target plane and the ceiling as the reference plane, and the target position relationship is that the frame of the frame is parallel to the edge of the ceiling. In addition, the user may determine the target plane and then automatically determine the reference plane or the target position relationship according to the target plane, for example, the target plane is a picture frame, and when a general user hangs the picture frame, the general user usually wants the picture frame to be parallel to the ground or the ceiling, so the ground or the ceiling plane can be automatically used as the reference plane, and the target position relationship is set to be that the frame of the picture frame is parallel to the edge of the ceiling.
Fig. 25 is a schematic flow chart of the plane position adjustment recommendation method in this example, in which,
Step 21: perform region segmentation and depth estimation on the input multimedia information (an image), and determine the three-dimensional plane information corresponding to the multimedia information from the region segmentation result and the depth estimation result. Step 22: determine the current positional relationship between the target plane and the reference plane.
The user can manually select a target plane and a reference plane among the three-dimensional planes, for example the canvas-frame plane and the ceiling plane respectively.

From the three-dimensional plane information, the top edge of the canvas-frame plane and the edge line of the ceiling are obtained, and the included angle between them is computed. This angle serves as the current positional relationship between the target plane and the reference plane (it may also be regarded as the plane angle between them), giving an accurate positional relationship between the two planes.
Step 23: determine the adjustment information (a position adjustment suggestion for the target plane) from the accurate current positional relationship between the target plane and the reference plane and the target positional relationship, so as to assist the user in adjusting the position of the target plane.

Having selected the target plane and the reference plane in step 22, the user may further select the desired target positional relationship between them, for example that the frame be parallel to the ceiling edge. Suppose the current positional relationship obtained in step 22 is a 3° angle between the edge lines of the two planes, while the selected target positional relationship is 0°; the system then automatically gives the adjustment suggestion of rotating the target plane by 3°. The user adjusts the target plane accordingly so that the edge lines of the two planes become parallel, and the picture frame finally hangs level on the wall.
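Steps 22 and 23 reduce to measuring the angle between two edge direction vectors and converting the residual into a suggestion. The sketch below assumes the fitted planes yield edge direction vectors; the vector values and the 0.5° tolerance are illustrative.

```python
import math

def edge_angle_deg(d1, d2):
    """Unsigned angle between two edge direction vectors (2-D or 3-D)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    ang = math.degrees(math.acos(cos))
    return min(ang, 180.0 - ang)   # an edge line has no preferred direction

def position_adjustment(target_edge, reference_edge, target_angle=0.0, tol=0.5):
    """Suggest how far to rotate the target plane toward the target relationship."""
    delta = edge_angle_deg(target_edge, reference_edge) - target_angle
    if abs(delta) <= tol:
        return "target positional relationship reached"
    return f"rotate the target plane by {abs(delta):.1f} degrees"

# Frame top edge tilted 3 degrees against a level ceiling edge line:
tilted = (math.cos(math.radians(3)), math.sin(math.radians(3)))
print(position_adjustment(tilted, (1.0, 0.0)))  # rotate ... by 3.0 degrees
```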
Fig. 26 and 27 show schematic diagrams of the plane position adjustment recommendation in this example.
Referring to fig. 26: in daily life a user may need to hang articles such as picture frames level on a wall, but human subjective perception of vertical and horizontal relationships is poor, particularly for small angles (e.g. 5°). A user typically hangs the frame on the wall, observes from a distance whether it is level, then adjusts it, possibly repeating this many times without any guarantee that the frame ends up level.
In this scenario, a user acquires multimedia information (pictures or videos) through a multimedia acquisition device (e.g., a camera) of a terminal (e.g., a mobile phone, AR glasses, etc.).
Through step 21, the planes in the collected multimedia information are determined according to the plane determination method above, and the user can select among the segmented planes by tapping their positions on the screen. For example, the user can designate the picture frame as the target plane and the ceiling as the reference plane on a mobile phone or AR glasses display by tapping the touch screen;

After the two planes are selected, horizontal and vertical options can be offered on the display, and the user can choose to keep the two planes, or their edges, in a horizontal or vertical relationship. When the user selects the edge-level option, the target plane and the reference plane are to be parallel, i.e. the angle between the top edge of the canvas-frame plane and the ceiling edge line should be 0°. From step 22, the current positional relationship between the two planes is: the angle between the top edge of the canvas-frame plane and the ceiling edge line is 5°.

Since the target angle is 0° while the current 5° angle means the planes are not parallel, the user is prompted on the display: "the angle between the edge lines of the selected planes is 5°".

Through step 23 the adjustment suggestion is obtained: rotate the target plane counterclockwise by 5° on the wall. The user adjusts the picture frame accordingly so that the angle between the top edge of the canvas-frame plane and the ceiling edge line becomes 0°. The adjustment suggestion can be given via an on-screen text prompt or a voice broadcast, instructing the user to rotate the frame 5° counterclockwise.
In this example, after the user selects the target plane, the reference plane and the target positional relationship, the positional relationship between the target plane and the reference plane can be obtained periodically and an adjustment suggestion given.

After rotating according to the suggestion, the user can keep shooting with the mobile phone or AR glasses so that the change in the positional relationship is determined in real time. For example, if the user over-rotates to 7° counterclockwise, the system re-evaluates the current positional relationship and reminds the user to rotate 2° clockwise, and stops reminding once the target positional relationship is reached.

The current positional relationship can also be evaluated continuously; this requires the mobile phone or AR glasses to keep shooting the current scene during adjustment. The user slowly rotates the painting while the current positional relationship is displayed in real time, and is notified via the display or a voice prompt when the target positional relationship is reached and the adjustment is complete.
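The periodic re-evaluation can be a simple closed loop. In this sketch, measure_angle_deg is an assumed callable returning the signed angle currently measured by step 22 (sign convention assumed: positive means over-rotated clockwise); the 0.5° tolerance and 0.5 s period are illustrative.

```python
import time

def monitor_adjustment(measure_angle_deg, target_deg=0.0, tol_deg=0.5, period_s=0.5):
    """Re-measure the positional relationship periodically and keep prompting
    until the target positional relationship is reached."""
    while True:
        delta = measure_angle_deg() - target_deg
        if abs(delta) <= tol_deg:
            print("adjustment complete")
            return
        direction = "counterclockwise" if delta > 0 else "clockwise"
        print(f"rotate {abs(delta):.1f} degrees {direction}")
        time.sleep(period_s)
```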
Referring to fig. 27: the user needs to make a sofa perpendicular to a wall, and since the sofa is not directly adjacent to the wall, the perpendicular relationship is hard to guarantee. As in fig. 26, the current scene is photographed with a mobile phone or AR glasses to obtain multimedia information. The user selects the side surface of the sofa and the wall surface as the target plane and the reference plane, and selects the target spatial relationship: the bottom edge line of the sofa side is perpendicular to the wall edge line, i.e. their included angle is 90°. The current positional relationship between the sofa side and the wall, i.e. the current angle between the bottom edge line of the sofa side and the wall edge line, is obtained through step 22; the system determines the angle to be adjusted and prompts the user via the display or voice to adjust according to the suggestion until the target positional relationship is reached.

In addition, a simple mode can be selected in which only the wall surface is chosen as the reference plane and no target plane is selected: the normal vector of the wall surface is obtained first and normal-vector lines are displayed on the screen, and the user places the sofa along the normal-vector lines while watching the display.
Second example of the third embodiment
This example provides a collection device posture adjustment recommendation method, which prompts the user to adjust the posture of the multimedia collection device (a camera, mobile phone, etc.) to obtain the optimal collection angle.

Applicable scenario: when taking pictures with a camera or mobile phone, the user usually wants a front view of certain objects, for example when photographing a document or an oil painting. If the captured picture has a rotation or tilt angle, it is inconvenient to read afterwards, yet how to adjust the camera posture to obtain a front view is not intuitive for the user.

In this example, a posture adjustment suggestion for the collection device is presented on the display by analyzing the collected multimedia information, and the user can rotate or move the collection device according to the suggestion to obtain front-view multimedia information.
In this embodiment, before determining the adjustment information corresponding to the target plane, the method further includes: determining the position relation between a target plane and an acquisition plane corresponding to acquisition equipment for acquiring multimedia information;
determining adjustment information corresponding to the target plane, including: and determining the attitude adjustment information of the acquisition plane according to the determined position relation, and taking the attitude adjustment information as the adjustment information corresponding to the target plane.
The adjustment information here is a posture adjustment suggestion for the collection device. The target plane is the plane corresponding to the object to be photographed.
The target plane may be a plane corresponding to an object to be photographed, such as a document, an oil painting, and the like, in the following examples.
The collection device for collecting the multimedia information may specifically be a camera or a mobile phone in the following embodiments, and the collection plane corresponding to the collection device for collecting the multimedia information may specifically be a plane corresponding to the camera or the mobile phone in the following embodiments.
When the adjustment information is angle information, in the following example the angle information may specifically include the rotation angle and/or the tilt angle to be adjusted.
Fig. 28 is a schematic flow chart of the acquisition device posture adjustment recommendation method in this example.
Step 31: perform region segmentation and depth estimation on the input multimedia information (e.g. an image), and determine the three-dimensional plane information corresponding to the multimedia information from the region segmentation result and the depth estimation result.
The user can manually select a target plane among the three-dimensional planes; for example, when shooting an oil painting, the oil-painting plane is taken as the target plane.
Step 32: determine the relative normal vector information of the target plane with respect to the collection plane.

If the target plane is parallel to the collection plane, i.e. the target plane is being collected in the front-view direction, the relative normal vector takes a fixed standard value in three-dimensional space (e.g. (1, 0, 0)); if the planes are not parallel, the target plane is being collected at a rotation, and the relative normal vector takes some other value.

The positional relationship between the target plane and the collection plane can thus be determined through the relative normal vector information.
Step 33: from the positional relationship between the target plane and the collection plane, determine the posture adjustment information of the collection plane, i.e. the posture adjustment suggestion, which may specifically be an adjustment direction and/or an adjustment angle.

The adjustment angle in this case is a rotation angle. A rotation indication, including the rotation direction and rotation angle, can be shown on the display; after adjusting the collection device according to it, the user obtains a front view of the target plane.

In addition, the positional relationship between the target plane and the collection plane may also be determined from the edge line of the target plane and the edge line of the collection plane. If the angle between them is not 0°, the target plane is considered to be collected at a tilt, and the posture adjustment information of the collection plane (the adjustment direction and/or adjustment angle) can again be determined from the positional relationship. The adjustment angle in this case is a tilt angle. A tilt indication, including the tilt direction and tilt angle, can be shown on the display; after adjusting the collection device according to it, the user obtains a front view of the target plane.
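A minimal sketch of the front-view test in steps 32 and 33, assuming the relative normal vector is already expressed in the collection device's coordinates and that (1, 0, 0) is the fixed standard value mentioned above; the tolerance and example vector are illustrative.

```python
import math

FRONT_VIEW_NORMAL = (1.0, 0.0, 0.0)   # the fixed standard value from step 32

def rotation_needed_deg(relative_normal):
    """Angle between the target plane's relative normal and the front-view value."""
    dot = sum(a * b for a, b in zip(relative_normal, FRONT_VIEW_NORMAL))
    norm = math.sqrt(sum(a * a for a in relative_normal))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def posture_suggestion(relative_normal, tol_deg=1.0):
    ang = rotation_needed_deg(relative_normal)
    if ang <= tol_deg:
        return "front view reached, adjustment complete"
    return f"rotate the collection device by about {ang:.1f} degrees"

print(posture_suggestion((0.94, 0.34, 0.0)))  # about 20 degrees still needed
```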
Fig. 29 is a schematic diagram of the collection device posture adjustment recommendation in this example. A user opens the camera of a mobile phone and captures the image shown in fig. 29(a), in which an oil painting appears at a noticeable rotation angle. If the user is not satisfied with this result, the front-view mode can be selected. In this mode, the plane information is acquired through step 31 and the oil-painting plane is obtained; through step 32, the relative normal vector of the oil-painting plane is obtained, and since it is not the fixed standard value, the system automatically concludes that the captured image has a rotation angle. From this rotation angle, combined with the camera posture, the adjustment suggestion of rotating the camera in the opposite direction by the rotation angle is obtained. The suggestion can be shown on the display, as in fig. 29(b), and the user rotates the camera accordingly to obtain a front-view image. While the user adjusts, the system can update the current angle status in real time, and once the relative normal vector equals the fixed standard value the user is notified that the adjustment is complete, as in fig. 29(c).

Similarly, as shown in fig. 29(d), 29(e) and 29(f), when the angle between the edge lines of the target plane (a document) and the collection plane is not 0°, viewing is inconvenient; the user can then be advised to tilt the mobile phone according to the adjustment suggestion so that front-view content is photographed.

In this example, tilting the collection device means moving it within the plane of the collection plane, while rotating it means adjusting roll, pitch and yaw about the center of the collection device as the origin.
Embodiment Four
In this embodiment of the present invention, step 1002 further includes: determining a driving avoidance plane in the three-dimensional plane information corresponding to the multimedia information; determining driving auxiliary information according to the driving avoidance plane; and displaying augmented reality display information corresponding to the driving auxiliary information.
First example of the fourth embodiment
This example provides a method, for use in a driver assistance system, of judging whether a vehicle can pass; when the vehicle passes through a narrow area, it provides auxiliary information on whether the vehicle can get through.

When a vehicle passes through a narrow lane or alley, the driver must estimate the exact width of the passable road to judge whether the vehicle can get through smoothly. However, human width estimation is rough, and inherent defects of human vision may even cause the vehicle width or the road width to be misjudged.

If the driving avoidance planes are the obstacle planes on both sides of the driving road surface, then determining the driving auxiliary information according to the driving avoidance planes includes: determining the width information of the driving road surface from the obstacle planes on both sides of the driving road surface; and determining, from that width information, prompt information on whether the vehicle can pass through the driving road surface, as the driving auxiliary information.
The obstacle planes on both sides of the driving road surface may be planes on both sides of a roadway or obstacle planes on both sides of a narrow lane, and in the following embodiments may be the planes corresponding to walls or other vehicles.
The specific process is shown in fig. 30, wherein,

Step 41: determine the three-dimensional plane information corresponding to the multimedia information.

This embodiment may, but is not limited to, determine the three-dimensional plane information corresponding to the multimedia information according to the plane determination method of the first embodiment.

Step 42: determine the obstacle planes on both sides of the driving road surface from the acquired semantic information of the three-dimensional planes, for example wall surfaces on both sides of the road or side planes of other vehicles, and calculate the width information of the driving road surface, also called the road width or actual road width.

When determining an obstacle plane, the plane closest to the current vehicle must be used; for example, when there are other vehicles on both sides of the driving road, the planes of their rearview mirrors can serve as the obstacle planes.

Step 43: judge whether the vehicle can pass from the actual road width and the vehicle's own attributes (e.g. the vehicle width), and prompt the user that passage is possible when the actual road width is larger than the vehicle body width.
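Steps 42 and 43 amount to a width comparison. The sketch below assumes the nearest obstacle planes are expressed as lateral offsets in the vehicle frame; the 0.2 m safety margin is an illustrative assumption, since the patent only compares the road width with the vehicle width.

```python
def road_width_m(left_plane_x, right_plane_x):
    """Gap between the nearest obstacle planes along the vehicle's lateral axis."""
    return right_plane_x - left_plane_x

def can_pass(width_m, vehicle_width_m, margin_m=0.2):
    """Pass only if the gap exceeds the vehicle width plus a margin on each side."""
    return width_m >= vehicle_width_m + 2 * margin_m

gap = road_width_m(-1.3, 1.3)    # obstacle planes 1.3 m to each side
print(gap, can_pass(gap, 1.8))   # 2.6 True  (2.6 >= 1.8 + 0.4)
```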
Referring to fig. 31: the driver assistance system captures the road ahead through a front camera to obtain an image of the road condition. Each plane in the image is detected through step 41, the actual road width is calculated through step 42, and the actual road width is compared with the vehicle width through step 43 to obtain the assistance prompt on whether the vehicle can pass. The prompt can be presented to the user as recommended content via voice broadcast or on an in-vehicle screen.
Second example of the fourth embodiment
This example provides a road surface condition estimation method for a driver assistance system, which automatically judges the condition of the road surface ahead while the vehicle is running and produces a driving assistance prompt.

In an automatic driving or driver assistance system, judging the road condition ahead and adjusting the driving speed in time accordingly is a basic requirement; lacking a basic judgment of the road surface ahead may lead to a catastrophic driving accident.

For example, if the road surface ahead is poor and full of potholes, the vehicle must decelerate; driving at full speed is likely to cause an accident. This example achieves road surface condition estimation with only an ordinary low-cost optical camera, judging whether the road ahead is flat and whether deceleration is needed.
If the driving avoidance plane is a plane to be avoided on the driving road surface, then determining the driving auxiliary information according to the driving avoidance plane includes: determining driving suggestion information, as the driving auxiliary information, according to the plane to be avoided on the driving road surface.
Wherein the plane to be avoided may in the embodiments described below be a surface that is not suitable for traffic, such as a water surface, a pit surface, an obstacle plane, etc.
The specific flow is shown in fig. 32, wherein,

Step 51: determine the three-dimensional plane information corresponding to the multimedia information.

This embodiment may, but is not limited to, determine the three-dimensional plane information corresponding to the multimedia information according to the plane determination method of the first embodiment.

Step 52: from the acquired three-dimensional plane information, extract the driving road surface and the other planes unsuitable for passage (such as water surfaces, pothole surfaces and obstacle planes), and classify them by danger level according to their depth information: shallow potholes belong to a low danger level, while deep potholes, large amounts of standing water and high obstacles belong to a high danger level.
Step 53: evaluate the danger ahead according to the danger levels and provide driving suggestion information, i.e. a driving suggestion, which may be a danger prompt or a re-planned driving route.

The danger condition is obtained by comprehensive scoring over the danger level of each relevant plane and its area. The scoring rule can weight the danger level by the size of the relevant plane; several score thresholds are set, and a different danger prompt is triggered once a threshold is exceeded. For example, if there are some shallow potholes ahead (low danger level, small area), the comprehensive score triggers a low-level danger prompt advising deceleration and creeping; if there are many large obstacles ahead (high danger level, large area), the comprehensive score triggers a high-danger prompt advising stopping to observe.
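A minimal sketch of such weighted scoring; the level weights and thresholds are illustrative assumptions, since the patent only states that the score combines the danger level with the plane area.

```python
LEVEL_WEIGHT = {"low": 1.0, "medium": 2.5, "high": 5.0}

def danger_score(avoid_planes):
    """avoid_planes: (danger_level, area_m2) pairs for planes to be avoided."""
    return sum(LEVEL_WEIGHT[level] * area for level, area in avoid_planes)

def driving_advice(avoid_planes, low_thresh=1.0, high_thresh=8.0):
    score = danger_score(avoid_planes)
    if score < low_thresh:
        return "proceed normally"
    if score < high_thresh:
        return "low-level danger: decelerate and creep"
    return "high danger: stop and observe, or re-plan the route"

print(driving_advice([("low", 0.8), ("low", 0.6)]))    # shallow potholes
print(driving_advice([("high", 2.0), ("high", 1.5)]))  # large obstacles
```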
In addition, the driving route can be planned again according to the road condition in front.
The danger prompt obtained in step 53 includes deceleration and detour; when the driving suggestion is a detour, the travel route needs to be re-planned to avoid the obstacle area ahead. This example can supply the to-be-avoided plane information, the road surface information and the like for path planning.
Referring to fig. 33: the automatic driving system captures the road ahead through a front camera to obtain the upper-left and lower-left pictures. In the upper-left picture, steps 51 and 52 yield the relevant planes, including a water surface and the road surface; through danger classification, the small water surface is classified as a medium-danger obstacle, the evaluation is to avoid and detour, the driving route is re-planned, and the danger prompt together with the re-planned route is shown to the user, as in the upper-right drawing of fig. 33. Similarly, in the lower-left picture of fig. 33, the road surface, a shallow pothole surface and other information are obtained; in the danger classification the shallow pothole surface belongs to a low danger level, and since no avoidance route can be planned given the road ahead, the user is prompted to slow down.
Embodiment Five
This example provides an implementation method for an AR keyboard, which with the aid of an AR device can turn an English keyboard into a multilingual keyboard and an ordinary password keypad into an encrypted password keypad. Ordinary keyboards are usually English keyboards, while users of other languages often need language-specific keyboards such as Russian or Korean ones; since such keyboards are harder to obtain than English ones, this causes great inconvenience, and even a user who owns several keyboards in different languages finds it very inconvenient to keep switching keyboards when typing in multiple languages. In addition, when entering a password, a randomized password keypad is safer than a fixed one and is widely used in online transactions; but in offline transactions (card payment, cash withdrawal) the physical keypad is still a fixed one, and the user's account security is at risk if someone peeks while the password is entered.

The AR keyboard provided by this example can serve as a multilingual keyboard, presenting a language-switchable keyboard in AR form; it can also serve as a random password keypad, presenting an encrypted keypad in AR form according to an encoding rule, so that only the user can see it and even an onlooker cannot obtain the real password.
In this embodiment of the present invention, step 1002 further includes: determining, in the three-dimensional plane information corresponding to the multimedia information, the planes on which virtual display information needs to be displayed; acquiring the virtual display information corresponding to each such plane; and displaying the augmented reality display information corresponding to the virtual display information.

The planes on which virtual display information needs to be displayed are planes whose semantic information is a key position; the virtual display information is key-value information.

The method further includes: detecting a user operation on a three-dimensional plane in the multimedia information; determining a user operation instruction from the actual display information and the virtual display information of the three-dimensional plane corresponding to the operation; and executing the corresponding operation according to the user operation instruction.
In the following examples, the plane of the key positions may be specifically the plane of the key positions in the keyboard, and the keyboard may be a normal keyboard or a password keyboard.
The flow of the implementation method of the AR keyboard in this example is shown in fig. 34.
Step 61: determine the three-dimensional plane information corresponding to the multimedia information, and screen out the key-position planes from the obtained three-dimensional plane information.

This embodiment may, but is not limited to, determine the three-dimensional plane information corresponding to the multimedia information according to the plane determination method of the first embodiment.

When screening the key-position planes, the planes whose semantic information is a key position can be retained according to the screening rules and the other irrelevant planes removed.

In addition, the keyboard can be digitally modeled according to the keyboard layout to obtain 3D digital model information of the real keyboard.

Step 62: replace the key values of the original keyboard keys with the designated virtual display information; that is, acquire the virtual display information corresponding to each key-position plane whose original key is to be replaced, and render the virtual display information at the position of the key-position plane to obtain the AR display information. The virtual display information may be the replacement key-value information.
When the original key position is a language key position, the function of the multilingual keyboard can be realized through key value replacement among different languages. When the original key positions are password key positions, the function of the random password keyboard can be realized through key value replacement.
With the three-dimensional plane information, the virtual display information can be rendered more accurately at the positions of the key-position planes, making the final AR virtual keyboard more realistic.
Since the key value of a key position has been replaced, once a user operation on a key position is detected, the replaced virtual display information corresponding to that key position, i.e. the replacement key value, can be determined; the user's real operation instruction is determined from the replacement key value, and the corresponding operation is executed accordingly.

For example, when a random password keypad is implemented with the AR virtual keyboard and the user enters the password on the keypad, the bank system or the keypad decrypts the user's input according to the key values before and after replacement to obtain the real password, thereby completing the transaction.
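A minimal sketch of the key-value replacement and decryption, assuming a digit keypad and a random permutation as the encoding rule; the seed and PIN below are illustrative.

```python
import random

PHYSICAL_KEYS = list("0123456789")

def make_key_mapping(seed=None):
    """Physical key position -> displayed (virtual) key value, as a permutation."""
    rng = random.Random(seed)
    displayed = PHYSICAL_KEYS[:]
    rng.shuffle(displayed)
    return dict(zip(PHYSICAL_KEYS, displayed))

def recover_password(pressed_physical_keys, mapping):
    """The AR wearer pressed the physical key showing each PIN digit, so the
    digit actually entered is the displayed value of each pressed key."""
    return "".join(mapping[k] for k in pressed_physical_keys)

mapping = make_key_mapping(seed=7)
# An onlooker only sees which physical keys were pressed; only a party that
# knows the mapping (the AR device / bank system) can recover the real PIN.
print(recover_password("0415", mapping))
```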
The AR keyboard includes the multilingual AR keyboard and the encrypted password keypad. Fig. 35 shows a schematic diagram of a multilingual AR keyboard: in fig. 35(a) the real keyboard is an English keyboard, which is displayed as a Russian keyboard through an AR device (e.g. a mobile phone or AR glasses), as shown in fig. 35(b).

Fig. 36 shows a schematic diagram of an encrypted password keypad, which the AR device displays as an encrypted keypad, different from the content of the real keypad seen by others.
Compared with the prior art, the embodiment of the present invention determines the three-dimensional plane information corresponding to the multimedia information, obtains the attribute information corresponding to the three-dimensional plane information and/or the attribute information corresponding to the virtual display information, and displays the augmented reality display information according to these two kinds of information. For example, it can determine that the attribute information of the three-dimensional planes is ground and water surface and that the attribute information of the virtual display information is land animal and aquatic animal, and accordingly add the land animal to the ground and the aquatic animal to the water surface when displaying the augmented reality information, avoiding displaying a land animal on the water or an aquatic animal on the ground. This improves the realism of the augmented reality display result and thus the user experience.
An embodiment of the present invention provides a plane determining apparatus, as shown in fig. 37, the apparatus includes: a processing module 3701, a first determining module 3702; wherein,
a processing module 3701 for performing region segmentation and depth estimation on the multimedia information.
A first determining module 3702, configured to determine three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result processed by the processing module 3701.
The embodiment of the present invention provides a plane determining apparatus, which can implement the method embodiment provided above, and for specific function implementation, reference is made to the description in the method embodiment, which is not repeated herein.
An embodiment of the present invention provides a display device for displaying information in augmented reality, as shown in fig. 38, the device includes: a second determination module 3801, a display module 3802, wherein,
the second determining module 3801 is configured to determine three-dimensional plane information corresponding to the multimedia information.
The display module 3802 is configured to display the augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information determined by the second determining module 3801.
Compared with the prior art, the embodiment of the present invention determines the three-dimensional plane information corresponding to the multimedia information, obtains the attribute information corresponding to the three-dimensional plane information and/or the attribute information corresponding to the virtual display information, and generates the augmented reality display information according to these two kinds of information. For example, it can determine that the attribute information of the three-dimensional planes is ground and water surface and that the attribute information of the virtual display information is land animal and aquatic animal, and accordingly add the land animal to the ground and the aquatic animal to the water surface when generating the augmented reality information, avoiding generating virtual reality information in which a land animal is on the water or an aquatic animal is on the ground. This improves the augmented reality display result and thus the user experience.
The embodiment of the present invention provides a display device for displaying information in augmented reality, which can implement the method embodiment provided above, and for specific function implementation, reference is made to the description in the method embodiment, and details are not repeated here.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Those skilled in the art will appreciate that the present invention includes apparatus for performing one or more of the operations described in the present application. These devices may be specially designed and manufactured for the required purposes, or they may comprise known devices in general-purpose computers. These devices have stored therein computer programs that are selectively activated or reconfigured. Such a computer program may be stored in a device (e.g., computer) readable medium, including, but not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a bus. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions may be implemented by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the features specified in the block or blocks of the block diagrams and/or flowchart illustrations of the present disclosure.
Those of skill in the art will appreciate that various operations, methods, steps in the processes, acts, or solutions discussed in the present application may be alternated, modified, combined, or deleted. Further, various operations, methods, steps in the flows, which have been discussed in the present application, may be interchanged, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the various operations, methods, procedures disclosed in the prior art and the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (30)

1. A method of plane determination, comprising:
carrying out region segmentation and depth estimation on the multimedia information;
and determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result.
2. The method of claim 1, wherein the region segmentation result comprises two-dimensional plane information and semantic information corresponding to the two-dimensional plane.
3. The method for determining a plane according to claim 2, wherein the performing region segmentation on the multimedia information specifically includes:
and carrying out region segmentation on the multimedia information through the same deep learning network to obtain two-dimensional plane information and semantic information corresponding to the two-dimensional plane.
4. The method of any one of claims 1-3, wherein the performing region segmentation and depth estimation on multimedia information further comprises:
correcting the depth estimation result according to the region segmentation result; and/or
correcting the region segmentation result according to the depth estimation result.
5. The method of plane determination according to any one of claims 1-4, further comprising:
and adjusting the determined three-dimensional plane information according to the semantic information and the spatial relationship information corresponding to the determined three-dimensional plane information.
6. The method for determining a plane according to claim 5, wherein the adjusting the determined three-dimensional plane information specifically includes:
and determining the incidence relation between the three-dimensional planes according to the semantic information and the spatial relation information corresponding to the determined three-dimensional plane information, and adjusting the determined three-dimensional plane information according to the determined incidence relation.
7. The method of any one of claims 1-6, wherein prior to performing region segmentation and depth estimation on the multimedia information, further comprising:
determining texture information of the multimedia information; determining a texture missing region according to the texture information;
the method for carrying out region segmentation and depth estimation on multimedia information comprises the following steps:
and carrying out region segmentation and depth estimation aiming at the determined texture missing region.
8. A display method of augmented reality display information, comprising:
determining three-dimensional plane information corresponding to the multimedia information;
and displaying the augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information.
9. The method for displaying augmented reality display information according to claim 8, wherein the step of displaying augmented reality display information according to three-dimensional plane information corresponding to the multimedia information includes:
acquiring attribute information corresponding to the three-dimensional plane information and/or attribute information corresponding to the virtual display information;
and displaying the augmented reality display information according to the attribute information corresponding to the acquired three-dimensional plane information and/or the attribute information corresponding to the virtual display information.
10. The method according to claim 9, wherein the attribute information corresponding to the three-dimensional plane includes at least one of:
semantic information corresponding to the three-dimensional plane, associated attribute information corresponding to the semantic information, and physical attribute information of the three-dimensional plane.
11. The method for displaying augmented reality display information according to claim 10, wherein the step of obtaining semantic information corresponding to three-dimensional plane information includes:
semantic information corresponding to the two-dimensional plane information of the multimedia information is used as semantic information of corresponding three-dimensional plane information;
or determining semantic information of three-dimensional plane information according to semantic information corresponding to two-dimensional plane information of the multimedia information and a depth estimation result of the multimedia information;
or performing semantic analysis on the three-dimensional plane information to obtain semantic information corresponding to the three-dimensional plane.
12. The method for displaying augmented reality display information according to any one of claims 9 to 11, wherein displaying the augmented reality display information according to the attribute information corresponding to the acquired three-dimensional plane information and/or the attribute information corresponding to the virtual display information includes:
determining the position relation between the virtual display information and the three-dimensional plane and/or the position relation between the virtual display information according to the attribute information corresponding to the acquired three-dimensional plane information and/or the attribute information corresponding to the virtual display information;
and displaying the augmented reality display information according to the determined position relation.
13. The method for displaying augmented reality display information according to claim 8, wherein the displaying augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information includes:
determining a target plane in three-dimensional plane information corresponding to the multimedia information;
determining adjustment information corresponding to the target plane;
and displaying the augmented reality display information corresponding to the adjustment information.
14. The method for displaying augmented reality display information according to claim 13, wherein before determining the adjustment information corresponding to the target plane, the method further comprises:
determining a reference plane and a position relation between the target plane and the reference plane in three-dimensional plane information corresponding to the multimedia information;
determining adjustment information corresponding to the target plane, including:
and determining position adjustment information of the target plane according to the determined position relationship, wherein the position adjustment information is used as adjustment information corresponding to the target plane.
15. The method for displaying augmented reality display information according to claim 13, wherein before determining the adjustment information corresponding to the target plane, the method further comprises:
determining the positional relationship between the target plane and an acquisition plane corresponding to the acquisition device that captures the multimedia information;
the determining adjustment information corresponding to the target plane includes:
determining attitude adjustment information of the acquisition plane according to the determined positional relationship, the attitude adjustment information serving as the adjustment information corresponding to the target plane.
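One plausible reading of the attitude adjustment in claims 15 and 16 is the rotation that would bring the acquisition (camera) plane parallel to the target plane. A minimal sketch under that assumption, with illustrative names throughout:

```python
import numpy as np

def attitude_adjustment(acquisition_normal, target_normal):
    """Return (angle_degrees, rotation_axis): how far, and about which
    axis, the acquisition plane would rotate to become parallel to the
    target plane. A near-zero axis means they are already (anti)parallel."""
    a = acquisition_normal / np.linalg.norm(acquisition_normal)
    b = target_normal / np.linalg.norm(target_normal)
    angle = np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))
    axis = np.cross(a, b)  # direction of the needed rotation
    return angle, axis
```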
16. The method for displaying augmented reality display information according to any one of claims 13 to 15, wherein the adjustment information is adjustment direction information and/or adjustment angle information.
17. The method for displaying augmented reality display information according to claim 8, wherein the displaying augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information includes:
determining a driving avoidance plane in three-dimensional plane information corresponding to the multimedia information;
determining driving assistance information according to the driving avoidance plane;
and displaying the augmented reality display information corresponding to the driving assistance information.
18. The method for displaying augmented reality display information according to claim 17, wherein the driving avoidance plane comprises obstacle planes on both sides of a driving road surface;
the determining driving assistance information according to the driving avoidance plane includes:
determining width information of the driving road surface according to the obstacle planes on the two sides of the driving road surface;
and determining, according to the width information of the driving road surface, prompt information indicating whether the vehicle can pass through the driving road surface, the prompt information serving as the driving assistance information.
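A minimal sketch of the width check in claim 18, assuming the two obstacle planes are roughly parallel and each is given as a point plus a unit normal; the function name and the safety margin are illustrative assumptions:

```python
import numpy as np

def can_vehicle_pass(left_point, right_point, right_normal,
                     vehicle_width, margin=0.2):
    """Road width is approximated as the distance from a point on the left
    obstacle plane to the right obstacle plane; the vehicle passes if the
    width exceeds its own width plus a safety margin (all in metres)."""
    road_width = abs(np.dot(left_point - right_point, right_normal))
    return road_width, road_width >= vehicle_width + margin
```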
19. The method for displaying augmented reality display information according to claim 17, wherein the driving avoidance plane is a plane to be avoided on a driving road surface;
the determining driving assistance information according to the driving avoidance plane includes:
determining driving suggestion information as the driving assistance information according to the plane to be avoided on the driving road surface.
20. The method for displaying augmented reality display information according to claim 8, wherein the displaying augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information includes:
determining, in the three-dimensional plane information corresponding to the multimedia information, each plane on which virtual display information needs to be displayed;
acquiring the virtual display information corresponding to each plane on which the virtual display information needs to be displayed;
and displaying the augmented reality display information corresponding to the virtual display information.
21. The method for displaying augmented reality display information according to claim 20, wherein the plane on which the virtual display information needs to be displayed is a plane whose semantic information is a key; the virtual display information is the key value information corresponding to the key.
22. The method of displaying augmented reality display information according to claim 20 or 21, further comprising:
detecting the operation of a user on a three-dimensional plane in the multimedia information;
determining a user operation instruction according to the actual display information and the virtual display information of the three-dimensional plane corresponding to the operation;
and executing corresponding operation according to the user operation instruction.
23. The method for displaying augmented reality display information according to any one of claims 8 to 22, wherein determining the three-dimensional plane information corresponding to the multimedia information includes:
carrying out region segmentation and depth estimation on the multimedia information;
and determining the three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result.
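To make the two-step pipeline of claim 23 concrete, here is a hedged sketch of turning one segmented region plus its depth estimates into a three-dimensional plane: back-project the region's pixels through a pinhole camera model, then fit a least-squares plane by SVD. The intrinsics and array layouts are assumptions, not the patent's specification:

```python
import numpy as np

def plane_from_segment(pixels, depth, fx, fy, cx, cy):
    """pixels: (N, 2) integer array of (u, v) coordinates in one region.
    depth: (H, W) depth map from the depth estimation step.
    Returns (centroid, unit_normal) of the least-squares plane."""
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v, u]                                  # per-pixel depth
    pts = np.stack([(u - cx) * z / fx,               # back-projected X
                    (v - cy) * z / fy,               # back-projected Y
                    z], axis=1)                      # Z
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]   # smallest singular vector = plane normal
```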
24. The method for displaying augmented reality display information according to claim 23, wherein the region segmentation result comprises two-dimensional plane information and semantic information corresponding to the two-dimensional plane.
25. The method for displaying augmented reality display information according to claim 24, wherein the performing region segmentation on the multimedia information specifically includes:
performing region segmentation on the multimedia information through a single deep learning network to obtain both the two-dimensional plane information and the semantic information corresponding to the two-dimensional plane.
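Claim 25's single network can be illustrated with any off-the-shelf semantic segmentation model, since one forward pass yields both the region masks and their labels. The model choice, input size, and class count below are purely illustrative, not the patent's network:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None, num_classes=21).eval()
frame = torch.rand(1, 3, 480, 640)       # placeholder multimedia frame
with torch.no_grad():
    logits = model(frame)["out"]         # (1, 21, 480, 640) class scores
labels = logits.argmax(dim=1)            # per-pixel semantic label map;
                                         # connected components of equal
                                         # labels give 2D plane regions
```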
26. The method for displaying augmented reality display information according to any one of claims 23 to 25, wherein the performing region segmentation and depth estimation on the multimedia information further comprises:
correcting the depth estimation result according to the region segmentation result; and/or
correcting the region segmentation result according to the depth estimation result.
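One simple reading of "correcting the depth estimation result according to the region segmentation result" is to suppress depth outliers inside each segment. The sketch below, in which the threshold and names are assumptions, resets them to the segment's median depth:

```python
import numpy as np

def correct_depth_by_segments(depth, labels, max_dev=0.5):
    """For every segmented region, pixels whose depth deviates from the
    region's median by more than max_dev are reset to that median."""
    corrected = depth.copy()
    for lbl in np.unique(labels):
        mask = labels == lbl
        median = np.median(depth[mask])
        corrected[mask & (np.abs(depth - median) > max_dev)] = median
    return corrected
```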
27. The method for displaying augmented reality display information according to any one of claims 23 to 26, further comprising:
and adjusting the determined three-dimensional plane information according to the semantic information and the spatial relationship information corresponding to the determined three-dimensional plane information.
28. The method for displaying augmented reality display information according to any one of claims 23 to 27, wherein, before the region segmentation and depth estimation are performed on the multimedia information, the method further comprises:
determining texture information of the multimedia information;
determining a texture missing region according to the texture information;
and performing region segmentation and depth estimation for the texture missing region.
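Claim 28 leaves the texture test unspecified; a common proxy, used here purely as an illustrative assumption, is local intensity variance, flagging regions whose variance falls below a threshold as texture-missing:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_missing_mask(gray, window=15, var_threshold=5.0):
    """Flag low-texture pixels of a grayscale image via the identity
    Var[x] = E[x^2] - (E[x])^2 computed over a sliding window."""
    g = gray.astype(np.float64)
    local_var = uniform_filter(g ** 2, window) - uniform_filter(g, window) ** 2
    return local_var < var_threshold
```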
29. An apparatus for plane determination, comprising:
a processing module, configured to perform region segmentation and depth estimation on multimedia information;
and a first determining module, configured to determine three-dimensional plane information of the multimedia information according to the region segmentation result and the depth estimation result obtained by the processing module.
30. A display device for displaying information in augmented reality, comprising:
a second determining module, configured to determine three-dimensional plane information corresponding to multimedia information;
and a display module, configured to display the augmented reality display information according to the three-dimensional plane information corresponding to the multimedia information determined by the second determining module.
CN201710853701.3A 2017-03-20 2017-09-15 Plane determination method, augmented reality display information display method and corresponding device Pending CN108629800A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710165937 2017-03-20
CN2017101659378 2017-03-20

Publications (1)

Publication Number Publication Date
CN108629800A (en) 2018-10-09

Family

ID=63705669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710853701.3A Pending CN108629800A (en) 2017-03-20 2017-09-15 Plane determination method, augmented reality display information display method and corresponding device

Country Status (1)

Country Link
CN (1) CN108629800A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080205716A1 (en) * 2005-02-11 2008-08-28 Koninklijke Philips Electronics N.V. Image Processing Device and Method
KR20130118105A (en) * 2012-04-19 2013-10-29 Samsung Electronics Co., Ltd. Method and apparatus for representation 3d space based plane
US20150356789A1 (en) * 2013-02-21 2015-12-10 Fujitsu Limited Display device and display method
CN103197429A (en) * 2013-04-27 2013-07-10 南开大学 Extra-large imaging depth three-dimensional display method based on optical 4f system
US20150062120A1 (en) * 2013-08-30 2015-03-05 Qualcomm Incorporated Method and apparatus for representing a physical scene
CN105493154A (en) * 2013-08-30 2016-04-13 高通股份有限公司 System and method for determining the extent of a plane in an augmented reality environment
CN104134234A (en) * 2014-07-16 2014-11-05 中国科学技术大学 Full-automatic three-dimensional scene construction method based on single image
CN104484522A (en) * 2014-12-11 2015-04-01 西南科技大学 Method for building robot simulation drilling system based on reality scene
CN104539925B (en) * 2014-12-15 2016-10-05 北京邮电大学 The method and system of three-dimensional scenic augmented reality based on depth information
CN105763865A (en) * 2016-02-26 2016-07-13 北京邮电大学 Naked eye 3D augmented reality method and device based on transparent liquid crystals

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BYUNG-SOO KIM等: "3D Scene Understanding by Voxel-CRF", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *
SEN WANG等: "Towards Robotic Semantic Segmentation of Supporting Surfaces", 《2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY》 *
SHICHAO YANG等: "Pop-up SLAM: Semantic monocular plane SLAM for low-texture environments", 《2016 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS)》 *
GUAN TAO: "Research on Virtual-Real Registration Methods in Augmented Reality", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology *
ZHAO ZHE: "3D Scene Reconstruction and Semantic Understanding for Indoor Scenes", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology *
LU BIN: "Real-time Depth-Image-Based 3D Human Body Surface Reconstruction for Augmented Reality", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11991342B2 (en) 2018-05-17 2024-05-21 Niantic, Inc. Self-supervised training of a depth estimation system
CN109445592B (en) * 2018-10-30 2020-07-31 北京小米移动软件有限公司 Passage determination method and device, electronic equipment and computer readable storage medium
CN109445592A (en) * 2018-10-30 2019-03-08 北京小米移动软件有限公司 It is current to determine method and device, electronic equipment, computer readable storage medium
CN109543557A (en) * 2018-10-31 2019-03-29 百度在线网络技术(北京)有限公司 Processing method, device, equipment and the storage medium of video frame
CN109658418A (en) * 2018-10-31 2019-04-19 百度在线网络技术(北京)有限公司 Learning method, device and the electronic equipment of scene structure
CN109448458A (en) * 2018-11-29 2019-03-08 郑昕匀 A kind of Oral English Training device, data processing method and storage medium
CN109683699A (en) * 2019-01-07 2019-04-26 深圳增强现实技术有限公司 The method, device and mobile terminal of augmented reality are realized based on deep learning
CN109683699B (en) * 2019-01-07 2022-03-29 深圳增强现实技术有限公司 Method and device for realizing augmented reality based on deep learning and mobile terminal
CN110084797B (en) * 2019-04-25 2021-02-26 北京达佳互联信息技术有限公司 Plane detection method, plane detection device, electronic equipment and storage medium
CN110084797A (en) * 2019-04-25 2019-08-02 北京达佳互联信息技术有限公司 Plane monitoring-network method, apparatus, electronic equipment and storage medium
TWI839513B (en) * 2019-05-02 2024-04-21 美商尼安蒂克公司 Computer-implemented method and non-transitory computer-readable storage medium for self-supervised training of a depth estimation model using depth hints
CN110246171A (en) * 2019-06-10 2019-09-17 西北工业大学 A kind of real-time monocular video depth estimation method
CN110246171B (en) * 2019-06-10 2022-07-19 西北工业大学 Real-time monocular video depth estimation method
CN110930488A (en) * 2019-11-15 2020-03-27 深圳市瑞立视多媒体科技有限公司 Fish body behavior simulation method, device, equipment and storage medium
CN111223114A (en) * 2020-01-09 2020-06-02 北京达佳互联信息技术有限公司 Image area segmentation method and device and electronic equipment
CN113766147A (en) * 2020-09-22 2021-12-07 北京沃东天骏信息技术有限公司 Method for embedding image in video, and method and device for acquiring plane prediction model
CN113766147B (en) * 2020-09-22 2022-11-08 北京沃东天骏信息技术有限公司 Method for embedding image in video, and method and device for acquiring plane prediction model
CN112131645A (en) * 2020-09-23 2020-12-25 贝壳技术有限公司 Indoor article placement completion method, device and equipment, and storage medium
CN112131645B (en) * 2020-09-23 2021-08-31 Beike Zhaofang (Beijing) Technology Co., Ltd. Indoor article placement completion method, device and equipment, and storage medium
CN112672185A (en) * 2020-12-18 2021-04-16 脸萌有限公司 Augmented reality-based display method, device, equipment and storage medium
CN112672185B (en) * 2020-12-18 2023-07-07 脸萌有限公司 Augmented reality-based display method, device, equipment and storage medium
CN113034570A (en) * 2021-03-09 2021-06-25 北京字跳网络技术有限公司 Image processing method and device and electronic equipment
CN113516698B (en) * 2021-07-23 2023-11-17 香港中文大学(深圳) Indoor space depth estimation method, device, equipment and storage medium
CN113516698A (en) * 2021-07-23 2021-10-19 香港中文大学(深圳) Indoor space depth estimation method, device, equipment and storage medium
CN114627272A (en) * 2022-03-29 2022-06-14 徐州大工电子科技有限公司 Door and window AR method and system with self-adaptive light transmission
CN114627272B (en) * 2022-03-29 2023-01-24 徐州大工电子科技有限公司 Door and window AR method and system with self-adaptive light transmission
CN115145465A (en) * 2022-07-01 2022-10-04 中国银行股份有限公司 Data input method and device

Similar Documents

Publication Publication Date Title
CN108629800A (en) Plane determination method, augmented reality display information display method and corresponding device
US11804015B2 (en) Methods for determining three-dimensional (3D) plane information, methods for displaying augmented reality display information and corresponding devices
CN108475433B (en) Method and system for large scale determination of RGBD camera poses
CN101558404B (en) Image segmentation
JP4754364B2 (en) Image overlay device
US8803912B1 (en) Systems and methods related to an interactive representative reality
CN112771539A (en) Using three-dimensional data predicted from two-dimensional images using neural networks for 3D modeling applications
CA3134440A1 (en) System and method for virtual modeling of indoor scenes from imagery
US20230419438A1 (en) Extraction of standardized images from a single-view or multi-view capture
Zhang et al. Framebreak: Dramatic image extrapolation by guided shift-maps
US9836881B2 (en) Heat maps for 3D maps
US10950032B2 (en) Object capture coverage evaluation
US20200258309A1 (en) Live in-camera overlays
CN105741265B (en) The processing method and processing device of depth image
Clini et al. Augmented Reality Experience: From High-Resolution Acquisition to Real Time Augmented Contents
CN108377374A (en) Method and system for generating depth information related to an image
CN103582893A (en) Two-dimensional image capture for an augmented reality representation
CN113518996A (en) Damage detection from multiview visual data
US20150172627A1 (en) Method of creating a parallax video from a still image
EP2650843A2 (en) Image processor, lighting processor and method therefor
CN109215047B (en) Moving target detection method and device based on deep sea video
CN115810179A (en) Human-vehicle visual perception information fusion method and system
Zeng et al. Hallucinating stereoscopy from a single image
US11861900B2 (en) Multi-view visual data damage detection
CN104700384B (en) Display systems and methods of exhibiting based on augmented reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination