CN115309301A - Android mobile phone end-side AR interaction system based on deep learning - Google Patents
Android mobile phone end-side AR interaction system based on deep learning
- Publication number
- CN115309301A (application number CN202210541388.0A)
- Authority
- CN
- China
- Prior art keywords
- depth
- model
- mobile phone
- image
- phone end
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000003993 interaction Effects 0.000 title claims abstract description 29
- 238000013135 deep learning Methods 0.000 title claims abstract description 17
- 238000000034 method Methods 0.000 claims abstract description 29
- 230000006870 function Effects 0.000 claims abstract description 17
- 238000013528 artificial neural network Methods 0.000 claims abstract description 12
- 238000003062 neural network model Methods 0.000 claims abstract description 11
- 230000000694 effects Effects 0.000 claims description 22
- 238000012549 training Methods 0.000 claims description 13
- 238000009877 rendering Methods 0.000 claims description 9
- 230000004927 fusion Effects 0.000 claims description 5
- 230000002452 interceptive effect Effects 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 4
- 230000009466 transformation Effects 0.000 claims description 4
- 238000012952 Resampling Methods 0.000 claims description 3
- 238000005538 encapsulation Methods 0.000 claims description 2
- 238000011161 development Methods 0.000 abstract description 7
- 230000008569 process Effects 0.000 abstract description 5
- 238000010586 diagram Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000007547 defect Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000008676 import Effects 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an Android mobile phone end-side AR interaction system based on deep learning, which comprises a mobile phone with a camera. The mobile phone camera collects original color image data and processes the image stream in real time by calling the camera API (application programming interface). An efficient and robust lightweight depth estimation neural network model is trained with the PyTorch Mobile deep learning framework, neural network inference is run on the mobile phone end side using the phone's limited computing power, and a predicted depth map corresponding to the original image data is generated. By combining the original image and the predicted depth map, and using the AR interaction functions of ARCore Depth Lab together with Unity development examples, an Android mobile phone end-side AR interaction system that does not depend on the Depth API is realized.
Description
Technical Field
The invention relates to the field of three-dimensional scene perception, in particular to an Android mobile phone end-side AR interaction system based on deep learning.
Background
In recent years, with the rapid development of deep learning and neural network technology, applications in the field of computer vision have advanced dramatically. At the same time, people increasingly demand entertainment from vision-related mobile phone applications. Users are no longer satisfied with interacting with scenes in simple two-dimensional images and have begun to expect deeper interaction with stereoscopic three-dimensional scenes. In realizing interaction with a three-dimensional scene, depth estimation, as a key link in three-dimensional perception, plays a vital role. Traditional camera equipment captures only limited 2D image information when shooting images and videos and lacks the depth information of the real three-dimensional world, while ranging equipment such as radar and RGB-D cameras suffers from drawbacks including high cost and large size. In addition, current monocular depth estimation algorithms with higher precision generally depend on a high-performance computing environment; a good depth estimation effect is difficult to obtain in a non-ideal experimental environment, and such algorithms cannot be well deployed on a mobile terminal, which limits their popularization and application. Therefore, an interactive system that does not depend on a high-performance computing environment or ranging equipment and can be deployed directly on a mobile terminal to provide real-time 3D scene interaction has great application prospects.
The existing two-dimensional video special effect technology, such as the special effects on short-video editors like Tik Tok, has certain limitations in the effect of secondary video creation. For example, when a user wants to add a special effect of a specific scene to a video (such as falling snow), conventional two-dimensional video technology can only superimpose a static two-dimensional picture onto the person, which looks rigid and weakens the effect of the video. The invention can directly construct a 3D scene from the depth estimation result and add a simulated special effect, thereby better reflecting the changes in depth levels of the environment in the video, making the video more real and vivid and improving the viewing experience.
The invention aims to use a lightweight monocular depth estimation network to calculate the scene depth in real time in AR scenes on the mobile phone end side, under the limited computing power of the mobile phone, and to restore the real scene to the greatest extent. On this basis, special effects are produced with the Unity rendering engine and related tools, and by placing virtual objects in the real environment the invention realizes interaction effects between people and the environment.
Disclosure of Invention
The invention aims to obtain more accurate depth information from simple 2D video input by applying a relatively mature algorithm and training model, to solve the depth estimation problem under a monocular camera system, to overcome the shortcomings in precision and efficiency of monocular depth estimation under traditional methods, to provide a lightweight monocular depth estimation network with good robustness, high precision and high efficiency, to break the dependence of current high-precision monocular depth estimation algorithms on high-performance computing environments, to focus on practical application, and to explore the possibility of applying the method to AR and VR scenes on the mobile phone end. Besides meeting entertainment requirements, the invention has broad application prospects in future autonomous driving, intelligent medical care and military operations.
In order to achieve this purpose, the invention provides the following technical scheme: with the assistance of the depth information, Unity software is used to produce three-dimensional special effects, so that virtual objects are generated at accurate positions and human-computer interaction is realized, oriented toward practical AR/VR application scenes. The method and system deploy the algorithm to the mobile phone side through Android development combined with the PyTorch Mobile framework, realizing real-time interaction on the mobile phone side.
Specifically, the method comprises the following steps:
a) Acquisition of training/test data: large-scale network training is performed with open-source data sets such as NYU-Depth V2; videos are shot indoors with a Kinect DK camera, automatically generating depth maps as supervision information; videos shot by a monocular camera are taken as input test samples;
b) Design of the monocular depth estimation algorithm: the ARCore framework is adopted for construction and application; the parameters returned by ARCore are used as initial values of the camera parameters and adjusted in combination with the network to obtain the camera pose, which serves as the basis of the geometric constraint for inter-frame depth estimation. With the pre-trained lightweight network EfficientNet as the backbone for depth prediction, the loss function of the network is designed and training is performed on the data set;
c) Evaluation of the monocular depth estimation algorithm: the real depth values of the training data set are used as supervision signals for the model and compared with the model's predictions so as to construct and minimize the model's loss function, while retaining the ability to provide reasonable regularization in weakly constrained regions, so that accurate depth information is obtained to achieve the interaction effect;
d) End-side deployment of the algorithm: Unity is used as an auxiliary development tool; after the neural network infers the depth information, the information is imported into the Unity module, the scene is reconstructed by the algorithm, special effects are added with Unity software, and the module is deployed on the mobile phone with PyTorch Mobile.
Preferably, the mobile phone system is an Android system and the version is Android 8 or above.
Preferably, the mobile end-side chip is a Qualcomm Snapdragon 865 or above, and a CPU or a GPU can be used to complete the neural network inference, thereby realizing high-frame-rate operation.
Preferably, the lightweight depth estimation model deployed on the mobile phone end is obtained by creating a serializable and optimizable model from the PyTorch code via TorchScript after training on the server side is completed, followed by model conversion and model optimization; the converted model is in the .ptl format and comprises the model weights and a model interpreter. Through the model optimization of the PyTorch Mobile module, the average inference speed of the optimized model is improved by 60% compared with that before optimization.
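By way of illustration, this server-side conversion and optimization step can be sketched roughly as follows; the placeholder network, input resolution and file name are assumptions of this example and not the exact model of the invention:

```python
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

# Placeholder standing in for the trained lightweight depth estimation network;
# in practice the real model would be built and its trained weights loaded here.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
model.eval()

# Create a serializable, optimizable TorchScript module by tracing with a dummy RGB frame.
example_input = torch.rand(1, 3, 256, 320)
scripted = torch.jit.trace(model, example_input)

# Apply PyTorch Mobile graph optimizations, then save in the .ptl format,
# which bundles the model weights with the lite interpreter for import on Android.
optimized = optimize_for_mobile(scripted)
optimized._save_for_lite_interpreter("depth_net.ptl")
```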
Preferably, the deployment of the lightweight depth estimation model on the mobile phone end side includes the following implementation steps:
S1.1: training the model on a server, with the model weights trained on a depth data set;
S1.3: importing the model inference module into the ARCore module in Android Studio software through Java programming;
S1.4: calling the mobile phone camera API to acquire an image stream I = {I_1, I_2, …, I_n} and extracting the current frame I_n as the RGB image input I_RGB;
S1.6: adding the predicted depth map I_Depth into the data stream, realizing the encapsulation of the module.
Preferably, the lightweight depth estimation neural network model algorithm specifically comprises the following steps:
S2.1: a lightweight depth estimation model predicts the depth map on the mobile phone end side; its inputs are the color RGB image captured by the camera (in YUV420 image format) and the camera pose parameters (the camera pose parameters returned by Google's ARCore framework are used as initial values of the camera parameters), and its outputs are a predicted depth image in RAW format and a predicted confidence image;
S2.2: the depth estimation neural network model is a monocular depth estimation model; a single inference by the model does not depend on information from preceding or following image frames or multiple images, and a single depth estimation can be completed from a single input image;
S2.3: the depth estimation neural network model is a lightweight network model; the model inference module deployed on the mobile phone end is smaller than 150 MB, and depth map prediction at 30 FPS is realized on mobile phone platforms with a Qualcomm Snapdragon 865 or above;
S2.4: with EfficientNet as the backbone network of the depth prediction encoder, the input image I_RGB passes through EfficientNet to extract features at different resolutions (one half, one quarter, one eighth and one sixteenth), constructing the image feature pyramid {S_1/2, S_1/4, S_1/8, S_1/16}; in the present invention, the backbone network of the model can be replaced by a similar lightweight model (e.g., MobileNet);
S2.5: a multi-scale fusion structure is adopted as the decoder of the depth prediction algorithm, as shown in fig. 3. A decoder module receives the feature branch at the current resolution and the feature branch at the upper (coarser) resolution; the upper-resolution features are passed through a residual convolution module and then spliced and fused with the current-resolution features. The residual convolution module is formed by interleaving two ReLU activation layers and two convolution modules with 3x3 convolution kernels in series. The fused features are input into a residual convolution module of the same structure, and the features of the current branch are output through a resampling module and a convolution module with a 1x1 convolution kernel (a structural sketch is given after step S2.8 below);
S2.6: the multi-scale loss is used as the loss function of the neural network model. The formula computes the gradient differences between the predicted depth and the real depth of the data set along the x-axis and y-axis directions respectively, and sums and fuses them across different scale resolutions (an assumed reconstruction of this formula is given after step S2.8 below);
S2.7: for better robustness and generalization of the model across different data sets, the model uses affine-invariant depth prediction, i.e. d* = d·s + μ, where s and μ are the scale and shift of the affine transformation; the affine transformation parameters between the predicted depth and the real depth are obtained by a global least-squares method (an alignment sketch is given after step S2.8 below);
S2.8: the model is trained on multiple public depth data sets such as NYU Depth V2, KITTI, ScanNet and ETH3D, so that the model learns a sufficient data prior and its generalization ability is improved.
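As referenced in step S2.5, the decoder block can be sketched as follows in PyTorch; the channel count, the use of element-wise addition to fuse the two branches, and bilinear resampling are assumptions of this sketch rather than details fixed by the invention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualConvBlock(nn.Module):
    """ReLU -> 3x3 conv -> ReLU -> 3x3 conv with a skip connection,
    corresponding to the residual convolution module of step S2.5."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = self.conv1(F.relu(x))
        out = self.conv2(F.relu(out))
        return x + out

class FusionDecoderBlock(nn.Module):
    """Multi-scale fusion decoder block: refines the coarser-resolution branch,
    fuses it with the current-resolution branch, refines again, then resamples
    and projects with a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.refine_coarse = ResidualConvBlock(channels)
        self.refine_fused = ResidualConvBlock(channels)
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, cur_feat, coarse_feat):
        coarse = self.refine_coarse(coarse_feat)
        coarse = F.interpolate(coarse, size=cur_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        fused = self.refine_fused(cur_feat + coarse)
        fused = F.interpolate(fused, scale_factor=2,
                              mode="bilinear", align_corners=False)
        return self.project(fused)

# Usage sketch: fuse a 1/8-resolution branch into the 1/4-resolution branch.
block = FusionDecoderBlock(64)
out = block(torch.rand(1, 64, 60, 80), torch.rand(1, 64, 30, 40))
```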
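The formula referenced in step S2.6 is not reproduced in this text. A form consistent with the description (per-pixel gradient differences along the x and y axes, summed and fused over several scale resolutions), written here as an assumed reconstruction in the style of the standard multi-scale gradient-matching loss, is:

```latex
% Assumed reconstruction of the multi-scale gradient loss of step S2.6.
% R^k is the difference between predicted and real depth at scale k,
% M the number of valid pixels, K the number of scale resolutions.
L_{\mathrm{grad}} = \frac{1}{M}\sum_{k=1}^{K}\sum_{i}
  \left( \left|\nabla_{x} R_{i}^{k}\right| + \left|\nabla_{y} R_{i}^{k}\right| \right),
\qquad
R^{k} = d^{k}_{\mathrm{pred}} - d^{k}_{\mathrm{gt}} .
```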
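As referenced in step S2.7, the global least-squares solution for the scale s and shift μ can be sketched as follows; the function name, the direction of alignment (mapping the prediction onto the real depth) and the use of a validity mask are assumptions of this example:

```python
import torch

def align_scale_shift(pred, gt, mask):
    """Solve min over (s, mu) of sum((s * pred + mu - gt)^2) on valid pixels by
    ordinary least squares and return the affinely aligned prediction,
    following the affine-invariant formulation d* = d*s + mu of step S2.7."""
    p = pred[mask]
    g = gt[mask]
    A = torch.stack([p, torch.ones_like(p)], dim=1)        # [N, 2]
    sol = torch.linalg.lstsq(A, g.unsqueeze(1)).solution   # [2, 1]: (s, mu)
    s, mu = sol[0, 0], sol[1, 0]
    return s * pred + mu

# Usage sketch with random tensors standing in for predicted and real depth maps.
pred = torch.rand(1, 240, 320)
gt = 2.0 * torch.rand(1, 240, 320) + 0.5
mask = gt > 0
aligned = align_scale_shift(pred, gt, mask)
```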
Preferably, the steps for implementing the AR interaction function with the AR interaction functions of ARCore Depth Lab and Unity are as follows:
S3.1: after the depth information prediction of the neural network is completed, the generated depth prediction map replaces the depth image that ARCore would return from the Depth API, and ARCore is called in Unity;
S3.2: the rendering engine provided by Unity generates the mesh information of the scene from the depth map and renders a pseudo-color map representing the depth information;
S3.3: using part of the functions of ARCore Depth Lab and the special effect components of the Unity scene, the corresponding special effects are added to the depth scene.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention deploys the algorithm directly on the mobile phone end side and uses the mobile phone's computing power for neural network inference, avoiding the heavy dependence of existing monocular depth estimation methods on the computing resources of large-scale servers. Existing depth estimation networks struggle to balance precision and efficiency: more accurate methods usually require a long model inference process and a more complex model structure. Unlike existing large-scale deep learning networks, the method of the invention achieves an effective balance between precision and efficiency. The invention adopts a lightweight network structure to realize frame-by-frame monocular depth estimation inference; the model structure of the network is simpler, the computing power consumed during training is reduced, and the network inference is easy to run and to deploy on the end side;
2. The invention realizes end-side depth estimation application development on the Android platform and differs from existing frameworks for running neural network inference on mobile phone platforms. Existing methods generally train a model with PyTorch on the server side, obtain the model after parameter convergence, convert it to the ONNX format and then to the TensorFlow framework, and finally use the TensorFlow Lite module to complete model inference on the mobile device side. The method of the invention does not depend on the TensorFlow framework: the model is converted directly with PyTorch Mobile and model inference runs on the mobile device directly under the PyTorch framework, which is more convenient and avoids switching the running model between different deep learning frameworks;
3. The depth estimation method of the invention avoids dependence on the Depth API, an interface provided by the Android mobile phone system that is only supported by some high-end phone models. Different from existing software such as Depth Lab, the depth information in this method is obtained by deep learning model inference from RGB images and requires no additional hardware (such as lidar, millimeter-wave radar or other depth sensors) to acquire depth information; Unity is used as the three-dimensional special effect development tool to realize the AR/VR interaction function, which gives the method strong practical application value;
4. The invention performs depth estimation network inference on the PyTorch Mobile framework, which provides an end-to-end workflow that simplifies the path from research to a production environment on the mobile device side while keeping the whole process within one framework. The invention adopts a clear structural framework, which facilitates subsequent modification and upgrading of each part.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a system model flow diagram of the present invention;
FIG. 2 is a diagram illustrating a depth estimation result of the system according to the present invention;
FIG. 3 is a diagram of an algorithm model of the system of the present invention;
FIG. 4 is a diagram illustrating AR interaction of the system of the present invention.
Detailed Description
For further understanding of the present invention, its objects, technical solutions and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the description is illustrative only and not limiting. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, the present invention provides a technical solution: an Android mobile phone end-side AR interaction system based on deep learning, comprising a mobile phone with a camera. The mobile phone collects original color image data by calling the camera API, obtains the camera parameters, pose and so on, extracts the camera frames and processes the image stream in real time. The server side trains an efficient and robust lightweight depth estimation neural network model with the PyTorch Mobile deep learning framework; after training is finished, a serializable and optimizable model is created from the PyTorch code via TorchScript, model conversion and model optimization are performed, and the model is stored in the .ptl format, including the model weights and a model interpreter. The lightweight depth estimation model imports the model file converted via TorchScript into the ARCore module through the Java language and Android Studio software, runs inference on the mobile phone end side, and replaces the Depth API interface with the depth map obtained by inference and prediction, realizing the input and output of the data streams. Neural network inference runs on the mobile phone side with the phone's limited computing power, generating a predicted depth map corresponding to the original image data. After the depth information prediction of the neural network is completed, the generated depth prediction map replaces the depth image that ARCore would return from the Depth API, and ARCore is called in Unity. First, the rendering engine provided by Unity generates the mesh information of the scene from the depth map and renders a pseudo-color map representing the depth information; then, using part of the functions of ARCore Depth Lab and the special effect components of the Unity scene, the corresponding special effects are added to the depth scene. By combining the original image and the predicted depth map, and using the AR interaction functions of ARCore Depth Lab and Unity development examples, an Android mobile phone end-side AR interaction system independent of the Depth API is realized.
Please refer to fig. 2, which shows the depth map effect tested with the network structure model. Fig. 2 shows the effect of depth map construction for indoor scenes by the depth estimation framework and the lightweight depth estimation network model introduced by the present invention: the first and third rows are the input RGB images, and the second and fourth rows are the corresponding depth maps predicted with the network of the present invention. With the multi-scale fusion decoding framework, the estimation of detail regions in the predicted maps is more accurate, and most of the three-dimensional information of the scene is recovered under limited computing power.
Please refer to fig. 3, which is a schematic diagram of the depth estimation network model structure of the present invention. The network model adopts EfficientNet as the backbone network of the encoder to extract image features; an image pyramid is constructed at different resolutions, a multi-scale fusion decoder is adopted to fuse the image features, and finally the depth map corresponding to the predicted image is decoded through a residual convolution module. The residual convolution module is formed by connecting in series a ReLU activation layer, a convolution module with a 3x3 convolution kernel, another ReLU activation layer and another convolution module with a 3x3 convolution kernel. The multi-scale fusion module receives the feature maps of the current feature branch and the previous feature branch, fuses the features of the previous branch, after they pass through the residual convolution module, with the features of the current branch, and then connects in sequence a residual convolution module of the same structure, a resampling module and a convolution module with a 1x1 convolution kernel to output the decoded features.
Please refer to fig. 4, which shows the AR interaction effect measured on an Android mobile phone with the technical solution of the present invention. After the depth map rendering is run, the model can finish rendering the scene in a short time and generate the corresponding pseudo-color map. According to the depth estimation result, the corresponding object can be aligned on the mobile phone end and a virtual object can be placed. When the mobile phone is moved, the virtual object moves correspondingly with the scene, realizing the interaction with the three-dimensional information.
The above-disclosed preferred embodiments of the present invention are merely illustrative of the technical solutions of the present invention and are not restrictive. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed; obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents; modifications and equivalents may be made thereto without departing from the spirit and scope of the invention.
Claims (7)
1. An Android mobile phone end-side AR interaction system based on deep learning, characterized in that:
firstly, original color image data are acquired by the mobile phone through calling the camera API; using an efficient and robust lightweight depth estimation neural network model trained with the PyTorch Mobile deep learning framework and the limited computing power of the mobile phone, neural network inference is run on the mobile phone end side, the image stream is processed in real time, and a predicted depth map corresponding to the original image data is then generated; finally, the original image and the predicted depth map are combined, and the AR interaction function is realized with the AR interaction functions of ARCore Depth Lab and Unity.
2. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the mobile phone system is an Android system and the version is Android 8 or above.
3. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the mobile phone can use a CPU or a GPU to complete the neural network inference, and a high-performance chip (such as a Qualcomm Snapdragon 865) is recommended to realize high-frame-rate operation.
4. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the lightweight depth estimation model deployed on the mobile phone end is converted and optimized by creating a serializable and optimizable model from the PyTorch code via TorchScript after training on the server side; the stored model suffix is the .ptl format, and the model file information comprises the model weights and the interpreter of the model;
5. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the deployment of the lightweight depth estimation model on the mobile phone end side comprises the following implementation steps:
S1.1: training the model on a server, with the model weights trained on a depth data set;
S1.3: importing the model inference module into the ARCore module in Android Studio software through Java programming;
S1.4: calling the mobile phone camera API to acquire an image stream I = {I_1, I_2, …, I_n} and extracting the current frame I_n as the RGB image input I_RGB;
S1.6: adding the predicted depth map I_Depth into the data stream, realizing the encapsulation of the module;
6. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the lightweight depth estimation neural network model algorithm specifically comprises the following steps:
S2.1: a lightweight depth estimation model predicts the depth map on the mobile phone end side; its inputs are the color RGB image captured by the camera (in YUV420 image format) and the camera pose parameters (the camera pose parameters returned by Google's ARCore framework are used as initial values of the camera parameters), and its outputs are a predicted depth image in RAW format and a predicted confidence image;
S2.2: the depth estimation neural network model is a monocular depth estimation model; a single inference by the model does not depend on information from preceding or following image frames or multiple images, and a single depth estimation can be completed from a single input image;
S2.3: the depth estimation neural network model is a lightweight network model; the model inference module deployed on the mobile phone end is smaller than 150 MB, and depth map prediction at 30 FPS is realized on mobile phone platforms with a Qualcomm Snapdragon 865 or above;
S2.4: with EfficientNet as the backbone network of the depth prediction encoder, the input image I_RGB passes through EfficientNet to extract features at different resolutions (one half, one quarter, one eighth and one sixteenth), constructing the image feature pyramid {S_1/2, S_1/4, S_1/8, S_1/16}; in the present invention, the backbone network of the model can be replaced by a similar lightweight model (such as MobileNet);
S2.5: a multi-scale fusion structure is adopted as the decoder of the depth prediction algorithm, as shown in fig. 3. A decoder module receives the feature branch at the current resolution and the feature branch at the upper (coarser) resolution; the upper-resolution features are passed through a residual convolution module and then spliced and fused with the current-resolution features. The residual convolution module is formed by interleaving two ReLU activation layers and two convolution modules with 3x3 convolution kernels in series. The fused features are input into a residual convolution module of the same structure, and the features of the current branch are output through a resampling module and a convolution module with a 1x1 convolution kernel;
S2.6: the multi-scale loss is used as the loss function of the neural network model. The formula computes the gradient differences between the predicted depth and the real depth of the data set along the x-axis and y-axis directions respectively, and sums and fuses them across different scale resolutions;
S2.7: for better robustness and generalization of the model across different data sets, the model uses affine-invariant depth prediction, i.e. d* = d·s + μ, where s and μ are the scale and shift of the affine transformation; the affine transformation parameters between the predicted depth and the real depth are obtained by a global least-squares method;
S2.8: the model is trained on multiple public depth data sets such as NYU Depth V2, KITTI, ScanNet and ETH3D, so that the model learns a sufficient data prior and its generalization ability is improved.
7. The deep learning-based Android mobile phone end-side AR interaction system of claim 1, characterized in that: the steps for realizing the AR interaction function with the AR interaction functions of ARCore Depth Lab and Unity are as follows:
S3.1: after the depth information prediction of the neural network is completed, the generated depth prediction map replaces the depth image that ARCore would return from the Depth API, and ARCore is called in Unity;
S3.2: the rendering engine provided by Unity generates the mesh information of the scene from the depth map and renders a pseudo-color map representing the depth information;
S3.3: using part of the functions of ARCore Depth Lab and the special effect components of the Unity scene, the corresponding special effects are added to the depth scene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210541388.0A CN115309301A (en) | 2022-05-17 | 2022-05-17 | Android mobile phone end-side AR interaction system based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210541388.0A CN115309301A (en) | 2022-05-17 | 2022-05-17 | Android mobile phone end-side AR interaction system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115309301A true CN115309301A (en) | 2022-11-08 |
Family
ID=83854804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210541388.0A Pending CN115309301A (en) | 2022-05-17 | 2022-05-17 | Android mobile phone end-side AR interaction system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115309301A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152323A (en) * | 2023-04-18 | 2023-05-23 | 荣耀终端有限公司 | Depth estimation method, monocular depth estimation model generation method and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110221689A (en) * | 2019-05-10 | 2019-09-10 | 杭州趣维科技有限公司 | A kind of space drawing method based on augmented reality |
CN110716641A (en) * | 2019-08-28 | 2020-01-21 | 北京市商汤科技开发有限公司 | Interaction method, device, equipment and storage medium |
CN111465962A (en) * | 2018-10-04 | 2020-07-28 | 谷歌有限责任公司 | Depth of motion for augmented reality of handheld user devices |
CN114332666A (en) * | 2022-03-11 | 2022-04-12 | 齐鲁工业大学 | Image target detection method and system based on lightweight neural network model |
-
2022
- 2022-05-17 CN CN202210541388.0A patent/CN115309301A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111465962A (en) * | 2018-10-04 | 2020-07-28 | 谷歌有限责任公司 | Depth of motion for augmented reality of handheld user devices |
CN110221689A (en) * | 2019-05-10 | 2019-09-10 | 杭州趣维科技有限公司 | A kind of space drawing method based on augmented reality |
CN110716641A (en) * | 2019-08-28 | 2020-01-21 | 北京市商汤科技开发有限公司 | Interaction method, device, equipment and storage medium |
CN114332666A (en) * | 2022-03-11 | 2022-04-12 | 齐鲁工业大学 | Image target detection method and system based on lightweight neural network model |
Non-Patent Citations (3)
Title |
---|
- 余方洁: "Research on Mobile-side Point Cloud Segmentation Methods Based on Depth Maps", China Master's Theses Full-text Database, Information Science and Technology, no. 08, 15 August 2021 (2021-08-15), pages 138 - 273 *
- 刘强: "Building Enterprise-level Recommender Systems: Algorithms, Engineering Implementation and Case Analysis", 13 July 2021, China Machine Press (机械工业出版社), pages: 169 *
- 马榕 et al.: "Low-power Visual Odometry Based on Monocular Depth Estimation", Journal of System Simulation (系统仿真学报), vol. 33, no. 12, 18 December 2021 (2021-12-18), pages 3001 - 3011 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152323A (en) * | 2023-04-18 | 2023-05-23 | 荣耀终端有限公司 | Depth estimation method, monocular depth estimation model generation method and electronic equipment |
CN116152323B (en) * | 2023-04-18 | 2023-09-08 | 荣耀终端有限公司 | Depth estimation method, monocular depth estimation model generation method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113572962B (en) | Outdoor natural scene illumination estimation method and device | |
CN113706699B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
CN113077505B (en) | Monocular depth estimation network optimization method based on contrast learning | |
KR20200128378A (en) | Image generation network training and image processing methods, devices, electronic devices, and media | |
CN113284173B (en) | End-to-end scene flow and pose joint learning method based on false laser radar | |
CN113034413B (en) | Low-illumination image enhancement method based on multi-scale fusion residual error coder-decoder | |
CN115690382A (en) | Training method of deep learning model, and method and device for generating panorama | |
CN116721207A (en) | Three-dimensional reconstruction method, device, equipment and storage medium based on transducer model | |
CN116957931A (en) | Method for improving image quality of camera image based on nerve radiation field | |
CN113382275A (en) | Live broadcast data generation method and device, storage medium and electronic equipment | |
CN115309301A (en) | Android mobile phone end-side AR interaction system based on deep learning | |
CN116863003A (en) | Video generation method, method and device for training video generation model | |
CN112200817A (en) | Sky region segmentation and special effect processing method, device and equipment based on image | |
CN109788270A (en) | 3D-360 degree panorama image generation method and device | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN115100707A (en) | Model training method, video information generation method, device and storage medium | |
Ren et al. | Efficient human pose estimation by maximizing fusion and high-level spatial attention | |
CN114022356A (en) | River course flow water level remote sensing image super-resolution method and system based on wavelet domain | |
CN113793420A (en) | Depth information processing method and device, electronic equipment and storage medium | |
WO2024159553A1 (en) | Decoding method for volumetric video, and storage medium and electronic device | |
CN115035173B (en) | Monocular depth estimation method and system based on inter-frame correlation | |
CN114758205B (en) | Multi-view feature fusion method and system for 3D human body posture estimation | |
CN116258756A (en) | Self-supervision monocular depth estimation method and system | |
CN114926594A (en) | Single-view-angle shielding human body motion reconstruction method based on self-supervision space-time motion prior | |
CN114881849A (en) | Depth image super-resolution reconstruction method combining monocular depth estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |