CN116206040A - Method and device for acquiring AO mapping

Method and device for acquiring AO mapping

Info

Publication number
CN116206040A
Authority
CN
China
Prior art keywords
view
layer
map
neural network
parameters
Prior art date
Legal status
Pending
Application number
CN202111430926.0A
Other languages
Chinese (zh)
Inventor
纪道明
庄新瑞
廖晶堂
徐紫雅
刘芊
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Application filed: CN202111430926.0A
Publication: CN116206040A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/06 Ray-tracing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for acquiring an ambient occlusion (AO) map, which relate to the field of image processing and are used to improve the efficiency of acquiring high-quality AO maps. The scheme includes the following steps: acquiring a geometric information layer of a 3D object at each of N view angles according to two-dimensional images of the 3D object at the N view angles and camera parameters of the N view angles, where the geometric information layer includes a normal map and N is greater than or equal to 2; inputting the geometric information layer of each view angle into a trained neural network to obtain a single-view AO layer of each view angle; performing UV parameterization according to the two-dimensional images of the 3D object at the N view angles to obtain UV parameters of a mesh model of the 3D object; and fusing the single-view AO layers of the view angles according to the obtained UV parameters to obtain the AO map of the 3D object.

Description

Method and device for acquiring AO mapping
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and apparatus for obtaining an ambient occlusion (AO) map.
Background
With the development of computer graphics technology, users' requirements on the fidelity of graphic images keep rising, and the most critical factor in image fidelity is the illumination/lighting effect: if lighting very close to that of real life can be simulated, the picture presented to the user is remarkably realistic. Such images have numerous application scenarios, for example three-dimensional (3D) virtualized merchandise display and museum cultural-relic display. 3D virtualized merchandise display has great potential in many industries, such as 3D display of shoes, bags and clothes, and can effectively increase users' probability of purchase.
With the rapid development of rendering technology and related hardware, users' requirements on the rendering quality of 3D digital content keep increasing, placing higher demands on realism and layering. 3D perception enhancement is one of the methods that effectively improve rendering quality. In terms of illumination of the rendered content, AO technology uses perceptual cues such as contrast and shadow to effectively improve 3D perception quality, making the rendering result more layered. High-quality AO has been widely used in 3D digital content rendering to greatly enhance the user's 3D perception experience. Fig. 1a, 1b, 1c and 1d illustrate comparisons of AO-free and AO-enabled rendering results for the same image.
Currently, the industry typically employs ray tracing to obtain high-quality single-view AO layers, but this technology has a significant performance overhead and high hardware requirements. Although graphics processing unit (graphics processing unit, GPU) computing power has grown rapidly in recent years, acquiring high-quality AO maps is still computationally inefficient.
Disclosure of Invention
The application provides a method and a device for acquiring an AO map, so as to improve the efficiency of acquiring high-quality AO maps.
In order to achieve the above purpose, the present application adopts the following technical scheme:
In a first aspect, a method for obtaining an AO map is provided. The method may include: acquiring a geometric information layer of a 3D object at each of N view angles according to two-dimensional images of the 3D object at the N view angles and camera parameters of the N view angles, where the geometric information layer includes a normal map and N is greater than or equal to 2; inputting the geometric information layer of each view angle into a trained neural network to obtain a single-view AO layer of each view angle; performing UV parameterization according to the two-dimensional images of the 3D object at the N view angles to obtain UV parameters of a mesh model of the 3D object; and fusing the single-view AO layers of the view angles according to the obtained UV parameters to obtain the AO map of the 3D object.
With the above method for acquiring an AO map, a neural network for acquiring single-view AO layers is trained in advance, geometric information layers of different view angles of the 3D object are input into the trained neural network, the neural network efficiently predicts and outputs the single-view AO layers, and the multiple single-view AO layers are then fused according to the UV parameters of the 3D object's mesh model to obtain the AO map. Because the neural network is trained with high-quality training data, its output can be guaranteed to be close to real illumination; and because the neural network can predict efficiently, the AO map can still be acquired efficiently even when scene complexity increases. The scheme of the present application can therefore improve the efficiency of acquiring high-quality AO maps. Furthermore, the process of obtaining the single-view AO layers does not depend on the UV parameters; even if a designer re-unwraps the UV of the model, the scheme of the present application can fuse the already obtained single-view AO layers according to the updated UV parameters, so reusability is high and the efficiency of obtaining high-quality AO maps is further improved.
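For illustration only, the following Python sketch outlines the flow described above at a high level. Every callable it takes (reconstruct, render_geometry, predict_ao, unwrap, fuse) is a hypothetical placeholder for a component described in the text, not an API defined by this application.

```python
def acquire_ao_map(images, cameras, reconstruct, render_geometry, predict_ao, unwrap, fuse):
    """Hedged end-to-end sketch of the described method; all callables are placeholders."""
    # 1. 3D modelling from the N two-dimensional images and camera parameters
    mesh = reconstruct(images, cameras)
    # 2. Per view: geometric information layer (normal map, optionally a depth map)
    geo_layers = [render_geometry(mesh, cam) for cam in cameras]
    # 3. Per view: the trained neural network predicts the single-view AO layer
    ao_layers = [predict_ao(layer) for layer in geo_layers]
    # 4. UV parameterization of the mesh model (per-view UV layers)
    uv_layers = unwrap(mesh, cameras)
    # 5. Fuse the single-view AO layers into one AO map using the UV parameters
    return fuse(ao_layers, uv_layers)
```

If the designer later re-unwraps the UV of the model, only step 5 needs to be repeated with the updated UV parameters, which is the reuse property emphasized above.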
In one possible implementation manner, the geometric information layer may further include a depth map, which may further improve the quality of the acquired AO layer.
In another possible implementation manner, the method for obtaining an AO map provided in the present application may further include: acquiring updated UV parameters of the 3D object; and fusing the single-view AO layers of each view angle according to the updated UV parameters of the 3D object to obtain an updated AO map of the 3D object. The already acquired single-view AO layers are fused according to the updated UV parameters without redoing the AO computation, so reusability is high, computation time and cost are greatly saved, and the efficiency of acquiring high-quality AO maps is further improved.
In another possible implementation, the difference between the AO values corresponding to the same position in single-view AO layers of different view angles is less than or equal to a first preset value.
In another possible implementation, the difference between the AO value in the single-view AO layer and the AO value at the same position calculated by a ray tracing rendering equation is less than or equal to a second preset value.
In another possible implementation manner, fusing the single-view AO layers of each view angle according to the UV parameters to obtain the AO map of the 3D object includes: traversing each pixel in the UV layer corresponding to each of the N view angles, and assigning the AO value at the same pixel position in the AO layer of the same view angle to the corresponding UV coordinate to obtain the AO map.
In another possible implementation manner, one UV coordinate of a region where the 3D mesh model overlaps across different view angles corresponds to a plurality of AO values, and, for a pixel of the overlapping region, assigning the AO value at the same pixel position in the AO layer of the same view angle to the corresponding UV coordinate includes: assigning a value calculated from the plurality of AO values corresponding to that UV coordinate to the corresponding UV coordinate.
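As a hedged illustration of this fusion step, the following numpy sketch assumes that each view provides a per-pixel UV layer and a foreground mask alongside its predicted AO layer, that UV coordinates lie in [0, 1], and that the "calculated value" for a texel covered by several views is their average; the texture resolution, the default value for uncovered texels and all names are assumptions.

```python
import numpy as np

def fuse_ao_layers(ao_layers, uv_layers, masks, tex_size=1024):
    """Splat per-view AO values into a shared UV texture (illustrative sketch).

    ao_layers : list of HxW arrays, AO value per pixel (one per view)
    uv_layers : list of HxWx2 arrays, UV coordinate per pixel (one per view)
    masks     : list of HxW bool arrays, True where the pixel covers the object
    """
    accum = np.zeros((tex_size, tex_size), dtype=np.float64)
    count = np.zeros((tex_size, tex_size), dtype=np.int64)

    for ao, uv, mask in zip(ao_layers, uv_layers, masks):
        u = uv[..., 0][mask]
        v = uv[..., 1][mask]
        # map UV in [0, 1] to texel indices
        ui = np.clip((u * (tex_size - 1)).round().astype(int), 0, tex_size - 1)
        vi = np.clip((v * (tex_size - 1)).round().astype(int), 0, tex_size - 1)
        np.add.at(accum, (vi, ui), ao[mask])
        np.add.at(count, (vi, ui), 1)

    ao_map = np.zeros((tex_size, tex_size))   # default for texels seen from no view (assumed)
    covered = count > 0
    # texels seen from several views receive the mean of their AO values
    ao_map[covered] = accum[covered] / count[covered]
    return ao_map
```

Averaging the per-view contributions is only one choice; a weighted or confidence-based combination would equally satisfy the "calculated value" wording above.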
In another possible implementation, the neural network includes a deep convolutional neural network trained with training data obtained from a ray tracing rendering equation, so that the output of the trained neural network can achieve the quality of ray tracing.
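As an illustration of such a training setup (the application's actual network architecture, loss and data pipeline are not specified here), a small fully convolutional PyTorch model could be fitted to ray-traced AO layers roughly as follows; the layer sizes, the L1 loss and the loader interface are assumptions.

```python
import torch
import torch.nn as nn

class SingleViewAONet(nn.Module):
    """Toy fully convolutional network: geometric information layers in, AO layer out."""
    def __init__(self, in_channels=4):  # e.g. 3 normal-map channels + 1 depth channel (assumed)
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # AO value in [0, 1]
        )

    def forward(self, x):
        return self.body(x)

def train(net, loader, epochs=10, lr=1e-3):
    """Fit the network to ray-traced ground-truth AO layers (illustrative loop)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for geo, ao_gt in loader:       # ao_gt: ray-traced AO layer used as ground truth
            opt.zero_grad()
            loss = loss_fn(net(geo), ao_gt)
            loss.backward()             # backpropagation
            opt.step()
```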
In another possible implementation manner, performing UV parameterization according to the two-dimensional images of the 3D object at the N view angles to obtain the UV parameters of the mesh model of the 3D object includes: performing 3D modeling according to the two-dimensional images of the 3D object at the N view angles to obtain a 3D mesh model of the 3D object; unwrapping the 3D mesh model onto a UV parameter plane to obtain a UV unwrapped map, where the UV unwrapped map includes the UV coordinates of each vertex in the 3D mesh model; and taking single-view UV layers as the UV parameters, where a single-view UV layer includes the UV coordinates of each pixel in that view. The three-dimensional mesh model is obtained from the two-dimensional images and the UV parameters are then obtained, so the acquired single-view AO layers can be fused according to the UV parameters and a high-quality AO map can be obtained efficiently.
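One hedged way to realize the UV-unwrapping step with off-the-shelf tools (an assumption, not the application's tooling) is to parameterize the reconstructed mesh with the xatlas library through trimesh; the per-view UV layers can then be obtained by rasterizing the resulting per-vertex UV coordinates from each camera.

```python
import trimesh
import xatlas  # third-party UV-unwrapping library (assumed tooling)

def unwrap_mesh(mesh_path):
    """Unwrap a reconstructed mesh onto the UV parameter plane (illustrative sketch)."""
    mesh = trimesh.load(mesh_path, force='mesh')
    # xatlas returns a vertex remapping, new face indices and per-vertex UV coordinates
    vmapping, faces, uvs = xatlas.parametrize(mesh.vertices, mesh.faces)
    unwrapped = trimesh.Trimesh(vertices=mesh.vertices[vmapping], faces=faces, process=False)
    unwrapped.visual = trimesh.visual.TextureVisuals(uv=uvs)
    # single-view UV layers follow by rasterizing `uvs` as a vertex attribute per view
    return unwrapped
```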
In another possible implementation, the camera parameters may include one or more of the following: camera pose and camera focal length; the camera parameters are used to determine a specific view angle.
In another possible implementation manner, obtaining the geometric information layer of the 3D object at each of the N view angles according to the two-dimensional images of the N view angles includes: performing modeling according to the two-dimensional images of the N view angles to obtain a 3D mesh model of the 3D object; and acquiring the geometric information layer of the 3D object at each of the N view angles according to the 3D mesh model and the N view angles.
In another possible implementation manner, acquiring the geometric information layer of the 3D object at each of the N view angles includes: obtaining the geometric information layer of the 3D object at each of the N view angles by rasterization or ray casting.
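As a sketch of the ray-casting variant (the library choice, the pinhole camera model and all parameter names are assumptions), per-view normal and depth maps can be produced by intersecting one camera ray per pixel with the reconstructed mesh, for example with trimesh:

```python
import numpy as np
import trimesh

def geometry_layers(mesh, cam_origin, cam_rotation, focal, width=512, height=512):
    """Render a normal map and a depth map for one view by casting one ray per pixel."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    dirs = np.stack([(xs - width / 2) / focal,
                     (ys - height / 2) / focal,
                     np.ones_like(xs, dtype=float)], axis=-1).reshape(-1, 3)
    dirs = dirs @ np.asarray(cam_rotation).T                # rotate rays into world space
    origins = np.tile(np.asarray(cam_origin, dtype=float), (dirs.shape[0], 1))

    # nearest intersection of each pixel ray with the reconstructed mesh
    locations, index_ray, index_tri = mesh.ray.intersects_location(
        origins, dirs, multiple_hits=False)

    normal_map = np.zeros((height * width, 3))
    depth_map = np.zeros(height * width)
    normal_map[index_ray] = mesh.face_normals[index_tri]    # per-pixel surface normal
    depth_map[index_ray] = np.linalg.norm(locations - origins[index_ray], axis=1)
    return normal_map.reshape(height, width, 3), depth_map.reshape(height, width)
```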
In another possible implementation, the two-dimensional images of the N perspectives may cover the surface of the 3D object, which improves the accuracy and integrity of rendering.
In another possible implementation manner, the method for obtaining an AO map provided in the present application may further include: the AO-map is superimposed onto a three-dimensional model of the 3D object for display to improve the fidelity and layering of the 3D object presented to the user.
In a second aspect, an apparatus for obtaining an AO map may comprise: the device comprises an acquisition unit, a prediction unit, a UV parameterization unit and a fusion unit. Wherein:
the acquisition unit is used for acquiring the geometric information image layer of the 3D object in each view angle of the N view angles according to the two-dimensional images of the N view angles of the 3D object and the camera parameters of the N view angles. The geometric information layer includes a normal map. N is greater than or equal to 2.
And the prediction unit is used for inputting the geometric information image layer of each view angle into the trained neural network to obtain a single-view AO image layer of each view angle.
And the UV parameterization unit is used for carrying out UV parameterization according to the two-dimensional images of the 3D object at the N view angles and obtaining UV parameters of the grid model of the 3D object.
And the fusion unit is used for fusing the single-view AO layers of each view according to the UV parameters acquired by the UV parameterization unit to obtain the AO mapping of the 3D object.
With the above apparatus for acquiring an AO map, a neural network for acquiring single-view AO layers is trained in advance, geometric information layers of different view angles of the 3D object are input into the trained neural network, the neural network efficiently predicts and outputs the single-view AO layers, and the multiple single-view AO layers are then fused according to the UV parameters of the 3D object's mesh model to obtain the AO map. Because the neural network is trained with high-quality training data, its output can be guaranteed to be close to real illumination; and because the neural network can predict efficiently, the AO map can still be acquired efficiently even when scene complexity increases. The scheme of the present application can therefore improve the efficiency of acquiring high-quality AO maps. Furthermore, the process of obtaining the single-view AO layers does not depend on the UV parameters; even if a designer re-unwraps the UV of the model, the scheme of the present application can fuse the already obtained single-view AO layers according to the updated UV parameters, so reusability is high and the efficiency of obtaining high-quality AO maps is further improved.
In one possible implementation manner, the geometric information layer may further include a depth map, which may further improve the quality of the acquired AO layer.
In another possible implementation manner, the UV parameterization unit is further configured to obtain updated UV parameters of the 3D object, and the fusion unit is further configured to fuse the single-view AO layers of each view angle according to the updated UV parameters obtained by the UV parameterization unit to obtain an updated AO map of the 3D object. The already acquired single-view AO layers are fused according to the updated UV parameters without redoing the AO computation, so reusability is high, computation time and cost are greatly saved, and the efficiency of acquiring high-quality AO maps is further improved.
In another possible implementation, the difference between the AO values corresponding to the same position in single-view AO layers of different view angles is less than or equal to a first preset value.
In another possible implementation, the difference between the AO value in the single-view AO layer and the AO value at the same position calculated by a ray tracing rendering equation is less than or equal to a second preset value.
In another possible implementation, the neural network includes a deep convolutional neural network trained by training data obtained from a ray tracing rendering equation, so that an output of the trained neural network can achieve a quality effect of ray tracing.
In another possible implementation, the UV parameterization unit is specifically configured to: perform 3D modeling according to the two-dimensional images of the 3D object at the N view angles to obtain a 3D mesh model of the 3D object; unwrap the 3D mesh model onto a UV parameter plane to obtain a UV unwrapped map, where the UV unwrapped map includes the UV coordinates of each vertex in the 3D mesh model; and take single-view UV layers as the UV parameters, where a single-view UV layer includes the UV coordinates of each pixel in that view. The three-dimensional mesh model is obtained from the two-dimensional images and the UV parameters are then obtained, so the acquired single-view AO layers can be fused according to the UV parameters and a high-quality AO map can be obtained efficiently.
In another possible implementation, the camera parameters may include one or more of the following: camera pose and camera focal length; the camera parameters are used to determine a specific view angle.
In another possible implementation, the two-dimensional images of the N perspectives may cover the surface of the 3D object, which improves the accuracy and integrity of rendering.
In another possible implementation manner, the apparatus may further include a display unit, configured to superimpose the AO map acquired by the fusion unit onto a three-dimensional model of the 3D object for display.
It should be noted that the apparatus for acquiring an AO map provided in the second aspect is configured to implement the method for acquiring an AO map provided in the first aspect; for its specific implementation, reference may be made to the specific implementation of the first aspect, and details are not repeated here.
In a third aspect, the present application provides a rendering device. The rendering device may implement the functions in the method examples described in the first aspect; the functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions. The rendering device may exist in the product form of a chip.
In one possible implementation, the rendering device may include a processor and a transport interface. Wherein the transmission interface is used for receiving and transmitting data. The processor is configured to invoke the program instructions stored in the memory to cause the rendering device to perform the functions in the method examples described in the first aspect above.
In a fourth aspect, a computer readable storage medium is provided, comprising instructions which, when run on a computer, cause the computer to perform the method of acquiring an AO map according to any one of the above aspects or any one of the possible implementations.
In a fifth aspect, a computer program product is provided, which when run on a computer causes the computer to perform the method of acquiring an AO map according to any one of the above aspects or any one of the possible implementations.
In a sixth aspect, a chip system is provided, the chip system comprising a processor and possibly a memory, for implementing the functions of the above method. The chip system may be formed of a chip or may include a chip and other discrete devices.
The solutions provided in the fourth to sixth aspects are used to implement the method provided in the first aspect, so that the same benefits as those of the first aspect can be achieved, and no further description is given here.
The various possible implementations of any of the foregoing aspects may be combined without contradiction between the schemes.
Drawings
FIG. 1a illustrates a comparison of an AO-free rendering result with an AO rendering result for the same image;
FIG. 1b illustrates another comparison of an AO-free rendering result with an AO rendering result for the same image;
FIG. 1c illustrates another comparison of the AO-free rendering result with the AO rendering result of the same image;
FIG. 1d illustrates a comparison of a further AO-free rendering result with an AO-present rendering result for the same image;
FIG. 2a illustrates a schematic diagram of a ray tracing technique;
FIG. 2b illustrates an AO rendering result;
FIG. 3 illustrates a schematic diagram of a screen space ambient occlusion technique;
FIG. 4 illustrates a scheme flow diagram for presenting AO effects by means of AO baking;
FIG. 5 is a schematic diagram of an image processing system according to the present application;
fig. 6 is a schematic structural diagram of a terminal device provided in the present application;
fig. 7 is a schematic software structure of a terminal device provided in the present application;
FIG. 8 is a schematic diagram of a system architecture provided herein;
FIG. 9 is a schematic diagram of a Convolutional Neural Network (CNN) architecture provided herein;
fig. 10 is a schematic diagram of a chip hardware structure provided in the present application;
FIG. 11 is a schematic structural diagram of an apparatus for acquiring an AO map according to the present application;
FIG. 12 is a flowchart of a method for obtaining an AO map provided herein;
fig. 13 is a schematic diagram of a prediction flow of a neural network provided in the present application;
FIG. 14 is a flowchart of an AO mapping obtained by fusion according to the present application;
FIG. 15 is a system architecture diagram of one method of acquiring AO maps provided herein;
FIG. 16 is a flow chart of another method for obtaining an AO map provided herein;
FIG. 17 is a schematic structural diagram of another apparatus for acquiring AO maps provided herein;
FIG. 18 is a schematic structural diagram of yet another apparatus for acquiring AO maps provided herein;
fig. 19 is a schematic structural diagram of a rendering device provided in the present application.
Detailed Description
In the embodiments of the present application, to describe the technical solutions clearly, words such as "first" and "second" are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will understand that words such as "first" and "second" do not limit the quantity or the execution order, and do not necessarily indicate a difference. Technical features described as "first" and "second" are not ordered by sequence or by magnitude.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion that may be readily understood.
In the embodiments of the present application, at least one may also be described as one or more, and a plurality may be two, three, four or more, which is not limited in this application.
In addition, the network architecture and the scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the network architecture and the appearance of a new service scenario, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
Before the embodiments of the present application are described, the terms related to the present application are explained here and are not explained again later.
Ambient occlusion (AO) is an effect that depicts the occlusion of surrounding diffusely reflected light where objects intersect or are in close proximity to one another.
A layer refers to attribute parameters of a 3D object at one view angle.
The geometric information layer refers to the values of geometric information parameters of the 3D object at one view angle. For example, the geometric information layer may be a normal map or a depth map.
The AO layer refers to the illumination intensity value of each position in the image of the 3D object at one view angle.
An AO map refers to the illumination intensity values at the positions of the unwrapped map obtained by unwrapping the 3D object onto a planar area.
UV parameters refer to the coordinates of points on the surface of a 3D object in the parameter domain of a parametric surface, for example the parametric surface F(u, v); they are generally denoted by the letters U and V, and may also be referred to as UV coordinates or texture coordinates.
For clarity and conciseness in the description of the following embodiments, a brief description of the related art will be given first:
In recent years, the display capability of electronic devices has become more and more powerful, bringing users a better experience, and users' requirements on the fidelity of graphic images keep rising. The ideal effect is to simulate lighting identical to that of real life, so that the picture presented to the user is remarkably realistic, and the most realistic lighting effect is achieved with global illumination.
Ray tracing techniques are commonly employed in the industry to achieve global illumination. Ray tracing simulates light propagation in a scene so as to render a high-quality picture, and can solve the ambient occlusion problem with high quality. The principle of ray tracing is shown in fig. 2a: each point p to be shaded in the scene randomly casts a number of rays through a hemisphere centered on p, and the intersections of these rays with the surrounding mesh are detected. If X rays are cast in total and x of them intersect the mesh (the intersection point is denoted q), the ambient occlusion rate corresponding to the point p is AO = x/X, AO ∈ [0, 1]. A ray is considered occluded only when the distance from p to q is sufficiently small. The larger X is, the more accurate the occlusion rate and the finer the AO effect, but also the larger the amount of computation and the higher the performance cost. Fig. 2b(a) is an AO map calculated by ray tracing; the AO effect is very fine.
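The following numpy/trimesh sketch mirrors the AO = x/X computation described above for a single shaded point p; the uniform hemisphere sampling, the occlusion distance threshold and the use of trimesh for the intersection test are illustrative assumptions rather than the exact procedure of fig. 2a.

```python
import numpy as np
import trimesh

def ray_traced_ao(mesh, p, n, num_rays=256, max_dist=0.1):
    """Ambient occlusion at point p with unit surface normal n: AO = x / X."""
    # sample X directions uniformly over the hemisphere around n
    d = np.random.normal(size=(num_rays, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    d[d @ n < 0] *= -1                              # flip samples into the upper hemisphere

    origins = np.tile(p + 1e-4 * n, (num_rays, 1))  # small offset avoids self-intersection
    locs, index_ray, _ = mesh.ray.intersects_location(origins, d, multiple_hits=False)

    # count rays whose nearest hit q lies within max_dist of p (the occluded rays x)
    dist = np.linalg.norm(locs - origins[index_ray], axis=1)
    x = np.count_nonzero(dist < max_dist)
    return x / num_rays                             # AO in [0, 1]
```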
Another technique to simulate global illumination is screen space ambient occlusion, which can calculate the AO effect of an object in real time and is typically represented by screen space based ambient occlusion (SSAO). SSAO is based on a raster engine and uses the depth information of the screen space to calculate the ambient occlusion rate of each screen pixel. The principle of SSAO is shown in fig. 3: Y three-dimensional sampling points (the points indicated by boxes in fig. 3) are randomly generated in a sphere with point p as the center and R as the radius, the number y of sampling points occluded by the scene (the points indicated by solid boxes in fig. 3) is calculated from the depth information, and AO = y/Y, AO ∈ [0, 1]. SSAO replaces the rays cast in the ray tracing scheme with sampling points, which greatly improves efficiency; however, because only local screen-space information is used, the accuracy of the calculated AO result is generally poor, there is a clear gap from the AO effect computed by ray tracing, and it is difficult to meet users' high requirements on visual quality. Fig. 2b(b) is an AO map calculated using SSAO, which differs significantly from fig. 2b(a).
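For comparison, a simplified sketch of the AO = y/Y screen-space estimate is given below; the single-pixel scope, the projection callback and the depth comparison rule are simplifying assumptions and do not reproduce a production SSAO implementation.

```python
import numpy as np

def ssao_at_pixel(depth_buffer, p_view, radius, proj, num_samples=32):
    """Screen-space AO at one pixel: AO = y / Y (illustrative sketch).

    depth_buffer : HxW array of view-space depths from the raster pass
    p_view       : 3D position of the pixel's surface point in view space
    proj         : callback mapping a view-space point to (u, v, depth) in screen space
    """
    h, w = depth_buffer.shape
    # Y random sample points inside a sphere of radius R around p
    offsets = np.random.uniform(-1.0, 1.0, size=(num_samples, 3))
    offsets = offsets[np.linalg.norm(offsets, axis=1) <= 1.0] * radius
    samples = p_view + offsets

    y = 0
    for s in samples:
        u, v, sample_depth = proj(s)                 # project the sample to screen space
        if 0 <= int(v) < h and 0 <= int(u) < w:
            scene_depth = depth_buffer[int(v), int(u)]
            if scene_depth < sample_depth:           # scene surface in front: sample occluded
                y += 1
    return y / max(len(samples), 1)                  # AO in [0, 1], Y = retained samples
```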
Another technique for simulating global illumination is to present the AO effect by means of AO baking; its flow is shown in fig. 4. AO data is obtained by ray-tracing pre-computation and baked directly into an AO map according to the UV parameters of the 3D mesh model, and the AO map is overlaid on each pixel during shading for display. This technique requires ray-tracing pre-computation, which again incurs a large performance cost and places high requirements on hardware. Meanwhile, the AO baking time increases with the complexity of the scene model; for high-polygon-count models obtained in applications such as 3D cultural-relic reconstruction, offline baking is very slow and greatly affects user experience. In addition, AO baking depends on the UV parameters: when a designer re-unwraps the UV of the model, the time-consuming AO baking must be redone, so reusability is low.
In view of the above, the present application provides a scheme for acquiring an AO map: a neural network for acquiring single-view AO layers is trained in advance, geometric information layers of different view angles of a 3D object are input into the trained neural network, the neural network efficiently predicts and outputs the single-view AO layers, and the multiple single-view AO layers are then fused according to the UV parameters of the 3D object's mesh model to obtain the AO map. Because the neural network is trained with high-quality training data, its output can be guaranteed to be close to real illumination; and because the neural network can predict efficiently, the AO map can still be acquired efficiently even when scene complexity increases. The scheme of the present application can therefore improve the efficiency of acquiring high-quality AO maps. Furthermore, the process of obtaining the single-view AO layers does not depend on the UV parameters; even if a designer re-unwraps the UV of the model, the scheme of the present application can fuse the already obtained single-view AO layers according to the updated UV parameters, so reusability is high and the efficiency of obtaining high-quality AO maps is further improved.
The method for acquiring the AO map provided by the application can be applied to an image processing system shown in FIG. 5. As shown in fig. 5, the image processing system includes a rendering device 501, a display device 502.
The rendering device 501 is configured to execute the scheme provided by the present application and efficiently acquire a high-quality AO map, and the display device 502 is configured to display the 3D object with the AO map acquired by the rendering device 501 superimposed on it.
It should be noted that, the rendering device 501 may directly communicate with the display device 502, or the rendering device 501 may also communicate with the display device 502 through forwarding of other devices, which is not limited in the embodiment of the present application.
For example, the rendering device 501 may be deployed in the cloud. The display device 502 sends a request to the rendering device 501 in the cloud, which executes the scheme provided in the present application to efficiently obtain a high-quality AO map; the rendering device 501 then provides the three-dimensional model and the AO map of the 3D object to the display device 502, and the display device 502 displays the 3D object with the AO map acquired by the rendering device 501 superimposed on it. Of course, the rendering device 501 may also be deployed at another location in the network, which is not limited in this embodiment of the present application.
For example, the rendering method provided by the application may have the following application scenarios:
Application scenario 1: 3D virtualized merchandise display. When a user browses virtual merchandise on an e-commerce platform of a terminal device such as a mobile phone or an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, the merchant needs to superimpose an AO map on the model of the virtual merchandise to provide a realistic preview experience and improve the visual stereoscopic impression. When the number of merchandise items is large, generating AO maps involves a heavy workload. With the scheme provided in the present application, AO maps can be generated efficiently and with high quality, which can effectively shorten the time for putting virtual merchandise on the shelf and increase customers' desire to purchase. Meanwhile, when the merchant needs to change the appearance of the merchandise and re-apply textures to the 3D merchandise model, if the original UV mapping relationship changes, the scheme does not need to recalculate AO, unlike the traditional method; it only needs to re-fuse the already acquired single-view AO layers according to the updated UV parameters to obtain a new AO map, reducing unnecessary rework and improving efficiency.
Application scenario 2: 3D cultural relic display. In the field of cultural relic protection, a museum can acquire high-definition images of cultural relics and reconstruct high-precision three-dimensional models of them. To restore the original appearance of the relics, the finally displayed digital relic model is often required to be highly realistic for the purposes of publicity and display, so high-resolution, high-quality AO maps are needed to improve the visual effect. With the scheme provided in the present application, high-quality AO maps can be generated efficiently, greatly improving the audience's experience of browsing the cultural relics.
It should be noted that the application scenario is only an example, and is not limited to the application scenario of the present application scheme.
The display device 502 may be a terminal device, for example. The terminal device may specifically be a terminal device having a display function, such as a large-screen display device, a mobile phone, a notebook computer, a tablet computer, a vehicle-mounted device, a wearable device (such as a smart watch), an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA), or an artificial intelligence (artificial intelligence, AI) device. The specific type of the terminal device is not limited in the embodiments of the present application.
In this application, the structure of the terminal device may be as shown in fig. 6. As shown in fig. 6, the terminal device 100 may include
Processor 110, external memory interface 120, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the configuration illustrated in the present embodiment does not constitute a specific limitation on the terminal device 100. In other embodiments, terminal device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processing unit (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. For example, in the present application, the processor 110 may control another camera to be turned on if a first image satisfies an abnormal condition.
The controller may be a neural center and a command center of the terminal device 100. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing function of terminal device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display function of the terminal device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device 100, or may be used to transfer data between the terminal device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other terminal devices, such as AR devices, etc.
It should be understood that the interfacing relationship between the modules illustrated in the present embodiment is only illustrative, and does not constitute a structural limitation of the terminal device 100. In other embodiments of the present application, the terminal device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the terminal device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The terminal device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the terminal device 100 may include 1 or Q display screens 194, where Q is a positive integer greater than 1.
A series of graphical user interfaces (graphical user interface, GUIs) may be displayed on the display 194 of the terminal device 100, these GUIs being the home screen of the terminal device 100. Generally, the display 194 of the terminal device 100 is fixed in size and only limited controls can be displayed in the display 194 of the terminal device 100. A control is a GUI element that is a software component contained within an application program that controls all data processed by the application program and interactive operations on that data, and a user can interact with the control by direct manipulation (direct manipulation) to read or edit information about the application program. In general, controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, and the like.
The terminal device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, terminal device 100 may include 1 or W cameras 193, W being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the terminal device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the terminal device 100 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to realize expansion of the memory capability of the terminal device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the terminal device 100 and data processing by executing instructions stored in the internal memory 121. For example, in the present embodiment, the processor 110 may acquire the pose of the terminal device 100 by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (such as audio data, phonebook, etc.) created during use of the terminal device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the terminal device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The terminal device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The terminal device 100 can listen to music or to handsfree talk through the speaker 170A.
The receiver 170B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal. When the terminal device 100 answers a call or receives a voice message, the voice can be heard by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "mike", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak close to the microphone 170C to input a sound signal into the microphone 170C. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and the like.
The headset interface 170D is used to connect a wired headset. The headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are various types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the terminal device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the terminal device 100 detects the intensity of the touch operation using the pressure sensor 180A. The terminal device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the SMS application icon, an instruction to view the SMS message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the SMS application icon, an instruction to create a new SMS message is executed.
The gyro sensor 180B may be used to determine a motion gesture of the terminal device 100. In some embodiments, the angular velocity of the terminal device 100 about three axes (i.e., x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the angle of the shake of the terminal device 100, calculates the distance to be compensated by the lens module according to the angle, and allows the lens to counteract the shake of the terminal device 100 by the reverse motion, thereby realizing anti-shake. The gyro sensor 180B may also be used for navigating, somatosensory game scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal device 100 calculates altitude from barometric pressure values measured by the barometric pressure sensor 180C, aiding in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The terminal device 100 can detect the opening and closing of a flip cover using the magnetic sensor 180D. In some embodiments, when the terminal device 100 is a flip phone, the terminal device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon opening of the flip cover according to the detected opening/closing state of the leather case or of the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the terminal device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the terminal device 100 is stationary. The method can also be used for identifying the gesture of the terminal equipment, and is applied to the applications such as horizontal and vertical screen switching, pedometers and the like.
A distance sensor 180F for measuring a distance. The terminal device 100 may measure the distance by infrared or laser. In some embodiments, the terminal device 100 may range using the distance sensor 180F to achieve fast focusing.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal device 100 emits infrared light outward through the light emitting diode. The terminal device 100 detects infrared reflected light from a nearby object using a photodiode. When sufficient reflected light is detected, it can be determined that there is an object in the vicinity of the terminal device 100. When insufficient reflected light is detected, the terminal device 100 may determine that there is no object in the vicinity of the terminal device 100. The terminal device 100 can detect that the user holds the terminal device 100 close to the ear to talk by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense ambient light level. The terminal device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in a pocket to prevent false touches.
The fingerprint sensor 180H is used to collect a fingerprint. The terminal device 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal device 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the terminal device 100 heats the battery 142 to avoid an abnormal shutdown of the terminal device 100 caused by low temperature. In still other embodiments, when the temperature is below a further threshold, the terminal device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
The touch sensor 180K, also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the terminal device 100 at a different location than the display 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the vibrating bone mass of the human vocal part. The bone conduction sensor 180M may also contact the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the vocal-part bone mass obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The terminal device 100 may receive key input and generate key signal input related to user settings and function control of the terminal device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also correspond to different vibration feedback effects by touching different areas of the display screen 194. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light and may be used to indicate the charging status, battery level changes, messages, missed calls, notifications, and the like.
In addition, an operating system runs on the above components, such as the iOS operating system developed by Apple Inc., the Android open-source operating system developed by Google Inc., or the Windows operating system developed by Microsoft Corporation. Applications may be installed and run on the operating system.
The operating system of the terminal device 100 may employ a layered architecture, an event driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. In this embodiment, taking an Android system with a layered architecture as an example, a software structure of the terminal device 100 is illustrated.
Fig. 7 is a software configuration block diagram of the terminal device 100 of the embodiment of the present application.
The layered architecture divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (Android runtime) and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in fig. 7, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc. For example, when taking a photograph, the camera application may access a camera interface management service provided by the application framework layer.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions. As shown in fig. 7, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like. For example, in the embodiment of the present application, when browsing shopping web pages, the application framework layer may provide APIs related to web page browsing functions for the application layer, and provide interface management services for the application layer, so as to implement web page browsing functions.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the terminal device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows an application to display notification information in the status bar and can be used to convey notification-type messages that automatically disappear after a short stay without requiring user interaction. For example, the notification manager is used to notify that a download is complete, provide a message alert, and the like. The notification manager may also present a notification in the form of a chart or scroll-bar text in the system top status bar, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the terminal device vibrates, or an indicator light blinks.
The Android runtime includes a core library and a virtual machine, and is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is a function which needs to be called by java language, and the other part is a core library of android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio video encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 4, h.264, mp3, advanced audio coding (advanced audio coding, AAC), adaptive multi-rate (adaptive multi rate, AMR), joint photographic experts group (joint photo graphic experts group, JPEG), portable network graphics format (portable network graphic format, PNG), and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
A two-dimensional (2D) graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
It should be noted that, although the embodiments of the present application are described by taking the Android system as an example, the basic principle is equally applicable to terminal devices running other operating systems (for example, the iOS or Windows systems mentioned above).
The method for acquiring the AO map can be applied to scenarios in which a 3D object is presented. For example, the method for acquiring the AO map in the embodiments of the present application can be applied to scenarios of displaying a 3D object online (online shopping, online display of cultural relics, etc.). In the following, with reference to fig. 1 and the scenario of displaying a 3D object online, the workflow of the software and hardware of the terminal device 100 is illustrated.
Illustratively, the touch sensor 180K receives a touch operation and reports the touch operation to the processor 110, so that the processor responds to the touch operation to display a 3D object corresponding to the touch operation on the display screen 194. For example, when the touch sensor 180K receives a touch operation on an icon of an object in the online shopping interface, the touch operation on the 3D object is reported to the processor 110, so that the processor 110 obtains an AO map of the object in response to the touch operation, and displays a 3D model of the object superimposed with the AO map on the display screen 194.
For example, in scenarios of displaying 3D objects, such as online shopping or online display of museum cultural relics, after multi-view two-dimensional images of a 3D object are captured, a server (provided with a rendering device) performs 3D modeling to obtain a 3D mesh model of the 3D object and superimposes an AO map on it; the 3D mesh model is then presented to the user through the terminal device 100, so that the user sees a high-fidelity 3D object presented online.
The method provided in the present application is described below from the model training side and the model application side:
the training method for the neural network used to acquire single-view AO layers adopted in the embodiments of the present application relates to computer vision processing, and may be specifically applied to data processing methods such as data training, machine learning, and deep learning, which perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data (such as the single-view geometric information layer and its corresponding single-view AO layer in the present application), and finally obtain a trained neural network for acquiring single-view AO layers. In addition, the method for acquiring the AO map provided in the embodiments of the present application may apply the trained neural network for acquiring single-view AO layers: input data (such as the single-view geometric information layer in the present application) is input into the trained neural network to obtain output data (such as the single-view AO layer in the present application). It should be noted that the training method for the neural network for acquiring single-view AO layers and the method for acquiring the AO map provided in the embodiments of the present application are inventions based on the same concept, and may also be understood as two parts of a system or two stages of an overall flow, namely a model training stage and a model application stage.
Since the embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.
(1) Neural Networks (NN)
The neural network is a machine learning model; it is a machine learning technique that imitates the neural network of the human brain in order to realize artificial intelligence. The input and output of the neural network can be configured according to actual requirements, and the neural network is trained with sample data so that the error between the actual output of the neural network and the output corresponding to the sample data is minimized. A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f(∑_{s=1}^{n} W_s · x_s + b)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, i.e., the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
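As an illustrative sketch only (not part of the patented scheme), the single neural unit described above can be written in a few lines of Python; the sigmoid activation and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, w, b):
    # Weighted sum of the inputs x_s with weights W_s plus the bias b,
    # passed through the activation function f (assumed here to be a sigmoid).
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x_s (illustrative values)
w = np.array([0.8, 0.1, -0.4])   # weights W_s
b = 0.2                          # bias
print(neural_unit(x, w, b))
```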
(2) Deep neural network
Deep neural networks (deep neural network, DNN), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers; there is no particular measure for "many" here. According to the positions of the different layers, the layers of a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression:

y = α(Wx + b)

where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Since the number of DNN layers is large, the number of coefficient matrices W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron of the (L−1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters. In deep neural networks, more hidden layers make the network better able to characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
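A minimal sketch, under assumed layer sizes and a ReLU activation, of the per-layer operation y = α(Wx + b) and its stacking across layers; the weight matrix W of each layer is indexed as W[j, k], matching the W^L_{jk} convention above.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    # Each layer computes y = alpha(W @ x + b); W[j, k] is the coefficient from
    # the k-th neuron of the previous layer to the j-th neuron of this layer.
    for W, b in layers:
        x = relu(W @ x + b)
    return x

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]   # input layer, two hidden layers, output layer (assumed sizes)
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.standard_normal(4), layers))
```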
(3) Convolutional neural network
The convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of a convolutional layer and a sub-sampling layer. The feature extractor can be seen as a filter and the convolution process can be seen as a convolution with an input image or convolution feature plane (feature map) using a trainable filter. The convolution layer refers to a neuron layer in the convolution neural network, which performs convolution processing on an input signal. In the convolutional layer of the convolutional neural network, one neuron may be connected with only a part of adjacent layer neurons. A convolutional layer typically contains a number of feature planes, each of which may be composed of a number of neural elements arranged in a rectangular pattern. Neural elements of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights can be understood as the way image information is extracted is independent of location. The underlying principle in this is: the statistics of a certain part of the image are the same as other parts. I.e. meaning that the image information learned in one part can also be used in another part. The same learned image information can be used for all locations on the image. In the same convolution layer, a plurality of convolution kernels may be used to extract different image information, and in general, the greater the number of convolution kernels, the more abundant the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix with random size, and reasonable weight can be obtained through learning in the training process of the convolution neural network. In addition, the direct benefit of sharing weights is to reduce the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
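As a hedged illustration of the convolution and weight-sharing idea described above (not the patent's network), the sliding of a single shared kernel over an input feature plane can be sketched as follows; the edge-extraction kernel and the stride value are assumptions.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # The same weight matrix (kernel) is applied at every position of the image,
    # which is the weight sharing described above.
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])   # one kernel that extracts edge information
image = np.random.default_rng(1).random((8, 8))
print(conv2d(image, edge_kernel).shape)      # (6, 6)
```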
(4) A recurrent neural network (recurrent neural networks, RNN) is used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, but the nodes within each layer are unconnected. Although this common neural network solves many problems, it is still powerless for many others. For example, to predict what the next word of a sentence is, it is generally necessary to use the previous words, because the words in a sentence are not independent of each other. RNN is called a recurrent neural network because the current output of a sequence is related to the previous outputs. The concrete expression is that the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes between the hidden layers are no longer unconnected but connected, and the input of a hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, RNNs are able to process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN: the error back-propagation algorithm is also used, but with one difference. If the RNN is unrolled over time, the parameters therein, such as W, are shared, which is not the case with the conventional neural networks described above. In addition, when using a gradient descent algorithm, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is referred to as back propagation through time (Back Propagation Through Time, BPTT).
Why are recurrent neural networks needed when convolutional neural networks already exist? The reason is simple: in convolutional neural networks, there is a precondition that the elements are independent of each other, and the inputs and outputs are also independent, such as cats and dogs. However, in the real world, many elements are interconnected. For example, stock prices change over time; or a person says: "I like traveling, and my favorite place is Yunnan. In the future, I will go there when I have the chance." A human would know that the blank should be filled with "Yunnan", because humans infer from the context. But how can a machine do this? RNNs were developed for this purpose: they aim to give machines the ability to memorize, as humans have. Thus, the output of an RNN needs to rely on the current input information and historical memory information.
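A minimal sketch, assuming a tanh activation and small random weights, of the recurrence just described: the hidden state at each step depends on the current input and on the hidden state of the previous step, and the parameters are shared across steps.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    # The hidden state h carries the "memory": it depends on the current input x
    # and on the hidden state of the previous time step.
    h = np.zeros(W_hh.shape[0])
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h

rng = np.random.default_rng(2)
inputs = [rng.standard_normal(3) for _ in range(5)]   # a length-5 input sequence
W_xh = rng.standard_normal((4, 3))                    # input-to-hidden weights (shared across steps)
W_hh = rng.standard_normal((4, 4))                    # hidden-to-hidden weights (shared across steps)
print(rnn_forward(inputs, W_xh, W_hh, np.zeros(4)))
```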
(6) Loss function
In training a deep neural network, since the output of the deep neural network is expected to be as close as possible to the value that is actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment is continued until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, and training the deep neural network then becomes the process of reducing this loss as much as possible.
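A hedged numerical sketch of the idea above, using a mean squared error loss and a linear predictor chosen purely for illustration (the patent does not prescribe a specific loss): the weight vector is repeatedly adjusted in the direction that reduces the loss until the prediction approaches the target value.

```python
import numpy as np

def mse_loss(pred, target):
    # A higher loss value indicates a larger difference between prediction and target.
    return np.mean((pred - target) ** 2)

def gradient_step(w, x, target, lr=0.01):
    # One update of the weight vector for the linear predictor w @ x: if the
    # prediction is too high the update lowers it, and vice versa.
    pred = w @ x
    grad = 2.0 * (pred - target) * x    # d(loss)/dw for a single sample
    return w - lr * grad

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
for _ in range(100):
    w = gradient_step(w, x, target=1.0)
print(w @ x, mse_loss(w @ x, 1.0))      # the prediction approaches the target 1.0
```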
The system architecture provided in the embodiments of the present application is described below.
Referring to fig. 8, an embodiment of the present invention provides a system architecture 800. As shown in fig. 8, in the system architecture 800, the data acquisition device 860 is configured to acquire training data. The training data in this embodiment of the present application includes a single-view geometric information layer and its corresponding single-view AO layer, and the training data is stored in the database 830; the training device 820 trains based on the training data maintained in the database 830 to obtain the target model/rule 801. The target model/rule 801 may be the neural network for acquiring a single-view AO layer described in the embodiments of the present application; that is, the single-view geometric information layer of a 3D object is input into the target model/rule 801 to obtain the single-view AO layer of the 3D object at that view. The single-view AO layers in the training data may be calculated by using the rendering equation of the ray tracing technique, so that the single-view AO layer output by the target model/rule 801 can achieve a visual effect close to that of an AO layer obtained by the ray tracing rendering equation.
It should be noted that, in practical applications, the training data maintained in the database 830 is not necessarily all acquired by the data acquisition device 860, but may be received from other devices. It should be further noted that the training device 820 is not necessarily completely based on the training data maintained by the database 830 to perform training of the target model/rule 801, and it is also possible to obtain the training data from the cloud or other places to perform model training, and the above description should not be taken as limitation of the embodiments of the present application.
It should be further noted that, a specific scheme of training the training device 820 to obtain the target model/rule 801 may be selected according to the requirement, and this training process is not described in detail in the embodiment of the present application.
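Since the patent leaves the training scheme open, the following is only a hedged sketch of what the training performed by the training device 820 might look like, assuming a hypothetical image-to-image network AONet, an L1 loss, and a data loader that yields (geometric information layer, ray-traced AO layer) pairs; none of these specifics are stated in the patent.

```python
import torch
import torch.nn as nn

class AONet(nn.Module):
    # Hypothetical image-to-image network: geometric information layers in, AO layer out.
    def __init__(self, in_channels=4):   # e.g. 3 normal channels + 1 depth channel (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),   # AO values in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for geometry_layer, ao_layer in loader:   # ray-traced AO layers serve as labels
            loss = torch.nn.functional.l1_loss(model(geometry_layer), ao_layer)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```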
The target model/rule 801 obtained by training according to the training device 820 may be applied to different systems or devices, such as the execution device 810 shown in fig. 8, where the execution device 810 may be a server or a cloud, or may also be a terminal device, such as a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR, a vehicle-mounted terminal, and the like. In fig. 8, the execution device 810 is configured with an I/O interface 812 for data interaction with external devices, and a user may input data to the I/O interface 812 through the client device 840, where the input data may include in embodiments of the present application: a single view geometry information layer, or a single view two-dimensional image.
In the process of performing related processing such as calculation by the calculation module 811 of the execution device 810, the execution device 810 may call data, codes, etc. in the data storage system 850 for corresponding processing, or may store data, instructions, etc. obtained by corresponding processing in the data storage system 850.
Finally, the I/O interface 812 returns the processing result, such as the obtained single-view AO layer or AO map, to the client device 840, so that the user can either fuse the single-view AO layers to obtain the AO map, or use the returned AO map directly, thereby achieving efficient acquisition of the AO map.
For example, the execution device 810 may input the single-view geometric information layers of the N views of the 3D object into the target model/rule 801 to obtain the single-view AO layers of the N views of the 3D object, fuse the single-view AO layers of the N views according to the UV parameters of the 3D object to obtain the AO map of the 3D object, and store the obtained AO map in the data storage system 850; finally, the I/O interface 812 returns the AO map of the 3D object to the client device 840, achieving efficient acquisition of the AO map. The client device 840 may be a server that interacts with a terminal device, or the client device 840 may be a terminal device used for display.
The single-view geometric information layers of the N views of the 3D object may be input by other devices (e.g., the client device 840) through the I/O interface 812; alternatively, other devices (e.g., the client device 840) may input two-dimensional images of the N views of the 3D object through the I/O interface 812 and the execution device 810 processes the two-dimensional images into the geometric information layers of the N views; or they may be acquired in other manners. The embodiment of the present application does not specifically limit how the execution device 810 acquires the single-view geometric information layers of the N views of the 3D object.
It should be noted that the training device 820 may generate, based on different training data, a corresponding target model/rule 801 for different targets or different tasks, where the corresponding target model/rule 801 may be used to achieve the targets or to perform the tasks, thereby providing the user with the desired results.
In the case shown in fig. 8, the user may manually give input data, which may be operated through an interface provided by the I/O interface 812. In another case, the client device 840 may automatically send input data to the I/O interface 812; if the client device 840 is required to obtain the user's authorization before automatically sending input data, the user may set the corresponding permission in the client device 840. The user may view the result output by the execution device 810 at the client device 840. The client device 840 may also be used as a data collection terminal to collect the input data fed to the I/O interface 812 and the output result of the I/O interface 812 as new sample data, as shown in fig. 8, and store the new sample data in the database 830. Of course, the input data fed to the I/O interface 812 and the output result of the I/O interface 812 shown in fig. 8 may also be stored in the database 830 as new sample data directly by the I/O interface 812, without being collected by the client device 840.
It should be noted that fig. 8 is only a schematic diagram of a system architecture provided in the embodiments of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawings is not limited in any way, for example, in fig. 8, the data storage system 850 is an external memory with respect to the execution device 810, and in other cases, the data storage system 850 may be disposed in the execution device 810.
The method and apparatus provided in the embodiments of the present application may also be used to augment a training database, where, as shown in fig. 8, the I/O interface 812 of the execution device 810 may send the data processed by the execution device (such as the 3D object single-view AO layer) and the 3D object AO layer that is subjected to ray tracing calculation together as a training data pair to the database 830, so that the training data maintained by the database 830 is richer, and thus richer training data is provided for the training work of the training device 820.
As shown in fig. 8, the target model/rule 801 is trained according to the training device 820, where the target model/rule 801 may be a neural network for acquiring a single view AO layer in the embodiment of the present application. The neural network for acquiring the single view AO layer provided in the embodiments of the present application may be a convolutional neural network or others.
As described in the foregoing description of the basic concept, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning architecture, where the deep learning architecture refers to learning at multiple levels at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to an image input thereto.
As shown in fig. 9, convolutional Neural Network (CNN) 900 may include an input layer 910, a convolutional layer/pooling layer 920 (where the pooling layer is optional), and a neural network layer 930.
Convolution layer/pooling layer 920:
convolution layer:
the convolutional layer/pooling layer 920 shown in fig. 9 may include layers 921-926 as examples. For example, in one implementation, layer 921 is a convolutional layer, layer 922 is a pooling layer, layer 923 is a convolutional layer, layer 924 is a pooling layer, layer 925 is a convolutional layer, and layer 926 is a pooling layer; in another implementation, layers 921 and 922 are convolutional layers, layer 923 is a pooling layer, layers 924 and 925 are convolutional layers, and layer 926 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal working principle of a convolutional layer will be described below by taking the convolutional layer 921 as an example.
The convolutional layer 921 may include many convolution operators, also known as kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually processed over the input image pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride) in the horizontal direction, so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends to the entire depth of the input image during the convolution operation. Therefore, convolving with a single weight matrix produces a convolution output with a single depth dimension; however, in most cases a single weight matrix is not used, and instead multiple weight matrices of the same size (rows × columns), i.e., multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" described above. Different weight matrices may be used to extract different features from the image; for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a particular color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 900 can make correct predictions.
When convolutional neural network 900 has multiple convolutional layers, the initial convolutional layer (e.g., 921) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 900 increases, features extracted by the later convolutional layers (e.g., 926) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
Pooling layer:
since it is often desirable to reduce the number of training parameters, pooling layers often need to be introduced periodically after the convolutional layers. For the layers 921-926 illustrated by 920 in fig. 9, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller image. The average pooling operator may calculate the average of the pixel values within a particular range of the image as the result of average pooling. The max pooling operator may take the pixel with the largest value within a particular range as the result of max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel of the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
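A hedged sketch of the pooling operation just described, with a 2×2 window assumed for illustration; each output pixel is the maximum or the average of the corresponding sub-region of the input.

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    # Downsample the input by taking the max (or the average) of each
    # size x size sub-region, reducing the spatial size of the image.
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(image, size=2, mode="max"))       # 2 x 2 output of maxima
print(pool2d(image, size=2, mode="average"))   # 2 x 2 output of averages
```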
Neural network layer 930:
after processing by the convolutional layer/pooling layer 920, the convolutional neural network 900 is not yet able to output the desired output information, because, as described above, the convolutional layer/pooling layer 920 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (the required class information or other relevant information), the convolutional neural network 900 needs to use the neural network layer 930 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 930 may include multiple hidden layers (931, 932 through 93n shown in fig. 9) and an output layer 940. The parameters included in the multiple hidden layers may be pre-trained based on training data associated with a particular task type, such as image recognition, image classification, or image super-resolution reconstruction.
After the hidden layers in the neural network layer 930, the final layer of the overall convolutional neural network 900 is the output layer 940. The output layer 940 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the overall convolutional neural network 900 is completed (for example, propagation from 910 to 940 in fig. 9 is forward propagation), back propagation (for example, propagation from 940 to 910 in fig. 9 is back propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 900 and the error between the result output by the convolutional neural network 900 through the output layer and the ideal result.
It should be noted that the convolutional neural network 900 shown in fig. 9 is only an example of a convolutional neural network, and the convolutional neural network may also exist in the form of other network models in a specific application.
The following describes a chip hardware structure provided in the embodiments of the present application.
Fig. 10 is a schematic diagram of a chip hardware architecture according to an embodiment of the present invention, where the chip includes a neural Network Processor (NPU) 1000. The chip may be provided in an execution device 810 as shown in fig. 8 to perform the calculation work of the calculation module 811. The chip may also be provided in a training device 820 as shown in fig. 8 to complete the training work of the training device 820 and output the target model/rule 801. The algorithm of each layer in the convolutional neural network as shown in fig. 9 can be implemented in the chip as shown in fig. 10.
As shown in fig. 10, the NPU 1000 is mounted as a coprocessor to a main central processing unit (central processing unit, CPU) (Host CPU), and tasks are allocated by the Host CPU. The core part of the NPU is the arithmetic circuit 1003; the controller 1004 controls the arithmetic circuit 1003 to extract data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 1003 includes a plurality of processing units (PEs) internally. In some implementations, the operational circuit 1003 is a two-dimensional systolic array. The arithmetic circuit 1003 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1003 is a general purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 1003 fetches the data corresponding to the matrix B from the weight memory 1002 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit 1003 performs a matrix operation on the matrix A data from the input memory 1001 and the matrix B data, and stores the obtained partial result or final result of the matrix in the accumulator 1008 (accumulator).
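As a purely illustrative sketch (not the actual NPU microarchitecture), the accumulation of partial matrix results described above can be mimicked in software by splitting the shared dimension into tiles and summing the partial products.

```python
import numpy as np

def matmul_accumulate(A, B, tile=2):
    # C = A @ B computed by accumulating partial results over tiles of the
    # shared dimension, loosely mirroring the accumulator described above.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for t in range(0, k, tile):
        C += A[:, t:t + tile] @ B[t:t + tile, :]   # partial result accumulated into C
    return C

rng = np.random.default_rng(3)
A, B = rng.random((4, 6)), rng.random((6, 5))
print(np.allclose(matmul_accumulate(A, B), A @ B))   # True
```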
The vector calculation unit 1007 may further process the output of the arithmetic circuit 1003, for example, by vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 1007 may be used for network calculations of non-convolutional/non-fully-connected (fully connected, FC) layers in a neural network, such as pooling (Pooling), batch normalization (Batch Normalization), local response normalization (Local Response Normalization), and the like.
In some implementations, the vector calculation unit 1007 can store the vector of processed outputs to the unified buffer 1006. For example, the vector calculation unit 1007 may apply a nonlinear function to an output of the operation circuit 1003, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 1007 generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 1003, for example for use in subsequent layers in a neural network.
For example, the algorithm of each layer in the convolutional neural network as shown in fig. 9 may be performed by 1003 or 1007. The algorithm of calculation module 811 and training device 820 in fig. 8 may be performed by 1003 or 1007.
The unified memory 1006 is used for storing input data and output data.
Input data in the external memory is transferred directly to the input memory 1001 and/or the unified memory 1006 by the direct memory access controller (direct memory access controller, DMAC) 1005; weight data in the external memory is stored into the weight memory 1002, and data in the unified memory 1006 is stored into the external memory.
A bus interface unit (bus interface unit, BIU) 1010 is used for interaction among the main CPU, the DMAC, and the instruction fetch memory 1009 through a bus.
An instruction fetch memory (instruction fetch buffer) 1009 connected to the controller 1004 stores instructions for use by the controller 1004.
The controller 1004 is configured to invoke an instruction cached in the instruction memory 1009 to control a working process of the operation accelerator.
Illustratively, the data here may be the input or output data of each layer in the convolutional neural network shown in fig. 9, or the input or output data of the calculation module 811 and the training device 820 in fig. 8.
Typically, the unified memory 1006, the input memory 1001, the weight memory 1002, and the instruction fetch memory 1009 are on-chip (On-Chip) memories, and the external memory is a memory external to the NPU, which may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or another readable and writable memory.
Alternatively, the program algorithm in fig. 8 and 9 is implemented by the cooperation of the main CPU and the NPU.
The technical solutions in the present application will be described below with reference to the accompanying drawings.
In one aspect, an embodiment of the present application provides an apparatus for acquiring an AO map, which is configured to execute a method for acquiring an AO map provided by the present application.
Fig. 11 illustrates a block diagram of an apparatus for acquiring AO maps according to an embodiment of the present application. As shown in fig. 11, the apparatus 110 for acquiring an AO map may include a processor 1101, a memory 1102, and a transceiver 1103.
The following describes the components of the apparatus 110 for acquiring AO maps in detail with reference to fig. 11:
the memory 1102 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory (flash memory), a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memories, and is used to store application program code, configuration files, data information, or other content that can implement the methods of the present application. In other possible scenarios, the memory 1102 may also be deployed in another device independent of the apparatus 110 for acquiring the AO map.
The transceiver 1103 is configured to obtain information interaction between the AO map device 110 and other devices.
The processor 1101 may be the control center of the apparatus 110 for acquiring the AO map. For example, the processor 1101 may be a central processing unit (central processing unit, CPU), an application-specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, such as one or more digital signal processors (digital signal processor, DSP) or one or more field programmable gate arrays (field programmable gate array, FPGA).
The processor 1101 performs the following functions by executing or executing software programs and/or modules stored in the memory 1102:
acquiring a geometric information layer of each view angle of the 3D object in N view angles according to two-dimensional images of the 3D object in N view angles and camera parameters of the N view angles, wherein the geometric information layer comprises a normal map; n is greater than or equal to 2; inputting the geometric information image layer of each view angle into a trained neural network to obtain a single view angle AO image layer of each view angle; performing UV parameterization according to the two-dimensional images of the 3D object at N visual angles to obtain UV parameters of a grid model of the 3D object; and fusing the single-view AO layers of each view according to the UV parameters to obtain the AO mapping of the 3D object.
In another aspect, embodiments of the present application provide a method of obtaining an AO map, which is performed by a rendering device, which may be the execution device 810 shown in fig. 8 or otherwise.
Alternatively, the method for acquiring the AO map provided in the embodiment of the present application may be processed by a CPU, or may be processed by both the CPU and the GPU, or may not use the GPU, and other processors suitable for neural network computation may be used, which is not limited in the present application.
As shown in fig. 12, the method for obtaining an AO map provided in the embodiment of the present application may include:
S1201, the rendering device acquires a geometric information layer of the 3D object at each of the N views according to the two-dimensional images of the 3D object at the N views and the camera parameters of the N views.
N is greater than or equal to 2, and the two-dimensional images of the N views may cover the surface of the 3D object. The embodiment of the present application does not limit the value of N, as long as the two-dimensional images of the N views can cover the surface of the 3D object.
In particular, the geometric information layer may include a normal map.
In one possible implementation, the geometric information layer is a normal map. The normal line graph is a gray level image with numerical values of normal line attributes of each pixel point in the single view image arranged according to pixel positions.
Optionally, the geometric information layer may further include a normal map and a depth map. The depth map is a gray scale image in which the numerical value of the depth attribute of each pixel point in the single view image is arranged according to the pixel position.
In another possible implementation, the geometry information layer is a normal map and a depth map.
Compared with using only the normal map as the geometric information layer, using both the normal map and the depth map as the geometric information layer yields a higher-quality AO map.
Wherein the camera parameters may be used to determine the viewing angle at which the two-dimensional image is taken. By way of example, the camera parameters may include one or more of the following: camera pose, camera focal length. Of course, the content of the camera parameters may be configured according to actual requirements, and the specific content of the camera parameters is not limited in the embodiments of the present application.
Specifically, S1201 may be implemented as: the rendering device models and acquires a 3D grid model of the 3D object according to the two-dimensional images of the N visual angles; and acquiring a geometric information layer of the 3D object in each view angle of the N view angles according to the 3D grid model and the camera parameters of the N view angles.
The 3D mesh model of the 3D object may be obtained by modeling in a 3D reconstruction pipeline manner. The rendering device can rapidly acquire the geometric information layer of the 3D object at each of the N views in a lightweight manner such as rasterization or ray casting.
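A hedged sketch of one lightweight way to obtain the geometric information layer of a single view by ray casting against the 3D mesh model; the mesh, camera, and cast_ray helpers are assumptions introduced only for illustration and are not defined by the patent.

```python
import numpy as np

def geometric_information_layer(mesh, camera, width, height, cast_ray):
    # For each pixel of the view, cast a camera ray and record the surface
    # normal and depth of the first hit, yielding the normal map and depth map
    # that make up the geometric information layer of this view.
    normal_map = np.zeros((height, width, 3))
    depth_map = np.zeros((height, width))
    for v in range(height):
        for u in range(width):
            hit = cast_ray(mesh, camera, u, v)   # assumed helper: None or (normal, depth)
            if hit is not None:
                normal_map[v, u], depth_map[v, u] = hit
    return normal_map, depth_map
```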
S1202, the rendering device inputs the geometric information layers of each view angle into the trained neural network to obtain single-view AO layers of each view angle.
The trained neural network is used for inputting a geometric information layer with a single view angle and outputting an AO layer with the single view angle. The training of the neural network has been described in detail in the foregoing, and is not repeated here.
By way of example, fig. 13 illustrates a predictive flow chart for a neural network.
In one possible implementation, the trained neural network may include a deep convolutional neural network trained from training data obtained from ray trace rendering equations.
Wherein, the expression of the ray tracing rendering equation may be:
L_o(p, ω_o) = L_e(p, ω_o) + ∫_{Ω+} L_i(p, ω_i) f_r(p, ω_i, ω_o) (n·ω_i) dω_i
in another possible implementation, the trained neural network may be trained from a large amount of actually sampled data. For example, a three-dimensional mesh model can be obtained by modeling from sampled two-dimensional images so as to obtain single-view geometric information layers, while AO maps of the corresponding views are obtained by sampling; the neural network is then trained using the obtained single-view geometric information layers and the AO maps of the corresponding views as training data.
Illustratively, the output of the trained neural network may achieve a quality similar to the AO map calculated by the ray tracing rendering equation.
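As a hedged aside, one common way (not necessarily the patent's exact formulation) to produce ray-traced AO values for training labels is a Monte Carlo estimate over the hemisphere around the surface normal; the occluded ray-query helper and the sample count are assumptions.

```python
import numpy as np

def ambient_occlusion(point, normal, occluded, n_samples=64, max_dist=1.0):
    # Sample directions in the hemisphere around the normal and count how many
    # occlusion rays are blocked within max_dist of the surface point.
    rng = np.random.default_rng()
    hits = 0
    for _ in range(n_samples):
        d = rng.standard_normal(3)
        d /= np.linalg.norm(d)
        if np.dot(d, normal) < 0.0:
            d = -d                            # keep the sample in the upper hemisphere
        if occluded(point + 1e-4 * normal, d, max_dist):   # assumed ray-query helper
            hits += 1
    return 1.0 - hits / n_samples             # 1 = fully open, 0 = fully occluded
```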
In one possible implementation manner, the difference between AO values corresponding to the same position between the single view AO image layers of different views of the output of the trained neural network is less than or equal to the first preset value.
The corresponding same position between the single-view AO layers of different views refers to the position of the same point on the surface of the 3D object in the single-view AO layers of different views.
For example, the first preset value may be a threshold value for which the AO values are considered to be the same after the calculation error is taken into consideration.
In another possible implementation manner, the difference between the single-view AO layer of the output of the trained neural network and the AO value at the same position calculated by the ray tracing rendering equation is less than or equal to a second preset value.
The same position calculated by the single-view AO layer and the ray tracing rendering equation refers to the position of the same point on the surface of the 3D object in the single-view AO layer and the position of the same point in the AO layer calculated by the ray tracing rendering equation.
For example, the second preset value may be a threshold value for which the AO value is considered to be the same after the calculation error is taken into consideration.
In a possible implementation manner, the rendering device in S1202 may sequentially input the geometric information layers of N views into the trained neural network, and sequentially output the single-view AO layers of the N views from the trained neural network.
In another possible implementation, the rendering device in S1202 may input the geometric information layers of N views into the trained neural network at the same time, and output the single view AO layers of the N views from the trained neural network at the same time.
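A hedged sketch of the two inference options just described (view-by-view versus all views at once), assuming the hypothetical AONet model from the earlier training sketch and geometric information layers stored as arrays of shape (C, H, W).

```python
import numpy as np
import torch

def predict_single_view_ao_layers(model, geometry_layers, batched=True):
    # geometry_layers: list of N arrays of shape (C, H, W), one per view.
    x = torch.as_tensor(np.stack(geometry_layers), dtype=torch.float32)
    with torch.no_grad():
        if batched:
            ao = model(x)                                  # all N views at the same time
        else:
            ao = torch.cat([model(x[i:i + 1])              # one view at a time, in sequence
                            for i in range(x.shape[0])])
    return [a.squeeze(0).numpy() for a in ao]              # N single-view AO layers (H, W)
```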
S1203, the rendering device performs UV parameterization according to the two-dimensional images of the 3D object at the N views to acquire the UV parameters of the mesh model of the 3D object.
Specifically, S1203 may be implemented as follows: the rendering device performs 3D modeling according to the two-dimensional images of the 3D object at the N views to obtain a 3D mesh model of the 3D object; performs UV unwrapping of the 3D mesh model onto a UV parameter plane to obtain a UV unwrapped map, where the UV unwrapped map includes the UV coordinates of each vertex in the 3D mesh model; and finally takes the single-view UV layers as the UV parameters, where a single-view UV layer includes the UV coordinates of each pixel in the view.
Wherein the UV parameters of the 3D mesh model comprise single view UV map layers of N views.
UV unwrapping refers to mapping the vertices of the 3D mesh model (mesh) onto a UV map, so that each vertex corresponds to a UV coordinate; this process is also referred to as UV unfolding or UV expansion.
Illustratively, for a simple 3D mesh, UV unwrapping can be performed with a 3D editing tool in the pipeline. Of course, the implementation of UV unwrapping is not specifically limited in the embodiments of the present application.
For example, for a more complex 3D mesh model, UV unwrapping can be performed according to the following procedure: cut the given 3D mesh so that the model can be laid flat on a plane like an unfolded paper box; tile each connected region of the cut model on the plane; arrange the connected regions closely without overlapping; and store the UV mapping as mesh vertex data and output a new 3D mesh. In this way, each vertex of the 3D mesh has a UV coordinate, which can be used as the UV parameter of the 3D mesh.
It should be noted that the foregoing UV unwrapping implementations are described by way of example only, and the embodiments of the present application do not specifically limit the UV unwrapping process. A 3D mesh can be unwrapped in a variety of ways, and when the appearance of the object changes, the UV needs to be unwrapped again.
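A hedged sketch of building the single-view UV layer used as the UV parameter: for each pixel of a view, the UV coordinate of the surface point seen through that pixel is recorded. The cast_ray_uv helper, which would interpolate the per-vertex UV coordinates of the unwrapped mesh at the hit point, is an assumption for illustration.

```python
import numpy as np

def single_view_uv_layer(mesh, camera, width, height, cast_ray_uv):
    # For each pixel of the view, record the UV coordinate of the surface point
    # hit by the camera ray; -1 marks pixels that do not cover the 3D object.
    uv = np.full((height, width, 2), -1.0)
    for y in range(height):
        for x in range(width):
            hit = cast_ray_uv(mesh, camera, x, y)   # assumed helper: None or (u, v)
            if hit is not None:
                uv[y, x] = hit
    return uv
```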
S1204: the rendering device fuses the single-view AO layers of each view according to the acquired UV parameters to obtain the AO map of the 3D object.
The UV parameters comprise a single-view UV layer for each of the N view angles, and a UV layer contains the UV coordinates of each pixel on the surface of the 3D object under that view angle. A single-view AO layer contains the AO value and the position coordinate of each pixel on the surface of the 3D object under that view angle. The same pixel can therefore be identified from the UV coordinate and the position coordinate, the single-view UV layer and the single-view AO layer can be fused, and the AO map is finally obtained.
Illustratively, S1204 may be implemented as follows: the rendering device traverses each pixel in the UV layer of each of the N view angles and assigns the AO value at the same pixel position in the AO layer of the same view angle to the corresponding UV coordinate, thereby obtaining the AO map.
Further, optionally, for regions where the 3D mesh model overlaps under different view angles, one UV coordinate may correspond to a plurality of AO values. For a pixel in such an overlapping region, assigning the AO value at the same pixel position of the same view angle to the corresponding UV coordinate may be implemented as: assigning a value calculated from the plurality of AO values corresponding to that UV coordinate to the UV coordinate.
The calculated value may be an average value, a maximum value, a minimum value, or a weighted average value; the embodiments of the present application do not limit how the calculated value is obtained.
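A minimal fusion sketch follows, under the assumptions that each view's UV layer and AO layer are NumPy arrays of the same resolution, that UV coordinates lie in [0, 1] with NaN marking background pixels, and that overlapping contributions are averaged; these array conventions are illustrative assumptions rather than details fixed by the embodiments.

```python
import numpy as np

def fuse_ao_layers(ao_layers, uv_layers, tex_size=1024):
    """Splat per-view AO values into a shared AO map using the per-view UV layers."""
    accum = np.zeros((tex_size, tex_size), dtype=np.float32)
    count = np.zeros((tex_size, tex_size), dtype=np.float32)
    for ao, uv in zip(ao_layers, uv_layers):
        valid = ~np.isnan(uv[..., 0])
        # Convert UV coordinates in [0, 1] to texel indices.
        u = np.clip((uv[..., 0][valid] * (tex_size - 1)).astype(int), 0, tex_size - 1)
        v = np.clip((uv[..., 1][valid] * (tex_size - 1)).astype(int), 0, tex_size - 1)
        np.add.at(accum, (v, u), ao[valid])
        np.add.at(count, (v, u), 1.0)
    # Average the contributions where several views cover the same texel.
    return np.divide(accum, count, out=np.zeros_like(accum), where=count > 0)
```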
By way of example, FIG. 14 illustrates the fusion flow for obtaining the AO map. As shown in FIG. 14, the UV layer and the AO layer of the same view angle undergo UV-AO mapping and are then fused to obtain the AO map.
Illustratively, FIG. 15 illustrates a system architecture diagram of the method of acquiring AO maps provided herein. As shown in fig. 15, two-dimensional images of N view angles are acquired by a camera, and then 3D reconstruction is performed to obtain a 3D mesh model. And obtaining a geometric information layer of each view according to the 3D grid model and the camera parameters of the N views. The geometric information layers are input into a neural network, and the neural network outputs AO layers of all view angles. And fusing the AO layers of each view angle according to the UV parameters obtained by the UV parameterization of the 3D grid model to obtain the AO map.
The present application provides a method for acquiring an AO map. A neural network for obtaining single-view AO layers is trained in advance; the geometric information layers of different views of a 3D object are input into the trained neural network, which efficiently predicts and outputs the single-view AO layers; the single-view AO layers are then fused according to the UV parameters of the grid model of the 3D object to obtain the AO map. Because the neural network is trained on high-quality training data, its output can be kept close to real illumination; and because the neural network predicts efficiently, the AO map can still be obtained quickly even when scene complexity increases. The scheme of the present application therefore improves the efficiency of acquiring high-quality AO maps. Furthermore, obtaining the single-view AO layers does not depend on the UV parameters, so even if a designer re-unfolds the UV of the model, the scheme of the present application can fuse the already obtained single-view AO layers according to the updated UV parameters. This gives the scheme high reusability and further improves the efficiency of acquiring high-quality AO maps.
Further, when the appearance of the object changes, the UV needs to be unfolded again and the UV parameters change. After the UV parameters change, as shown in FIG. 16, the method for obtaining the AO map provided by the embodiments of the present application may further include S1205.
S1205: the rendering device acquires the updated UV parameters of the 3D object.
It should be noted that S1205 may reuse the specific implementation of S1203, which is not repeated here.
After S1205, the rendering device may execute S1204 again to fuse the single-view AO layers of each view according to the updated UV parameters of the 3D object, obtaining the updated AO map of the 3D object.
In this process, the already acquired single-view AO layers are fused according to the updated UV parameters, so the single-view AO layers do not need to be predicted again. This high reusability greatly saves computation time and cost and further improves the efficiency of acquiring high-quality AO maps.
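Continuing the earlier fusion sketch, reusing the cached single-view AO layers with re-unfolded UV parameters amounts to calling the same fusion routine again with new UV layers; no new network prediction is involved. The helper `compute_uv_layers` below is hypothetical and stands for whatever step regenerates the per-view UV layers from the re-unfolded mesh.

```python
# `ao_layers` were predicted once in S1202 and are kept in memory or on disk;
# only the per-view UV layers are regenerated after the model is re-unfolded.
uv_layers_updated = compute_uv_layers(unwrapped_mesh, camera_params)  # hypothetical helper
ao_map_updated = fuse_ao_layers(ao_layers, uv_layers_updated)
```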
Further, as shown in fig. 16, the method for obtaining an AO map provided in the embodiment of the present application may further include S1206.
S1206: the rendering device superimposes the AO map onto the three-dimensional model of the 3D object for display.
In one possible implementation, in S1206 the rendering device may superimpose the AO map onto the three-dimensional model of the 3D object, and the result is then shown by the display device.
In another possible implementation, the rendering device in S1206 may provide the AO map and the three-dimensional model of the 3D object to the display device, and the display device performs a lightweight operation to superimpose the AO map onto the three-dimensional model of the 3D object for display.
Further, the three-dimensional model of the 3D object used in S1206 may be the 3D mesh model, or may be a model obtained by thinning the 3D mesh model, which is not limited in the embodiment of the present application.
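As a simple illustration of the superposition in S1206, an AO map that shares the UV layout of a base-color texture can modulate that texture before display; in practice this multiplication is usually done in a shader, so the function below is only a sketch under that assumption.

```python
import numpy as np

def apply_ao_to_base_color(base_color, ao_map):
    """Darken a base-color texture with the AO map (both in [0, 1], same UV layout)."""
    # base_color: (H, W, 3); ao_map: (H, W). Broadcasting adds the channel axis.
    return base_color * ao_map[..., None]
```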
Experimental verification compares the performance of the scheme provided by the present application with direct calculation by the ray-tracing rendering equation, as shown in Tables 1 and 2 below. As the data in Tables 1 and 2 show, for AO maps of the same quality, the scheme of the present application greatly shortens the time required and thus improves the efficiency of acquiring high-quality AO maps.
TABLE 1 (provided as an image in the original publication; not reproduced here)
TABLE 2 (provided as an image in the original publication; not reproduced here)
The foregoing has described the solutions provided by the embodiments of the present application mainly from the point of view of the working principle of the device. It should be understood that, to perform the functions described above, the device includes corresponding hardware structures and/or software modules. Those skilled in the art will readily appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer-software-driven hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functions differently for each particular application, but such implementation decisions should not be regarded as going beyond the scope of the present application.
In the embodiments of the present application, the functional modules of the apparatus for acquiring the AO map may be divided according to the foregoing method examples; for example, one functional module may be provided for each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. The division of modules in the embodiments of the present application is schematic and is merely a logical functional division; other division manners may be used in practice.
In the case where each functional module is provided for a corresponding function, FIG. 17 shows a possible structural diagram of the apparatus for acquiring an AO map deployed in the rendering device of the foregoing embodiments. The apparatus 170 for acquiring an AO map may be a functional module or a chip. As shown in FIG. 17, the apparatus 170 may include: an acquisition unit 1701, a prediction unit 1702, a UV parameterization unit 1703, and a fusion unit 1704. The acquisition unit 1701 is configured to perform process S1201 in FIG. 12 or FIG. 16; the prediction unit 1702 is configured to perform process S1202 in FIG. 12 or FIG. 16; the UV parameterization unit 1703 is configured to perform process S1203 in FIG. 12 or FIG. 16, or process S1205 in FIG. 16; and the fusion unit 1704 is configured to perform process S1204 in FIG. 12 or FIG. 16. For the details of each step, reference may be made to the functional description of the corresponding functional module in the foregoing method embodiments, which is not repeated here.
Further, as shown in fig. 18, the apparatus 170 for acquiring an AO map may further include a display unit 1206 for executing the process S1206 in fig. 16.
In case of using an integrated unit, fig. 19 shows a possible structural schematic of the rendering device involved in the above-described embodiment. Rendering device 190 may include: processing module 1901, communication module 1902. The processing module 1901 is used for controlling and managing the action of the rendering device, and the communication module 1902 is used for communicating with other devices. For example, the processing module 1901 is used to perform any one of the processes S1201 to S1204 in fig. 12 or 16. Rendering device 190 may also include a storage module 1903 to store program code and data for rendering device 190.
The processing module 1901 may be the processor 1101 in the physical structure of the apparatus 110 for acquiring AO maps shown in FIG. 11, and may be a processor or a controller, for example a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof; it may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with this disclosure. The processing module 1901 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 1902 may be the transceiver 1103 in the physical structure of the apparatus 110 shown in FIG. 11; it may be a communication port, a transceiver, a transceiver circuit, a communication interface, or the like. Alternatively, the communication interface may communicate with other devices through an element having transceiving functions, and such an element may be implemented by an antenna and/or a radio-frequency apparatus. The storage module 1903 may be the memory 1102 in the physical structure of the apparatus 110 shown in FIG. 11.
As mentioned above, the apparatus 170 for obtaining an AO map or the rendering device 190 provided in the embodiments of the present application may be used to implement the corresponding functions of the methods in the foregoing embodiments. For ease of explanation, only the portions relevant to the embodiments of the present application are shown; for specific technical details that are not disclosed here, reference may be made to the method embodiments of the present application.
As another form of the present embodiment, there is provided a computer-readable storage medium having stored thereon instructions that, when executed, perform the method of obtaining an AO map in the above-described method embodiment.
As another form of this embodiment, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of obtaining AO maps in the method embodiment described above.
The embodiments of the present application further provide a chip system, which includes a processor and is used to implement the technical methods of the embodiments of the present application. In one possible design, the chip system further includes a memory for holding the program instructions and/or data necessary for the embodiments of the present application. In another possible design, the chip system further includes a memory, and the processor invokes application code stored in that memory. The chip system may consist of one or more chips, or may include a chip and other discrete devices; this is not specifically limited in the embodiments of the present application.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may be located in a core network interface device; the processor and the storage medium may also reside as discrete components in a core network interface device. Alternatively, the memory may be coupled to the processor, for example the memory may be separate and coupled to the processor via a bus, or the memory may be integrated with the processor. The memory may be used to store the application program code for executing the technical solutions provided in the embodiments of the present application, with execution controlled by the processor. The processor executes the application program code stored in the memory, thereby implementing the technical solutions provided in the embodiments of the present application.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A method of obtaining an ambient light occluding AO map, the method comprising:
acquiring a geometric information layer of the 3D object at each view angle of N view angles according to two-dimensional images of the three-dimensional 3D object at the N view angles and camera parameters of the N view angles, wherein the geometric information layer comprises a normal map; the N is greater than or equal to 2;
inputting the geometric information layer of each view into a trained neural network to obtain a single view AO layer of each view;
performing UV parameterization according to the two-dimensional images of the 3D object at the N view angles to obtain UV parameters of a grid model of the 3D object;
and fusing the single-view AO layers of each view according to the UV parameters to obtain the AO mapping of the 3D object.
2. The method of claim 1, wherein the geometric information layer further comprises a depth map.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring updated UV parameters of the 3D object;
and fusing the single-view AO layers of each view according to the updated UV parameters of the 3D object to obtain the updated AO mapping of the 3D object.
4. The method according to any one of claims 1 to 3, wherein,
the difference of the AO values of the single-view AO image layers of different views, which correspond to the same position, is smaller than or equal to a first preset value;
or,
the difference between the AO value in the single-view AO layer and the AO value at the same position calculated by the ray tracing rendering equation is smaller than or equal to a second preset value.
5. The method of any of claims 1-4, wherein the neural network comprises a deep convolutional neural network trained from training data obtained by a ray-tracing rendering equation.
6. The method according to any one of claims 1-5, wherein the performing UV parameterization based on the two-dimensional image of the 3D object at N perspectives to obtain UV parameters of the mesh model of the 3D object comprises:
3D modeling is carried out according to the two-dimensional images of the 3D object at the N view angles to obtain a 3D grid model of the 3D object; performing UV unfolding on the 3D grid model to a UV parameter plane to obtain a UV unfolding diagram; the UV unfolded graph comprises UV coordinates of each vertex in the 3D grid model;
Taking a single-view UV layer as the UV parameter; a single view UV layer includes UV coordinates for each pixel in the view.
7. The method of any of claims 1-6, wherein the camera parameters include one or more of the following: camera pose, camera focal length.
8. The method of any of claims 1-7, wherein the N perspective two-dimensional images cover a surface of the 3D object.
9. The method according to any one of claims 1-8, further comprising:
and overlaying the AO mapping on the three-dimensional model of the 3D object for display.
10. An apparatus for obtaining an ambient light occluding AO map, the apparatus comprising:
the acquisition unit is used for acquiring a geometric information layer of the 3D object in each view angle of the N view angles according to the two-dimensional images of the three-dimensional 3D object in the N view angles and the camera parameters of the N view angles, wherein the geometric information layer comprises a normal map; the N is more than or equal to 2;
the prediction unit is used for inputting the geometric information image layer of each view into a trained neural network to obtain a single view AO image layer of each view;
The UV parameterization unit is used for carrying out UV parameterization according to the two-dimensional images of the 3D object at the N view angles to obtain UV parameters of the grid model of the 3D object;
and the fusion unit is used for fusing the single-view AO layers of each view according to the UV parameters acquired by the UV parameterization unit to obtain the AO mapping of the 3D object.
11. The apparatus of claim 10, wherein the geometric information layer further comprises a depth map.
12. The device according to claim 10 or 11, wherein,
the UV parameterization unit is also used for acquiring updated UV parameters of the 3D object;
the fusion unit is further configured to fuse the single-view AO map layer of each view according to the updated UV parameters of the 3D object obtained by the UV parameterization unit, so as to obtain an AO map after the update of the 3D object.
13. The device according to any one of claims 10 to 12, wherein,
the difference of the AO values of the single-view AO image layers of different views, which correspond to the same position, is smaller than or equal to a first preset value;
or,
the difference between the AO value in the single-view AO layer and the AO value at the same position calculated by the ray tracing rendering equation is smaller than or equal to a second preset value.
14. The apparatus of any of claims 10-13, wherein the neural network comprises a deep convolutional neural network trained from training data obtained by a ray-tracing rendering equation.
15. The apparatus according to any one of claims 10 to 14, wherein the UV parameterization unit is specifically configured to:
3D modeling is carried out according to the two-dimensional images of the 3D object at the N view angles to obtain a 3D grid model of the 3D object; performing UV unfolding on the 3D grid model to a UV parameter plane to obtain a UV unfolding diagram; the UV unfolded graph comprises UV coordinates of each vertex in the 3D grid model;
taking a single-view UV layer as the UV parameter; a single view UV layer includes UV coordinates for each pixel in the view.
16. The apparatus of any of claims 10-15, wherein the camera parameters include one or more of the following: camera pose, camera focal length.
17. The apparatus of any of claims 10-16, wherein the N perspective two-dimensional images cover a surface of the 3D object.
18. The apparatus according to any one of claims 10-17, wherein the apparatus further comprises:
And the display unit is used for overlaying the AO mapping obtained by the fusion unit on the three-dimensional model of the 3D object for display.
19. A rendering apparatus, characterized in that the rendering apparatus comprises: a processor and a memory;
the memory is connected with the processor; the memory is for storing computer instructions that, when executed by the processor, cause the rendering device to perform the method of acquiring an ambient light occluding AO map as recited in any one of claims 1-9.
20. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of acquiring an ambient light occluding AO map of any one of claims 1-9.
21. A computer program product, characterized in that it when run on a computer causes the computer to perform the method of acquiring an ambient light occluding AO map as defined in any one of claims 1-9.
CN202111430926.0A 2021-11-29 2021-11-29 Method and device for acquiring AO mapping Pending CN116206040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111430926.0A CN116206040A (en) 2021-11-29 2021-11-29 Method and device for acquiring AO mapping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111430926.0A CN116206040A (en) 2021-11-29 2021-11-29 Method and device for acquiring AO mapping

Publications (1)

Publication Number Publication Date
CN116206040A true CN116206040A (en) 2023-06-02

Family

ID=86511610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111430926.0A Pending CN116206040A (en) 2021-11-29 2021-11-29 Method and device for acquiring AO mapping

Country Status (1)

Country Link
CN (1) CN116206040A (en)

Similar Documents

Publication Publication Date Title
US11244170B2 (en) Scene segmentation method and device, and storage medium
CN110070056B (en) Image processing method, image processing apparatus, storage medium, and device
KR20210111833A (en) Method and apparatus for acquiring positions of a target, computer device and storage medium
CN113538273B (en) Image processing method and image processing apparatus
CN111931877B (en) Target detection method, device, equipment and storage medium
CN111325699B (en) Image restoration method and training method of image restoration model
CN111932664A (en) Image rendering method and device, electronic equipment and storage medium
CN111930964B (en) Content processing method, device, equipment and storage medium
WO2024021742A9 (en) Fixation point estimation method and related device
CN115661912B (en) Image processing method, model training method, electronic device, and readable storage medium
CN111753498A (en) Text processing method, device, equipment and storage medium
CN114283050A (en) Image processing method, device, equipment and storage medium
CN112991494A (en) Image generation method and device, computer equipment and computer readable storage medium
CN115661336A (en) Three-dimensional reconstruction method and related device
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN113642359B (en) Face image generation method and device, electronic equipment and storage medium
CN112527104A (en) Method, device and equipment for determining parameters and storage medium
WO2022143314A1 (en) Object registration method and apparatus
CN116863042A (en) Motion generation method of virtual object and training method of motion generation model
CN116206040A (en) Method and device for acquiring AO mapping
CN114328815A (en) Text mapping model processing method and device, computer equipment and storage medium
CN116109531A (en) Image processing method, device, computer equipment and storage medium
CN115880347A (en) Image processing method, electronic device, storage medium, and program product
CN113743186A (en) Medical image processing method, device, equipment and storage medium
CN113516665A (en) Training method of image segmentation model, image segmentation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination