CN114862997A - Image rendering method and apparatus, medium, and computer device - Google Patents

Image rendering method and apparatus, medium, and computer device

Info

Publication number
CN114862997A
Authority
CN
China
Prior art keywords
image
target
texture
region
area
Prior art date
Legal status
Pending
Application number
CN202210369598.6A
Other languages
Chinese (zh)
Inventor
孙飞
杨瑞健
赵代平
Current Assignee
Beijing Datianmian White Sugar Technology Co ltd
Original Assignee
Beijing Datianmian White Sugar Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Datianmian White Sugar Technology Co ltd filed Critical Beijing Datianmian White Sugar Technology Co ltd
Priority to CN202210369598.6A priority Critical patent/CN114862997A/en
Publication of CN114862997A publication Critical patent/CN114862997A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G06T15/04 - Texture mapping

Abstract

The embodiments of the present disclosure provide an image rendering method and apparatus, a medium, and a computer device. The method includes: determining a target image region from an image acquired by an image acquisition device; determining pose information of the image acquisition device at the time the image was acquired; sampling a target texture region from a pre-acquired texture map based on the pose information, where different pose information corresponds to different target texture regions; and rendering the target texture region into the target image region.

Description

Image rendering method and apparatus, medium, and computer device
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image rendering method and apparatus, a medium, and a computer device.
Background
In the related art, rendering materials are often composited into a real captured image so that the rendered image achieves a particular visual effect. However, images rendered in this way tend to lack realism.
Disclosure of Invention
In a first aspect, an embodiment of the present disclosure provides an image rendering method, the method including: determining a target image region from an image acquired by an image acquisition device; determining pose information of the image acquisition device at the time the image was acquired; sampling a target texture region from a pre-acquired texture map based on the pose information, where different pose information corresponds to different target texture regions; and rendering the target texture region into the target image region.
When an image is rendered, the target texture region is sampled from the texture map based on the pose information of the image acquisition device at the time the image was acquired, and the target texture region is then rendered into the target image region. Because different pose information corresponds to different target texture regions, the sampled target texture region changes as the pose of the image acquisition device changes, so the visual effect of the target image region also changes with the device's pose. The image rendering method of the embodiments of the present disclosure therefore effectively simulates the real scene in which the user acquires the image, improving the realism of the rendering effect.
In some embodiments, the target image region is determined based on at least one of the following conditions: the semantic category of at least one image region in the image; the pixel location of at least one image region in the image; attribute information of at least one object in the image.
The embodiments of the present disclosure can thus flexibly determine the target image region from conditions such as semantic category, pixel location, and object attribute information, giving the method a wide range of applications and strong extensibility.
In some embodiments, the number of target image regions is greater than 1, and each target image region corresponds to a texture map. Sampling the target texture region from the pre-acquired texture map based on the pose information includes: sampling a target texture region from the texture map corresponding to each target image region based on the pose information. Rendering the target texture region into the target image region includes: rendering each target texture region into its corresponding target image region.
When the image includes multiple target image regions, a rendering effect can be generated for each of them, and each target image region has its own corresponding texture map, so different target image regions can be rendered with different effects, making the rendering more engaging and more realistic.
In some embodiments, rendering the target texture region into the target image region includes: performing mask processing on the regions of the image other than the target image region to obtain a mask image corresponding to the image, where the mask image includes a first mask region corresponding to the target image region and a second mask region corresponding to the other regions; and rendering the pixels of the target texture region into the first mask region and the pixels of the other regions into the second mask region.
Through mask processing, the boundary between the target image region and the non-target regions can be determined accurately, improving the rendering accuracy.
In some embodiments, the method further includes: generating a three-dimensional map based on the texture map. Sampling the target texture region from the pre-acquired texture map based on the pose information includes: sampling the target texture region from the three-dimensional map based on the pose information.
The target texture region can thus be sampled from the three-dimensional map based on the pose information of the image acquisition device, so that the rendering effect conveys the sense of the three-dimensional space corresponding to the three-dimensional map, improving the realism of the rendering.
In some embodiments, the number of texture maps is greater than 1. Generating the three-dimensional map based on the texture maps includes: obtaining a predetermined three-dimensional figure that includes a plurality of faces; labeling each texture map in advance with identification information that indicates which face of the three-dimensional figure the texture map corresponds to; and rendering each texture map onto the corresponding face of the three-dimensional figure based on the identification information to obtain the three-dimensional map.
Generating the three-dimensional map from multiple texture maps and their identification information keeps the processing complexity low.
In some embodiments, the three-dimensional figure has fixed pose information, and sampling the target texture region from the pre-acquired texture map based on the pose information includes: sampling the target texture region based on the difference between the pose information and the fixed pose information. In other embodiments, the pose information of the three-dimensional figure changes dynamically, and sampling the target texture region from the pre-acquired texture map based on the pose information includes: sampling the target texture region based on the difference between the pose information and the pose information of the three-dimensional figure at the time the image acquisition device acquires the image.
By determining the target texture region in different ways for a three-dimensional figure with fixed pose information and for one whose pose information changes dynamically, the scheme of the embodiments of the present disclosure applies to both kinds of three-dimensional figures, giving it a wide range of applications. For a three-dimensional figure with fixed pose information, different target texture regions can be sampled simply by adjusting the pose of the image acquisition device; for a three-dimensional figure with dynamically changing pose information, the target texture regions obtained at different times with the same device pose may differ, so different rendering effects are presented in different scenes, increasing the diversity and appeal of the rendering effects.
In some embodiments, the texture map corresponding to at least one face of the three-dimensional figure comprises a plurality of different texture maps in a texture map sequence.
Because the texture map of at least one face of the three-dimensional figure comprises multiple different texture maps in a texture map sequence, and the texture maps in the same sequence can be switched dynamically, the rendering effect of the target image region in the rendered image can change dynamically, making the rendering more engaging. For example, at one moment the target texture region determined from the first texture map in the sequence may be rendered into the target image region, and at another moment the target texture region determined from the second texture map in the sequence may be rendered into the target image region, and so on.
In some embodiments, the image comprises a plurality of video frames in a video. Sampling the target texture region from the pre-acquired texture map based on the pose information includes: sampling the target texture region corresponding to each video frame based on the pose information of the image acquisition device when that video frame was acquired. Rendering the target texture region into the target image region includes: rendering the target texture region corresponding to each video frame into that video frame.
A rendering effect can thus be generated for every video frame in the video, improving the realism of the rendering of each frame.
In some embodiments, sampling the target texture region corresponding to each video frame from the pre-acquired texture map based on the pose information of each video frame includes: when the pose information of the image acquisition device at the time it acquired any target video frame among the plurality of video frames differs from the pose information at the time it acquired the previous video frame, re-sampling the target texture region from the texture map based on the pose information at the time the target video frame was acquired; and/or, when the pose information at the time the target video frame was acquired is the same as the pose information at the time the previous video frame was acquired, determining the target texture region corresponding to the previous video frame as the target texture region corresponding to the target video frame.
A target texture region can thus be determined for each video frame based on the pose of the image acquisition device when that frame was acquired. Once the pose at the time a video frame is acquired changes relative to the previous frame, the determined target texture region changes accordingly; if the poses for two adjacent video frames are the same, the target texture regions determined for the two frames are also the same. As a result, the rendering effect of each video frame changes dynamically with the pose of the image acquisition device, improving the realism of the rendering.
In some embodiments, the image includes a specular reflection region, and the specular reflection region includes a mirror image of at least part of the target image region. The method further includes: determining pose information of the specular reflection region based on the pose information of the image acquisition device when the image was acquired; sampling a texture region corresponding to the specular reflection region from the texture map based on the pose information of the specular reflection region; and rendering the texture region corresponding to the specular reflection region into the specular reflection region.
A rendering effect can thus also be generated for the specular reflection region in the image, further improving the realism of the rendering.
In a second aspect, an embodiment of the present disclosure provides an image rendering apparatus, including: a first determining module, configured to determine a target image region from an image acquired by an image acquisition device; a second determining module, configured to determine pose information of the image acquisition device at the time the image was acquired; a sampling module, configured to sample a target texture region from a pre-acquired texture map based on the pose information, where different pose information corresponds to different target texture regions; and a rendering module, configured to render the target texture region into the target image region.
In a third aspect, the embodiments of the present disclosure provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any of the embodiments.
In a fourth aspect, embodiments of the present disclosure provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any embodiment when executing the program.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an image rendering method according to an embodiment of the present disclosure.
Fig. 2A is a schematic diagram of an original image of an embodiment of the disclosure.
Fig. 2B is a schematic diagram of each image area in the original image shown in fig. 2A.
FIG. 3 is a schematic diagram of a relationship between a target texture region and pose information according to an embodiment of the present disclosure.
Fig. 4A, 4B, 4C, and 4D are schematic views of a three-dimensional map according to an embodiment of the present disclosure.
Fig. 5A is a schematic diagram of a texture map of an embodiment of the present disclosure.
FIG. 5B is a schematic view of a cube of an embodiment of the disclosure.
Fig. 6A and 6B are schematic diagrams of the relationship between the cube pose and the pose of the image acquisition device according to an embodiment of the present disclosure.
Fig. 7A is a schematic diagram of a mask image of an embodiment of the present disclosure.
Fig. 7B and 7C are schematic diagrams of rendering effects corresponding to different pose information according to an embodiment of the present disclosure.
Fig. 8 is a schematic diagram of rendering effects when there are a plurality of target image areas according to an embodiment of the present disclosure.
Fig. 9 is a schematic diagram of rendering effects of a specular reflection area according to an embodiment of the present disclosure.
Fig. 10 is a block diagram of an image rendering apparatus according to an embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Fig. 12 is a schematic diagram of an image rendering system of an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In order to make the technical solutions in the embodiments of the present disclosure better understood and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
The inventors have found that, in the related art, rendering is generally done by simply replacing certain regions of an image with special-effect material, and images rendered in this way often lack realism. For example, when special-effect rendering is applied to video frames shot by a user with a mobile phone, the pose of the phone typically changes while the frames are being shot, yet the special-effect material remains fixed, making it difficult for the user to feel immersed in the scene and reducing the realism of the rendering effect.
Based on this, the present disclosure provides an image rendering method. Referring to Fig. 1, the method includes:
Step 101: determining a target image region from an image acquired by an image acquisition device;
Step 102: determining pose information of the image acquisition device at the time the image was acquired;
Step 103: sampling a target texture region from a pre-acquired texture map based on the pose information, where different pose information corresponds to different target texture regions;
Step 104: rendering the target texture region into the target image region.
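As an illustrative sketch only (the helper functions and their names are assumptions for the example, not part of this disclosure), the four steps can be organized as a single rendering pass:

```python
def render_frame(image, pose, texture_map, segmenter, sampler, compositor):
    """Steps 101-104 as one pass; all helpers are injected placeholders."""
    target_mask = segmenter(image)                         # step 101: target image region
    camera_pose = pose                                     # step 102: from the attitude sensor
    target_texture = sampler(texture_map, camera_pose)     # step 103: pose-dependent sampling
    return compositor(image, target_texture, target_mask)  # step 104: render into the region
```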
The embodiments of the present disclosure can be applied to terminal devices that have both image acquisition and image processing capabilities, such as mobile phones and tablet computers; after such a terminal device acquires an image, it can render the image through its own image processing unit. The embodiments can also be applied to devices that have image processing capabilities but no image acquisition capability, such as a server, which can obtain an image acquired by an image acquisition device and render it.
In step 101, the image acquisition device may include, but is not limited to, a camera or a video camera, and the image may be one captured by the device in real time or one captured in advance and stored. The image may be a single image, or it may comprise multiple consecutive or non-consecutive video frames of a video. The target image region is determined from the image; it is the part of the image that requires special-effect rendering and may be a local region of the image or the entire image.
There are many ways to determine the target image region; some of them are illustrated below with reference to Figs. 2A and 2B. Fig. 2A shows an original image, i.e., an image acquired by the image acquisition device, and Fig. 2B shows the image regions in that original image, including, for example, a sky region 201, a house region 202, a person region 203, and a ground region 204. This is of course only one way of dividing the image into regions; finer divisions are also possible. For example, the sky region 201 may be further divided into a cloud region, a sun region, and a sky background region, where the cloud region contains the clouds, the sun region contains the sun, and the sky background region is the part of the sky region 201 outside the sun and cloud regions. Similarly, the house region 202 may be divided into a roof region, a window region, a wall region, a door region, and so on, or the house region may be divided according to the positions of the houses in the image into a first-left house region and a second-left house region, which are the regions occupied by the first and second houses counted from left to right, respectively.
In some embodiments, the target image region may be determined based on semantic categories of image regions in the image. For example, an image region of a preset semantic category may be determined as the target image region, or an image region other than the preset semantic category in the image may be determined as the target image region. In the image shown in fig. 2A, assuming that the preset semantic category is a sky category, the sky region 201 may be determined as the target image region, or image regions (i.e., the house region 202 and the person region 203) outside the sky region 201 may be determined as the target image region. The preset semantic categories and the number thereof may be specified by a user, or default semantic categories may be adopted.
In some embodiments, the target image region may be determined based on the pixel location of each image region in the image. The pixel location of an image region may be determined based on the pixel coordinates of the image region, e.g., for a rectangular image region, the pixel location of the image region may be determined based on the pixel coordinates of the vertices of the image region; for a circular region, the pixel location of the image region may be determined based on the pixel coordinates of the center of the image region and the radius. In this case, an image area within the preset coordinate range may be determined as the target image area, or an image area outside the preset coordinate range may be determined as the target image area. Assuming that the sky region 201 is within the preset coordinate range, the sky region 201 may be determined as the target image region, or image regions outside the sky region 201 (i.e., the house region 202 and the person region 203) may be determined as the target image region.
Alternatively, the pixel location of an image region may be determined based on its position relative to other image regions in the image; for example, in the embodiment shown in Fig. 2A, the sky region 201 is in the upper part of the image, the house region 202 is in the middle, and the person region 203 is in the lower part. In this case, an image region at a preset relative position may be determined as the target image region, or the image regions other than the one at the preset relative position may be determined as target image regions. For example, the image region in the middle of the image may be determined as the target image region. In the embodiment shown in Fig. 2A, the image region in the middle is the house region 202, so the house region 202 may be determined as the target image region, or the image regions outside the house region 202 (i.e., the sky region 201 and the person region 203) may be determined as target image regions.
In some embodiments, the target image region may be determined based on attribute information of objects in the image. The attribute information of an object may include, but is not limited to, at least one of the object's pixel size, depth information, and pixel values; when the object is a person, the attribute information may further include the person's age, gender, and so on, and when the object is a building, it may further include the building's shape, number of floors, and so on. Different classes of objects may use different attribute information. The image region of an object whose attributes match preset attribute information may be determined as the target image region, or the regions other than that region may be determined as target image regions. For example, when the preset attribute information is "female", and assuming the person shown in black in the person region 203 is male and the person shown in white is female, the image region where the person shown in white is located may be determined as the target image region, or the image regions other than that person's region may be determined as target image regions.
Besides the above-listed manners of determining the target image area, other manners of determining the target image area may also be adopted, and the specific manner may be selected based on actual needs, which are not listed here. The target image area in the embodiment of the disclosure can be freely switched according to actual needs, so that different special effect rendering effects are realized.
In some embodiments, the image may be analyzed by a pre-trained neural network, and the target image region is determined based on the analysis result. For example, when the target image region is an image region of a specific semantic category, the neural network may be an image segmentation network and the analysis result includes the semantic information of each image region in the image: the image is semantically segmented by the segmentation network, and the target image region is determined from the segmentation result. The image segmentation network may be a deep-learning-based detection framework, which yields the semantic segmentation result quickly and accurately; for example, in sky-segmentation scenarios it can correctly recognize different conditions such as daytime and nighttime with good boundary transitions, which makes it suitable for real-time rendering. As another example, when the target image region is the region of pixels within a preset depth range, the neural network may be a detection network and the analysis result includes the depth information of each pixel in the image: the depth of each pixel is estimated by the detection network, and the target image region is determined from the estimated depth. The detection network obtains per-pixel depth quickly and accurately and is likewise suitable for real-time rendering. When the target image region is determined in other ways, other types of neural networks may be used; they are not enumerated here.
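As a minimal sketch (assuming the segmentation network outputs a per-pixel class map and that the class id for the target category, here called SKY_CLASS, is known; both names are illustrative):

```python
import numpy as np

SKY_CLASS = 1  # hypothetical class id assigned to "sky" by the segmentation network

def target_region_from_semantics(class_map, target_class=SKY_CLASS, invert=False):
    """Boolean target-region mask from a per-pixel semantic class map."""
    mask = (class_map == target_class)
    # Selecting every region *except* the preset category is also possible.
    return ~mask if invert else mask
```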
In some embodiments, reference information may be obtained; in the case where the analysis result of one image area matches the reference information, the image area is determined as the target image area. The reference information may be information designated by a user or default information set in advance. In the case where the target image region is determined based on the semantic category of each image region in the image, the reference information may be a reference semantic category. For example, in the embodiment shown in fig. 2A, assuming that the reference semantic category is a sky category, the semantic category of each image region may be acquired through a neural network, and if the semantic category of one image region matches the sky category, the image region is determined as the target image region. In the case where the target image region is determined based on the attribute information of each object in the image, the reference information may be reference attribute information, for example, a reference depth range. If the depth information of a certain object in the image is within the reference depth range, the image area where the object is located can be determined as the target image area. In addition to the above-listed cases, the reference information may be set as other information based on actual situations, and is not listed here.
In step 102, the pose information of the image acquisition device at the time it acquired the image is determined. The pose information may be obtained from an attitude sensor on the image acquisition device. In a scenario where the image is acquired in real time, the pose information output by the attitude sensor in real time can be used as the pose information at acquisition time. In a scenario where the image is not acquired in real time (for example, it was acquired and stored in advance), the pose information output by the attitude sensor at acquisition time can be stored in association with the image, so that the pose information at the time the image was acquired can later be retrieved.
In step 103, the number of texture maps may be greater than or equal to 1. Each texture map can be packaged into a resource package in advance, and the texture map is loaded from the resource package when rendering is needed. The texture map includes the target texture region; which region of the texture map is the target texture region depends on the pose information of the image acquisition device, and different pose information corresponds to different target texture regions. As shown in Fig. 3, when the pose information of the camera 301 is p1, the target texture region in the texture map 302 is the dark image block S1; when the pose information of the camera 301 is p2, the target texture region in the texture map 302 is the dark image block S2.
In some embodiments, a three-dimensional map may be generated based on the texture map, and the target texture region is sampled from the three-dimensional map based on the pose information. The three-dimensional map may include, but is not limited to, a cuboid map (as shown in Fig. 4A), a spherical map (as shown in Fig. 4B), or a hemispherical map (as shown in Fig. 4C), and it may be a closed or an unclosed three-dimensional figure. Fig. 4D shows a case where three texture maps are joined at a certain angle to form an unclosed three-dimensional map; of course, other numbers of texture maps may be joined at other angles and in other ways than those shown. Other types of three-dimensional maps may also be generated from texture maps in the embodiments of the present disclosure, which are not described again here.
A specific way of generating a three-dimensional map is described below by way of example. In some embodiments, a predetermined three-dimensional figure comprising a plurality of faces may be obtained; each texture map is labeled in advance with identification information that indicates which face of the three-dimensional figure it corresponds to; and each texture map is rendered onto the corresponding face of the three-dimensional figure based on the identification information to obtain the three-dimensional map.
For ease of understanding, the following takes the case where the three-dimensional figure is a cube and a cube map is generated from texture maps. A cube map is a special cuboid map: its six faces are squares and together form a cube. Six texture maps can be packaged into a material package, each labeled with the cube face it corresponds to. Fig. 5A shows the texture maps and their identification information; the six texture maps are labeled "front", "back", "left", "right", "top", and "bottom", in that order. In the cube shown in Fig. 5B, the face bounded by ABCD and the face bounded by EFGH correspond to the texture maps labeled "top" and "bottom" respectively, the face bounded by CDHG and the face bounded by ABFE correspond to the texture maps labeled "front" and "back" respectively, and the face bounded by CAEG and the face bounded by DBFH correspond to the texture maps labeled "left" and "right" respectively. Each texture map is pasted onto its corresponding face of the cube according to this correspondence to generate the cube map. The foregoing assumes that the texture maps of the different faces of the cube map are all different; in practice, the texture maps of at least two faces may also be the same. Besides generating a cube map in this way, texture maps can also be attached to other figures to form other types of three-dimensional maps, which are not described again here.
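A minimal sketch of assembling such a labeled texture package into an in-memory cube map (the face labels mirror the identification information above; the file names, directory layout, and dictionary representation are assumptions for illustration):

```python
from PIL import Image
import numpy as np

FACE_LABELS = ["front", "back", "left", "right", "top", "bottom"]

def load_cube_map(resource_dir):
    """Load six labeled texture maps into a dict keyed by face label."""
    cube_map = {}
    for label in FACE_LABELS:
        face = np.asarray(Image.open(f"{resource_dir}/{label}.png"))
        assert face.shape[0] == face.shape[1], "cube-map faces must be square"
        cube_map[label] = face
    return cube_map
```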
After the three-dimensional map has been generated, the target texture region can be determined based on the pose information of the image acquisition device when the image was acquired. Still taking the cube map as an example, a local coordinate system may be established. The local coordinate system is not unique; in some embodiments, as shown in Fig. 5B, the center of the cube is taken as the origin, the direction orthogonal to the right face of the cube as the positive x-axis, the direction orthogonal to the top face as the positive y-axis, and the direction orthogonal to the back face as the positive z-axis, with the positive direction of each axis indicated by a solid arrow in the figure. Assuming that the image acquisition device is located at the origin, a direction vector v can be computed from the pose information of the image acquisition device, and the three-dimensional map is then sampled along v to obtain the target texture region.
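A sketch of sampling the cube map built above along a direction vector (the face and UV sign conventions here are simplified assumptions for illustration and may differ from a particular graphics API):

```python
import numpy as np

def sample_cube_map(cube_map, direction):
    """Return the texel hit by a direction vector cast from the cube centre."""
    x, y, z = np.asarray(direction, dtype=np.float64) / np.linalg.norm(direction)
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:                      # dominant x axis
        face = "right" if x > 0 else "left"
        u, v = (-z if x > 0 else z) / ax, -y / ax
    elif ay >= az:                                 # dominant y axis
        face = "top" if y > 0 else "bottom"
        u, v = x / ay, (z if y > 0 else -z) / ay
    else:                                          # dominant z axis
        face = "front" if z > 0 else "back"
        u, v = (x if z > 0 else -x) / az, -y / az
    tex = cube_map[face]
    h, w = tex.shape[:2]
    px = int((u * 0.5 + 0.5) * (w - 1))            # map [-1, 1] to pixel coordinates
    py = int((v * 0.5 + 0.5) * (h - 1))
    return tex[py, px]
```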
In the embodiments of the present disclosure, the pose information of the three-dimensional figure may be fixed or may change dynamically; for example, its initial pose information may be changeable, or its pose information may change periodically or aperiodically over time.
When the three-dimensional figure has fixed pose information, the target texture region may be sampled from the pre-acquired texture map based on the difference between the pose information of the image acquisition device at the time the image was acquired and the fixed pose information. As shown in Fig. 6A, assume that time t1 is the initial time and the camera's pose information at t1 is p1, and that time t2 is some later time at which the camera's pose information is p2. The initial time may be the time at which the functional module executing the method of the embodiments of the present disclosure is started, or the time at which a reset instruction from the user is received. In this embodiment, no matter how the camera's pose p1 changes, the pose information of the cube is always the pose shown in the figure, denoted p0. Thus, at time t1 the target texture region may be determined based on the difference between p1 and p0, and at time t2 it may be determined based on the difference between p2 and p0.
When the initial pose information of the three-dimensional figure is variable and is determined from the initial pose information of the image acquisition device, the target texture region may be sampled from the pre-acquired texture map based on the difference between the device's current pose and its initial pose. As shown in Fig. 6B, still assuming that time t1 is the initial time, the camera's pose information at t1 may be p11 or p12, with p11 different from p12. Unlike the previous embodiment, the initial pose information of the cube depends on the initial pose information of the camera: if the camera's pose at t1 is p11, the initial pose of the cube is p01, so the target texture region S11 at t1 can be determined from the difference between p11 and p01; if the camera's pose at t1 is p12, the initial pose of the cube is p02, so the target texture region S12 at t1 can be determined from the difference between p12 and p02.
At a time t2 after the initial time, with the cube's pose unchanged from its initial pose: if the camera's pose at t2 has changed from p11 to p21, the target texture region S21 at t2 may be determined from the difference between p21 and p01; if the camera's pose at t2 has changed from p12 to p22, the target texture region S22 at t2 may be determined from the difference between p22 and p02.
Besides the above cases, the pose information of the three-dimensional figure may also change dynamically at any time. In this case, letting t be the time at which the image acquisition device acquires the image, the target texture region may be sampled from the pre-acquired texture map based on the difference between the pose information of the image acquisition device at time t and the pose information of the three-dimensional figure at time t.
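A sketch of this pose difference, using 3x3 rotation matrices as the pose representation (an assumption; quaternions work equally well). The sampling direction is the camera's viewing direction expressed in the local frame of the three-dimensional figure, so it covers both the fixed-pose and the dynamic-pose cases:

```python
import numpy as np

def sampling_direction(camera_rotation, cube_rotation, forward=(0.0, 0.0, -1.0)):
    """Viewing direction in the cube's local frame, i.e. the pose difference.
    camera_rotation, cube_rotation: 3x3 world-frame rotation matrices; for a
    fixed cube, cube_rotation is constant, otherwise it is the cube's rotation
    at the capture time of the current frame."""
    relative = cube_rotation.T @ camera_rotation   # rotation of camera relative to cube
    return relative @ np.asarray(forward)
```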
It should be noted that the texture map corresponding to at least one face of the three-dimensional figure may be a single texture map or may comprise a plurality of different texture maps in a texture map sequence. Still taking the cube map as an example, the texture map corresponding to the left face of the cube may include multiple texture maps {F1, F2, ..., Fn} (n is an integer greater than 1) in a texture map sequence seq, where the texture maps in seq may all differ from one another or may differ only in part; for example, adjacent texture maps in seq may differ. At different times, the texture map used for the same face of the three-dimensional figure may switch dynamically among the texture maps in the sequence seq corresponding to that face. The switching may be random: at each time, a texture map is chosen at random from seq as the texture map of that face, for example texture map F3 for the left face of the cube at time t1 and texture map F1 at time t2. Alternatively, the texture maps may be selected sequentially in the order of the sequence: for example, F1 at time t1, F2 at time t2, F3 at time t3, and so on. The texture map on each face can therefore change dynamically, making the final rendering effect richer, more varied, and more engaging.
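A small sketch of both switching strategies (the fixed switching period in the sequential case is an assumption):

```python
import random

def face_texture_at(texture_sequence, t, mode="sequential", period=1.0):
    """Pick the texture map for one face at time t from a texture map sequence."""
    if mode == "random":
        return random.choice(texture_sequence)       # random switching
    index = int(t / period) % len(texture_sequence)  # F1, F2, ..., Fn, F1, ...
    return texture_sequence[index]
```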
In step 104, the target texture region is rendered into the target image region, i.e., the target texture region is fused with the acquired image to obtain the rendering effect. In some embodiments, mask processing may be performed on the regions of the image other than the target image region to obtain a mask image corresponding to the image, where the mask image includes a first mask region corresponding to the target image region and a second mask region corresponding to the other regions; the pixels of the target texture region are rendered into the first mask region, and the pixels of the other regions are rendered into the second mask region. Fig. 7A shows the mask image corresponding to the image in Fig. 2A when the target image region is the sky region 201: the image region corresponding to the white pixels in Fig. 7A is the first mask region, and the region corresponding to the black pixels is the second mask region. The pixels of the target texture region may be rendered into the first mask region, and the pixels of the other image regions, such as the house region 202 and the person region 203, into the second mask region. The rendering effects corresponding to different pose information are shown in Figs. 7B and 7C. As shown in Fig. 7B, the target texture region corresponding to one pose of the image acquisition device is the texture region 701 in the figure; rendering the pixels of the texture region 701 into the first mask region and the pixels of the other regions of the acquired image into the second mask region yields the rendered image I1. Similarly, as shown in Fig. 7C, the target texture region corresponding to another pose of the image acquisition device is the texture region 702, and the corresponding rendered image is I2.
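A minimal sketch of this mask-based fusion (assuming the image, the sampled target texture region, and a binary mask that is 1 inside the target image region are arrays of the same height and width):

```python
import numpy as np

def composite_with_mask(image, target_texture, mask):
    """Render texture pixels into the first mask region and keep the original
    image pixels in the second mask region."""
    alpha = mask.astype(np.float32)[..., None]      # 1 inside the target region
    fused = target_texture * alpha + image * (1.0 - alpha)
    return fused.astype(image.dtype)
```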
In some embodiments, the number of target image regions is greater than 1, and each target image region corresponds to a texture map. Target texture regions may be sampled from the texture map corresponding to each target image region based on the pose information, and each target texture region is then rendered into its corresponding target image region. For example, the embodiment shown in Fig. 2A includes the sky region 201 and the ground region 204; the texture map corresponding to the sky region 201 is the starry-sky map 801 in Fig. 8, and the texture map corresponding to the ground region 204 is the grassland map 802 in Fig. 8. The pixels of the starry-sky map 801 may be rendered into the sky region 201 and the pixels of the grassland map 802 into the ground region 204, with the rendering effects shown as image regions 803 and 804, respectively. Beyond the case shown in this embodiment, the image may contain other numbers and types of target image regions, each with its own corresponding texture map.
In some embodiments, the image comprises a plurality of video frames in a video. A target texture region corresponding to each video frame may be sampled from the pre-acquired texture map based on the pose information of the image acquisition device when that frame was acquired, and the target texture region corresponding to each video frame is rendered into that frame. In this way, the visual effect of the target image region in each acquired video frame changes as the pose of the image acquisition device changes. The image rendering method of the embodiments of the present disclosure thus effectively simulates the real scene in which the user acquires the images, giving the user an immersive feeling and improving the realism of the rendering effect.
Specifically, when the pose information of the image acquisition device at the time it acquired any target video frame among the plurality of video frames differs from the pose information at the time it acquired the previous video frame, the target texture region is re-sampled from the texture map based on the pose information at the time the target video frame was acquired. For example, if the pose information when the k-th video frame was acquired differs from the pose information when the (k-1)-th video frame was acquired, the target texture region corresponding to the k-th video frame may be determined from the texture map based on the pose information when the k-th video frame was acquired.
When the pose information of the image acquisition device at the time it acquired any target video frame among the plurality of video frames is the same as the pose information at the time it acquired the previous video frame, the target texture region corresponding to the previous video frame is determined as the target texture region corresponding to the target video frame. For example, if the pose information when the k-th video frame was acquired is the same as the pose information when the (k-1)-th video frame was acquired, the target texture region corresponding to the (k-1)-th video frame may be determined as the target texture region corresponding to the k-th video frame.
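A sketch of this reuse logic (poses are assumed to be comparable values such as tuples; the sampler is the pose-dependent sampling step from above):

```python
def sample_for_video(frames_with_poses, texture_map, sampler):
    """Re-sample only when the pose changes between consecutive video frames."""
    cached_pose, cached_region = None, None
    regions = []
    for _frame, pose in frames_with_poses:
        if cached_pose is None or pose != cached_pose:
            cached_region = sampler(texture_map, pose)   # pose changed: re-sample
            cached_pose = pose
        regions.append(cached_region)                    # pose unchanged: reuse
    return regions
```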
In some embodiments, the image includes a specular reflection region, i.e., an image region that reflects the content of other regions; in the image acquired by the image acquisition device, the specular reflection region may contain a mirror image of at least part of the image. Specular reflection regions may include, but are not limited to, mirrors, the glass curtain walls of office buildings, and the like. To further improve the realism of the rendering effect, when the specular reflection region contains a mirror image of at least part of the target image region, the specular reflection region in the image may be determined; the pose information of the specular reflection region is determined based on the pose information of the image acquisition device when the image was acquired; a texture region corresponding to the specular reflection region is sampled from the texture map based on the pose information of the specular reflection region; and that texture region is rendered into the specular reflection region. As shown in Fig. 9, the specular reflection region may be a window region within the house region. Taking the window region 901 as an example, the texture region corresponding to it is the region where some stars are located, so the region where the stars are located may be rendered into the window region 901, further improving the realism of the rendering effect.
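One simple way to model the "pose of the specular reflection region" is to reflect the camera's viewing direction about the normal of the reflecting surface and sample the texture map along the mirrored direction; this is an illustrative assumption, not the only possible construction:

```python
import numpy as np

def mirrored_sampling_direction(view_direction, surface_normal):
    """Reflect the viewing direction about the surface normal: r = d - 2(d.n)n."""
    d = np.asarray(view_direction, dtype=np.float64)
    n = np.asarray(surface_normal, dtype=np.float64)
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n
```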
The overall flow of the present disclosure is described below with reference to a specific embodiment in which the image acquisition device is a mobile phone with a camera that captures a video stream in real time, the target image region is a sky region, and the three-dimensional figure generated from the texture maps is called a sky box. The overall flow is as follows:
(1) Process the image with a deep neural network to identify the sky and non-sky regions and generate a sky mask. For a smooth video experience, a detection speed of about 30 frames per second is generally required, so detecting the sky region and generating the sky mask must be completed within roughly 33 ms while preserving the accuracy of the sky boundary as much as possible.
(2) Acquire the pose information of the mobile phone in real time through hardware such as the attitude sensor; specifically, the Model-View-Projection (MVP) matrix of the phone can be computed from the quaternion describing the phone's attitude. Through the MVP matrix, an object can be transformed into different coordinate systems so that its pose can be adjusted; for example, the object may be mapped from its local coordinate system into normalized device coordinates (NDC). The NDC coordinate system is a normalized device coordinate system in which the coordinates on the x, y, and z axes all range from -1 to 1.
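A sketch of the rotation part of this computation (the full MVP matrix is projection x view x model; the projection and model matrices are omitted here, and the (w, x, y, z) quaternion convention is an assumption):

```python
import numpy as np

def quaternion_to_view_matrix(w, x, y, z):
    """4x4 view matrix whose rotation part is derived from the attitude quaternion."""
    n = np.sqrt(w*w + x*x + y*y + z*z)
    w, x, y, z = w/n, x/n, y/n, z/n                 # normalize the quaternion
    rotation = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    view = np.eye(4)
    view[:3, :3] = rotation.T   # view rotation is the inverse (transpose) of the camera rotation
    return view
```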
(3) Load the sticker assets (i.e., the texture maps, also called the sky-box material, used to generate the sky box) and create a cube map from them. In this step, the camera is wrapped inside a cubic scene based on an OpenGL cube map, which simulates a scene with six texture maps. The sky box is a six-faced cube with the corresponding texture pasted onto each of its six faces; the origin of the coordinate system (which follows the right-hand rule) is at the center of the box, and the camera sits at the origin. A direction vector is then computed from the camera's pose information, and the sky-box textures are sampled with this direction vector to obtain the sky-box effect (i.e., the target texture region). A cube map is a special form of map comprising six 2D texture maps, each of which is one face of the cube map.
(4) Obtain the target texture region for the current orientation from the phone's pose. Specifically, the sampling direction is computed in the vertex shader, the cube map is sampled in the fragment shader based on that direction, and the resulting sky-box texture (i.e., the pixel values of the target texture region) is output.
In a Graphics Processing Unit (GPU), an image loaded into memory is called a texture, and the pixel value (also called the color value) of a texture can be obtained by specifying an image position with an (x, y, z) coordinate; the process of obtaining that pixel value is sampling. This is analogous to sampling a continuous audio signal at regular intervals (say, 30 times per second) and then saving the resulting discrete content into something like an audio file. The coordinates of the image in the NDC coordinate system are continuous, while the coordinates of the phone screen are discrete, so obtaining the color value of a particular pixel resembles such a sampling process.
In general, the center of the coordinate system is located at the center of the cube map. Sampling the cube map means computing a direction vector from the origin; the intersection of this direction vector with a texture map is the sampling position, the direction of the vector is the sampling direction, and reading the pixel value of the texture map at the sampling position is cube-map sampling.
(5) Fuse the target texture region with the image shot by the camera based on the sky mask to obtain the final rendering effect.
The embodiments of the present disclosure combine deep-learning-based image segmentation with real-time rendering to achieve real-time special-effect rendering that blends real and virtual scenes. The image segmentation result is fast and accurate; for example, in sky-segmentation scenarios, skies in different conditions such as daytime and nighttime can be recognized, the boundary transition effect is good, and the performance is high enough for real-time rendering. In addition, different segmentation networks can be combined with the sky box to achieve different effects. For example, a sky segmentation network may be used to segment the sky region in the image, a water-surface segmentation network may be used to segment the water-surface region, and so on.
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment, relevant features, states, and attributes of the target object are detected or recognized by means of various vision-related algorithms, so as to obtain an AR effect that combines the virtual and the real and matches the specific application. For example, the target object may involve faces, limbs, gestures, or actions associated with a human body, or markers and signs associated with objects, or sand tables, display areas, display items, and the like associated with venues or places. The vision-related algorithms may involve visual localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key-point extraction and tracking of objects, and pose or depth detection of objects, among others. The specific application may involve not only interactive scenarios related to real scenes or objects, such as navigation, guided explanation, reconstruction, and superimposed display of virtual effects, but also special-effect processing related to people, such as makeup beautification, body beautification, special-effect display, and virtual model display. The detection or recognition of the relevant features, states, and attributes of the target object can be implemented with a convolutional neural network, which is a network model obtained by model training based on a deep-learning framework.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
As shown in fig. 10, the present disclosure also provides an image rendering apparatus, the apparatus including:
a first determining module 1001, configured to determine a target image area from an image captured by an image capturing device;
a second determining module 1002, configured to determine attitude information when the image capturing device captures the image;
a sampling module 1003, configured to sample a target texture region from a pre-acquired texture map based on the attitude information, where different attitude information corresponds to different target texture regions;
a rendering module 1004 for rendering the target texture region into the target image region.
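The module structure could be sketched as a class with one method per module. This is a structural illustration only, not the disclosed implementation; the method bodies are placeholders and the rendering step simply composites the sampled texture over the masked region.

```python
import numpy as np

class ImageRenderingApparatus:
    """Structural sketch only: one method per module, bodies left abstract."""

    def determine_target_region(self, image: np.ndarray) -> np.ndarray:
        """First determining module 1001: target image region (e.g. a segmentation mask)."""
        raise NotImplementedError

    def determine_pose(self, image: np.ndarray) -> np.ndarray:
        """Second determining module 1002: attitude information at capture time."""
        raise NotImplementedError

    def sample_target_texture(self, pose: np.ndarray) -> np.ndarray:
        """Sampling module 1003: different attitude information yields different texture regions."""
        raise NotImplementedError

    def render(self, image: np.ndarray) -> np.ndarray:
        """Rendering module 1004: render the sampled texture region into the target region."""
        mask = self.determine_target_region(image)
        texture = self.sample_target_texture(self.determine_pose(image))
        return np.where(mask[..., None] > 0, texture, image)
```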
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For specific implementations, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
Embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any of the foregoing embodiments when executing the program.
Fig. 11 is a schematic diagram of a more specific hardware structure of a computing device provided in an embodiment of the present specification. The device may include: a processor 1101, a memory 1102, an input/output interface 1103, a communication interface 1104, and a bus 1105. The processor 1101, the memory 1102, the input/output interface 1103, and the communication interface 1104 are communicatively connected to one another within the device via the bus 1105.
The processor 1101 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided in the embodiments of the present specification. The processor 1101 may also include a graphics card, which may be, for example, an NVIDIA Titan X graphics card or a 1080 Ti graphics card.
The Memory 1102 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1102 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1102 and called by the processor 1101 for execution.
The input/output interface 1103 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component within the device (not shown) or may be external to the device to provide corresponding functionality. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1104 is used for connecting a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may communicate in a wired manner (for example, via USB or a network cable) or in a wireless manner (for example, via a mobile network, Wi-Fi, or Bluetooth).
Bus 1105 includes a pathway to transfer information between various components of the device, such as processor 1101, memory 1102, input/output interface 1103, and communication interface 1104.
It should be noted that although the above-mentioned device only shows the processor 1101, the memory 1102, the input/output interface 1103, the communication interface 1104 and the bus 1105, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Referring to fig. 12, an embodiment of the present disclosure further provides an image rendering system, including:
an image acquisition device 1201 for acquiring an image;
a processor 1202 for determining a target image region from the image; determining attitude information when the image acquisition device acquires the image; sampling a target texture area from a pre-acquired texture map based on the attitude information; different attitude information corresponds to different target texture areas; rendering the target texture region into the target image region.
The image acquisition device 1201 and the processor 1202 may be different functional modules on the same electronic device, or may be functional modules on two physically independent electronic devices. The image acquisition device 1201 and the processor 1202 may communicate, for example, wirelessly, so that the image acquired by the image acquisition device 1201 is transmitted to the processor 1202 for processing. The image acquired by the image acquisition device 1201 may be a single image or a plurality of video frames in a video. The functions of the processor 1202 have been described in the foregoing method embodiments and are not repeated here.
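A hedged sketch of how such a system might drive the processor per video frame is shown below; it also illustrates reusing the previously sampled texture region while the attitude information is unchanged. The callables `sample_fn`, `segment_fn`, and `fuse_fn` are assumed to be supplied by the caller and are not defined by the disclosure.

```python
import numpy as np

def process_video(frames, poses, sample_fn, segment_fn, fuse_fn):
    """Per-frame pipeline: reuse the previous texture sample while the pose is unchanged.

    `sample_fn(pose)` samples the target texture region for a pose,
    `segment_fn(frame)` produces the target-region mask, and
    `fuse_fn(frame, texture, mask)` renders the texture into that region.
    This sketch only shows the control flow.
    """
    rendered = []
    last_pose, last_texture = None, None
    for frame, pose in zip(frames, poses):
        if last_pose is None or not np.array_equal(pose, last_pose):
            # Pose changed: resample the target texture region.
            last_texture = sample_fn(pose)
            last_pose = pose
        mask = segment_fn(frame)
        rendered.append(fuse_fn(frame, last_texture, mask))
    return rendered
```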
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of any of the foregoing embodiments.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the present disclosure can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may essentially, or in part, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present specification or in some parts of the embodiments.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and for relevant points reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and when the embodiments of the present disclosure are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
The foregoing is only a specific implementation of the embodiments of the present disclosure. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principles of the embodiments of the present disclosure, and these modifications and improvements should also fall within the protection scope of the embodiments of the present disclosure.

Claims (14)

1. A method of image rendering, the method comprising:
determining a target image area from an image acquired by an image acquisition device;
determining attitude information when the image acquisition device acquires the image;
sampling a target texture area from a pre-acquired texture map based on the attitude information; different attitude information corresponds to different target texture areas;
rendering the target texture region into the target image region.
2. The method of claim 1, wherein the target image area is determined based on at least one of:
a semantic category of at least one image region in the image;
pixel location of at least one image region in the image;
attribute information of at least one object in the image.
3. The method according to claim 1 or 2, wherein the number of the target image areas is greater than 1, and each target image area corresponds to a texture map; the sampling of the target texture region from the pre-acquired texture map based on the pose information includes:
sampling target texture areas from texture maps corresponding to the target image areas based on the attitude information;
the rendering the target texture region into the target image region comprises:
rendering each target texture area into the corresponding target image area, respectively.
4. The method of any of claims 1 to 3, wherein the rendering the target texture region into the target image region comprises:
performing masking processing on other regions except the target image region in the image to obtain a mask image corresponding to the image; the mask image comprises a first mask region corresponding to the target image region and a second mask region corresponding to the other regions;
rendering the pixel points of the target texture region into the first mask region, and rendering the pixel points of the other regions into the second mask region.
5. The method of any one of claims 1 to 4, further comprising:
generating a three-dimensional stereogram based on the texture map;
the sampling of the target texture region from the pre-acquired texture map based on the pose information includes:
sampling a target texture region from the three-dimensional stereogram based on the attitude information.
6. The method of claim 5, wherein the number of texture maps is greater than 1; the generating of the three-dimensional stereogram based on the texture map comprises:
acquiring a predetermined three-dimensional graph, wherein the three-dimensional graph comprises a plurality of surfaces; marking identification information on each texture map in advance, wherein the identification information is used for determining the corresponding surface of the texture map and the three-dimensional graph;
rendering each texture map onto the corresponding surface of the three-dimensional graph based on the identification information to obtain the three-dimensional stereogram.
7. The method of claim 6, wherein:
the three-dimensional graph has fixed attitude information; the sampling of the target texture region from the pre-acquired texture map based on the pose information includes:
sampling a target texture region from a pre-acquired texture map based on a difference between the pose information and the fixed pose information;
or
The attitude information of the three-dimensional graph dynamically changes; the sampling of the target texture region from the pre-acquired texture map based on the pose information includes:
sampling a target texture region from a pre-acquired texture map based on a difference between the attitude information and the attitude information of the three-dimensional graph at the time when the image acquisition device acquires the image.
8. The method according to claim 6 or 7, wherein the texture map corresponding to at least one face of the three-dimensional figure comprises a plurality of different texture maps in a texture map sequence.
9. The method of any one of claims 1 to 8, wherein the image comprises a plurality of video frames in a video; the sampling of the target texture region from the pre-acquired texture map based on the pose information includes:
sampling a target texture area corresponding to each video frame from a pre-acquired texture map based on the attitude information of each video frame acquired by the image acquisition device;
the rendering the target texture region into the target image region comprises:
rendering the target texture area corresponding to each video frame into the corresponding video frame.
10. The method according to claim 9, wherein the sampling a target texture region corresponding to each video frame from a pre-obtained texture map based on the pose information of each video frame acquired by the image acquisition device comprises:
under the condition that the attitude information of the image acquisition device when acquiring any target video frame in the plurality of video frames is different from the attitude information when acquiring the previous video frame, re-sampling a target texture area from the texture map based on the attitude information when acquiring the target video frame; and/or
under the condition that the attitude information when the image acquisition device acquires any target video frame of the plurality of video frames is the same as the attitude information when the image acquisition device acquires the previous video frame, determining the target texture area corresponding to the previous video frame as the target texture area corresponding to the target video frame.
11. The method of any one of claims 1 to 10, wherein the image includes a specular reflection region; the specular reflection area comprises a mirror image of at least part of the target image area;
the method further comprises the following steps: determining attitude information of the specular reflection area based on attitude information when the image acquisition device acquires the image;
sampling texture regions corresponding to the specular reflection regions from the texture map based on pose information of the specular reflection regions;
rendering a texture region corresponding to the specular reflection region into the specular reflection region.
12. An image rendering apparatus, characterized in that the apparatus comprises:
the first determining module is used for determining a target image area from an image acquired by the image acquisition device;
the second determining module is used for determining the attitude information when the image acquisition device acquires the image;
the sampling module is used for sampling a target texture area from a pre-acquired texture map based on the attitude information; different attitude information corresponds to different target texture areas;
a rendering module to render the target texture region into the target image region.
13. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 11.
14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 11 when executing the program.
CN202210369598.6A 2022-04-08 2022-04-08 Image rendering method and apparatus, medium, and computer device Pending CN114862997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369598.6A CN114862997A (en) 2022-04-08 2022-04-08 Image rendering method and apparatus, medium, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210369598.6A CN114862997A (en) 2022-04-08 2022-04-08 Image rendering method and apparatus, medium, and computer device

Publications (1)

Publication Number Publication Date
CN114862997A true CN114862997A (en) 2022-08-05

Family

ID=82630079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369598.6A Pending CN114862997A (en) 2022-04-08 2022-04-08 Image rendering method and apparatus, medium, and computer device

Country Status (1)

Country Link
CN (1) CN114862997A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051471A1 (en) * 2022-09-07 2024-03-14 荣耀终端有限公司 Image processing method and electronic device

Similar Documents

Publication Publication Date Title
EP3786892B1 (en) Method, device and apparatus for repositioning in camera orientation tracking process, and storage medium
CN109887003B (en) Method and equipment for carrying out three-dimensional tracking initialization
Bostanci et al. Augmented reality applications for cultural heritage using Kinect
CN115699114B (en) Method and apparatus for image augmentation for analysis
CN106846497B (en) Method and device for presenting three-dimensional map applied to terminal
CN111311756B (en) Augmented reality AR display method and related device
CN110720215B (en) Apparatus and method for providing content
CN110378947B (en) 3D model reconstruction method and device and electronic equipment
CN110473293A (en) Virtual objects processing method and processing device, storage medium and electronic equipment
CN111161398B (en) Image generation method, device, equipment and storage medium
CN113220251B (en) Object display method, device, electronic equipment and storage medium
US11561651B2 (en) Virtual paintbrush implementing method and apparatus, and computer readable storage medium
CN112370784A (en) Virtual scene display method, device, equipment and storage medium
CN112308977B (en) Video processing method, video processing device, and storage medium
CN108028904B (en) Method and system for light field augmented reality/virtual reality on mobile devices
KR20200136723A (en) Method and apparatus for generating learning data for object recognition using virtual city model
CN114531553B (en) Method, device, electronic equipment and storage medium for generating special effect video
CN112766215A (en) Face fusion method and device, electronic equipment and storage medium
CN114862997A (en) Image rendering method and apparatus, medium, and computer device
WO2021151380A1 (en) Method for rendering virtual object based on illumination estimation, method for training neural network, and related products
CN115965735B (en) Texture map generation method and device
Ishigaki et al. Real-time 3D reconstruction for mixed reality telepresence using multiple depth sensors
CN114972599A (en) Method for virtualizing scene
Marek et al. Optimization of 3d rendering in mobile devices
CN113139992A (en) Multi-resolution voxel gridding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination