WO2024087971A1 - Method, apparatus and storage medium for image processing - Google Patents

Method, apparatus and storage medium for image processing

Publication number: WO2024087971A1
Authority: WIPO (PCT)
Application number: PCT/CN2023/120794
Other languages: English (en), French (fr)
Inventor: 刘智超 (Liu Zhichao)
Original assignee: 荣耀终端有限公司 (Honor Device Co., Ltd.)
Application filed by 荣耀终端有限公司 (Honor Device Co., Ltd.)
Publication of WO2024087971A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; processor configuration, e.g. pipelining
    • G06T 1/60 - Memory management
    • G06T 9/00 - Image coding
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; scene-specific elements
    • G06V 20/40 - Scenes; scene-specific elements in video content

Definitions

  • the present application relates to the field of information technology, and in particular to a method, device and storage medium for image processing.
  • the screen refresh rate and rendering resolution of video playback are gradually increasing.
  • the screen refresh rate is increased from 60Hz to 120Hz or even higher, and the rendering resolution is increased from 720p to 2k or even higher.
  • frame prediction technology is applied to the processing of video data.
  • Frame prediction technology refers to the technology of predicting and inserting predicted frames based on data such as the motion vector of the real frame.
  • the insertion of predicted frames can improve the frame rate of the displayed image, reduce the sense of freeze and reduce the power consumption of video rendering.
  • the motion vector of the pixel between the two real frames can be obtained by calculating the similarity between the pixels.
  • the position of the pixel in the predicted frame is predicted according to the motion vector, so that the pixel is placed in the corresponding position.
  • the industry is focusing on researching a method that can improve the picture quality and accuracy of the predicted frame while saving power consumption and computing power.
  • the embodiments of the present application provide a method, an apparatus, and a storage medium for image processing, which can improve the image quality of a generated prediction frame.
  • an embodiment of the present application proposes a method for image processing, the method comprising: performing at least one round of distortion processing on a color image of a first real frame according to motion vector information between a first real frame and a second real frame to obtain a first target color image of a predicted frame, the first target color image of the predicted frame including a blank area caused by the at least one round of distortion processing; wherein each round of distortion processing in the at least one round of distortion processing comprises: generating a full-screen grid layer covering the first real frame, the full-screen grid layer including a plurality of grids, each grid including a plurality of pixels; performing distortion processing on at least part of the grids in the full-screen grid layer according to the motion vector information; outputting a color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame; filling the blank area in the first target color image of the predicted frame to obtain a second target color image of the predicted frame.
  • the second target color image is a display image of the predicted frame.
  • a full-screen grid layer covering the first real frame is generated during the generation process of the predicted frame, and according to the motion vector between the two real frames, a color image including a blank area is obtained by one or more rounds of distortion processing of the full-screen grid layer.
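  • As an illustrative sketch only (the patent specifies no code), the following Python fragment approximates one round of distortion processing: a coarse grid covering the frame is generated, each grid cell is moved by the motion vector sampled at its vertex, and the color image of the first real frame is written through the distorted grid into a blank target image. The rigid per-cell offset (a real mesh warp interpolates between the four cell vertices) and all names and parameters are assumptions.

```python
import numpy as np

def warp_round(color, motion, cell=8):
    """color: (H, W, 3) color image of the first real frame.
    motion: (H, W, 2) per-pixel motion vectors (dx, dy) in pixels.
    cell:   pixels per grid cell, e.g. an 8x8 grid."""
    h, w = color.shape[:2]
    out = np.zeros_like(color)          # blank target image for this round
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            dx, dy = motion[y0, x0]     # offset of this grid vertex
            ty, tx = int(round(y0 + dy)), int(round(x0 + dx))
            src = color[y0:min(y0 + cell, h), x0:min(x0 + cell, w)]
            hh, ww = src.shape[:2]
            if 0 <= ty <= h - hh and 0 <= tx <= w - ww:
                out[ty:ty + hh, tx:tx + ww] = src
    return out                          # unwritten pixels form the blank area
```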
  • the at least one round of distortion processing includes multiple rounds of distortion processing, and the multiple rounds of distortion processing correspond to multiple distortion types respectively, and the multiple distortion types include at least one of the following: a distortion operation performed on a static object; a distortion operation performed on a dynamic object; a distortion operation performed on a static object in a distant view; a distortion operation performed on a dynamic object in a distant view; a distortion operation performed on a static object in a close view.
  • the at least one round of distortion processing includes multiple rounds of distortion processing, and resolutions of full-screen grid layers corresponding to at least two rounds of distortion processing in the multiple rounds of distortion processing are different.
  • the resolution of the full-screen grid layer can be flexibly selected according to the object of each round of distortion processing, so as to save computing power and power consumption and improve the computing efficiency of the prediction frame.
  • the method further includes: when the object processed in the i-th round of the at least one round of distortion processing is a static object in the distance, setting the resolution of the full-screen grid layer used in the i-th round of processing to a first resolution, where i is an integer greater than or equal to 1; when the object processed in the i-th round of the at least one round of distortion processing is a static object or a dynamic object in the foreground, setting the resolution of the full-screen grid layer used in the i-th round of processing to a second resolution, and the first resolution is lower than the second resolution.
  • the at least one round of distortion processing includes multiple rounds of distortion processing
  • the first round of distortion processing in the multiple rounds of distortion processing includes: generating a first full-screen grid layer covering the first real frame; according to the motion vector information of the static object, offsetting the grid vertices in the first full-screen grid layer to obtain a distorted first full-screen grid layer; according to the distorted first full-screen grid layer, outputting the pixels of the static object in the color image of the first real frame to the blank image to obtain a first color image.
  • outputting pixels of static objects in the color image of the first real frame to a blank image to obtain a first color image includes: judging whether a first pixel belongs to a static object or a dynamic object according to the full-screen semantic image of the first real frame, the first pixel being any pixel in the color image of the first real frame; in the case where the first pixel belongs to a static object, outputting the first pixel to a first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; in the case where the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
  • the second round of distortion processing in the multiple rounds of distortion processing includes: generating a second full-screen grid layer covering the first real frame; offsetting the grid vertices in the second full-screen grid according to the motion vector information of the dynamic object to obtain a distorted second full-screen grid layer; based on the distorted second full-screen grid layer, outputting the pixels of the dynamic object in the color image of the first real frame to the first color image to obtain the first target color image.
  • the pixels of the dynamic objects in the color image of the first real frame are output to the first color image according to the distorted second full-screen grid layer to obtain the first target color image, including: judging whether the second pixel belongs to a static object or a dynamic object according to the full-screen semantic image of the first real frame, the second pixel being any pixel in the color image of the first real frame; when the second pixel belongs to a dynamic object, outputting the second pixel to a second position in the first color image to obtain the first target color image, the second position being the corresponding position of the second pixel in the distorted second full-screen grid layer; when the second pixel belongs to a static object, not outputting the second pixel to the first color image.
  • the at least one round of distortion processing includes multiple rounds of distortion processing
  • the first round of distortion processing in the multiple rounds of distortion processing includes: generating a first full-screen grid layer covering the first real frame; offsetting the grid vertices in the first full-screen grid layer according to the motion vector information of the static objects in the distant view to obtain the distorted first full-screen grid layer; based on the distorted first full-screen grid layer, outputting the pixels of the static objects in the distant view in the color image of the first real frame to the blank image to obtain the first color image.
  • the pixels of static objects in the distance in the color image of the first real frame are output to a blank image according to the distorted first full-screen grid layer to obtain a first color image, including: judging whether a first pixel belongs to a static object or a dynamic object according to the full-screen semantic image of the first real frame, the first pixel being any pixel in the color image of the first real frame; when the first pixel belongs to a static object, determining whether the depth of the first pixel is greater than a preset depth threshold D according to the scene depth map of the first real frame; when the depth of the first pixel is greater than the preset depth threshold D, outputting the first pixel to a first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the depth of the first pixel is less than the preset depth threshold D, not outputting the first pixel to the blank image; when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
  • the second round of distortion processing in the multiple rounds of distortion processing includes: generating a second full-screen grid layer covering the first real frame; offsetting the grid vertices in the second full-screen grid layer according to the motion vector information of the static objects in the foreground to obtain a distorted second full-screen grid layer; based on the distorted second full-screen grid layer, outputting the pixels of the static objects in the foreground in the color image of the first real frame to the first color image to obtain a second color image.
  • the third round of distortion processing in the multiple rounds of distortion processing includes: generating a third full-screen grid layer covering the first real frame; offsetting the grid vertices in the third full-screen grid layer according to the motion vector information of the dynamic object in the distance to obtain a distorted third full-screen grid layer; and outputting the pixels of the dynamic object in the color image of the first real frame to the second color image according to the distorted third full-screen grid layer to obtain a third color image.
  • the fourth round of distortion processing in the multiple rounds of distortion processing includes: generating a fourth full-screen grid layer covering the first real frame; offsetting the grid vertices in the fourth full-screen grid layer according to the motion vector information of the dynamic object in the foreground to obtain a distorted fourth full-screen grid layer; and outputting the pixels of the dynamic object in the color image of the first real frame to the third color image according to the distorted fourth full-screen grid layer to obtain the first target color image.
  • filling the blank area in the first target color image of the predicted frame to obtain the second target color image of the predicted frame includes: determining a threshold range to which a camera angle between the first real frame and the second real frame belongs; selecting a target pixel filling algorithm from at least two candidate pixel filling algorithms according to the threshold range to which the camera angle belongs; and filling the blank area in the first target color image according to the target pixel filling algorithm to obtain the second target color image.
  • a target pixel filling algorithm can be selected from a plurality of candidate pixel filling algorithms according to the threshold range to which the camera angle belongs, so as to fill the blank area. Since different pixel filling algorithms are suitable for different scenes, the pixel filling algorithm for filling the blank area can be flexibly selected according to the threshold of the camera angle, so as to improve the prediction accuracy and image quality of the display image of the prediction frame.
  • the target pixel filling algorithm is selected from at least two candidate pixel filling algorithms according to the threshold range to which the camera angle belongs, including: when the camera angle is less than a first angle threshold, selecting a first pixel filling algorithm as the target pixel filling algorithm, wherein the first pixel filling algorithm achieves pixel filling by sampling pixels of static objects around the pixels to be filled; when the camera angle is greater than the first angle threshold, selecting a second pixel filling algorithm as the target pixel filling algorithm, wherein the second pixel filling algorithm achieves pixel filling by sampling pixels of static objects and dynamic objects around the pixels to be filled.
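  • A minimal sketch of this selection step follows; the threshold value and both function names are placeholders, not values or APIs from the patent.

```python
FIRST_ANGLE_THRESHOLD_DEG = 2.0   # assumed illustrative value

def fill_from_static_neighbors(*args):
    ...   # first pixel filling algorithm (see FIG. 6 below)

def fill_from_static_and_dynamic_neighbors(*args):
    ...   # second pixel filling algorithm (see FIG. 7 below)

def select_filling_algorithm(camera_angle_deg):
    # small camera angle: sample only static pixels around the hole;
    # large camera angle: sample static and dynamic pixels
    if camera_angle_deg < FIRST_ANGLE_THRESHOLD_DEG:
        return fill_from_static_neighbors
    return fill_from_static_and_dynamic_neighbors
```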
  • an embodiment of the present application provides a method for image processing, comprising: obtaining motion vector information between a first real frame and a second real frame; based on the motion vector information, warping the color image of the first real frame to obtain a first target color image of a predicted frame, wherein the warping comprises: generating a full-screen grid layer covering the first real frame, the full-screen grid layer comprising a plurality of grids, each grid comprising a plurality of pixels; based on the motion vector information, warping the grids in the full-screen grid layer; and obtaining a distorted color image based on the distorted full-screen grid layer and the color image of the first real frame.
  • the distortion processing includes multiple rounds of distortion processing, and the multiple rounds of distortion processing correspond to multiple distortion types, respectively.
  • the multiple distortion types include at least one of the following: a distortion operation performed on a static object; a distortion operation performed on a dynamic object; a distortion operation performed on a static object in a distant view; a distortion operation performed on a dynamic object in a distant view; a distortion operation performed on a static object in a close view.
  • the distortion processing includes multiple rounds of distortion processing, and resolutions of full-screen grid layers corresponding to at least two rounds of distortion processing in the multiple rounds of distortion processing are different.
  • the method further includes: when the object processed in the i-th round in the multiple rounds of distortion processing is a static object in the distance, setting the resolution of the full-screen grid layer used in the i-th round of processing to a first resolution, where i is an integer greater than or equal to 1; when the object processed in the i-th round in the at least one round of distortion processing is a static object or a dynamic object in the foreground, setting the resolution of the full-screen grid layer used in the i-th round of processing to a second resolution, and the first resolution is lower than the second resolution.
  • the method further includes: filling a blank area in the first target color image of the prediction frame to obtain a second target color image of the prediction frame, wherein the second target color image is a display image of the prediction frame.
  • an embodiment of the present application provides an apparatus for image processing, the apparatus comprising: a processor and a memory; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory, so that the terminal device executes the method of the first aspect or the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program.
  • the computer program is executed by a processor, the method of the first aspect or the second aspect is implemented.
  • an embodiment of the present application provides a computer program product, which includes a computer program.
  • when the computer program is executed, the computer executes the method of the first aspect or the second aspect.
  • an embodiment of the present application provides a chip, the chip including a processor, the processor being used to call a computer program in a memory to execute the method described in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of the principle of generating a prediction frame according to an embodiment of the present application.
  • FIG. 2 is a software architecture block diagram of an apparatus 100 for image processing according to an embodiment of the present application.
  • FIG. 3 is a schematic flow chart of a method for image processing in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a distortion process for a color image of a first real frame according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a distortion process for a color image of a first real frame according to another embodiment of the present application.
  • FIG. 6 is a schematic flow chart of a first pixel filling algorithm according to an embodiment of the present application.
  • FIG. 7 is a schematic flow chart of a second pixel filling algorithm according to an embodiment of the present application.
  • FIG. 8 is a schematic flow chart of a rendering process according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the hardware structure of an apparatus 900 for image processing according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an apparatus 1000 for image processing according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the hardware structure of an apparatus 1100 for image processing according to another embodiment of the present application.
  • “at least one” refers to one or more.
  • “plural” refers to two or more.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • “A and/or B” can represent three cases: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the previous and next associated objects are in an "or” relationship.
  • “At least one of the following items” or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • at least one of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where each of a, b and c can be single or multiple.
  • the "at" in the embodiment of the present application can be the instant when a certain situation occurs, or can be a period of time after a certain situation occurs, and the embodiment of the present application does not specifically limit this.
  • the display interface provided in the embodiment of the present application is only an example, and the display interface can also include more or less content.
  • FIG. 1 is a schematic diagram of the principle of generating a prediction frame in an embodiment of the present application.
  • a prediction frame may refer to a prediction image generated based on information such as motion vectors of two real frames.
  • the prediction frame may be inserted between the two real frames, or before or after them, and the embodiment of the present application does not limit this.
  • the above two real frames may generally refer to two consecutive real frames.
  • FIG. 1 takes as an example the case where the two real frames are a first real frame and a second real frame.
  • before generating the predicted frame, it is necessary to first obtain relevant information of the two real frames, such as the color image of the real frame, the motion vector diagram between the two real frames, the scene depth information of the real frame, the camera angle between the two real frames, etc.
  • This information can be obtained through the rendering process of the two real frames, for example, through the rendering instruction stream.
  • the position of the pixels of the color image of the real frame in the predicted frame can be predicted, and then the pixels are placed at the corresponding positions in the predicted frame to obtain the first target color image of the predicted frame.
  • some blank areas may appear for which no matching relationship can be found between the two real frames.
  • the pixels filling the blank areas can be calculated according to the pixel filling algorithm to obtain the color image of the predicted frame that is finally displayed.
  • the pixel filling algorithm refers to a process of restoring and reconstructing the internal pixels of the damaged area in the image.
  • the pixel filling algorithm may include an image inpainting algorithm, etc.
  • FIG. 2 is a block diagram of the software architecture of an apparatus 100 for image processing according to an embodiment of the present application.
  • the layered architecture divides the software system of the apparatus 100 into several layers, each layer having a clear role and division of labor.
  • the layers communicate with each other through software interfaces.
  • the software system can be divided into four layers, namely, application layer (applications), application framework layer (application framework), system library and hardware layer.
  • FIG. 2 only shows the functional modules related to the method for image processing according to an embodiment of the present application.
  • the apparatus 100 may include more or fewer functional modules, or some functional modules may be replaced by other functional modules.
  • the application layer may include a series of application packages. As shown in FIG. 2, the application layer includes game applications, AR applications, VR applications, etc.
  • the application framework layer provides application programming interface (API) and programming framework for the application programs in the application layer.
  • the application framework layer includes some predefined functions. As shown in FIG. 2, the application framework layer may include an application framework.
  • the application framework (FrameWork) may provide the application layer with encapsulation of numerous function modules and API interfaces that can be called.
  • the system library may include a rendering module, which can be used to implement three-dimensional graphics drawing, image rendering, synthesis and layer processing, etc.
  • the rendering module includes but is not limited to at least one of the following: open graphics library (OpenGL), open source computer vision library (OpenCV), and open computing language library (OpenCL).
  • the hardware layer includes memory and graphics processing unit (GPU).
  • Memory can be used to temporarily store the calculation data in the central processing unit (CPU) and the data exchanged with external storage such as hard disk.
  • the GPU is a processor that performs image and graphics related operations.
  • the process of the device 100 performing graphics processing using the rendering module can be completed by the GPU.
  • when an application (such as a game application) needs to play a video, it can send a rendering instruction stream for the video frames to the application framework.
  • the application framework calls the rendering module to render the image.
  • the rendering module can use the GPU to render the image and obtain the rendered screen data, which can be stored in the memory.
  • the application framework can read the image data from the memory and send it to the cache area of the display screen to display the image.
  • the device 100 may refer to an electronic device with an image processing function.
  • the device 100 may include but is not limited to: a mobile phone, a tablet computer, a PDA, a laptop computer, a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, etc., and the embodiment of the present application is not limited to this.
  • an embodiment of the present application provides a method and apparatus for image processing, in which a full-screen grid layer covering a real frame is generated during the generation of the predicted frame, and based on the motion vector between the two real frames, a color image including a blank area is obtained by one or more rounds of distortion processing of the full-screen grid layer.
  • FIG. 3 is a schematic flow chart of a method for image processing in an embodiment of the present application. The method can be executed by the apparatus 100 in FIG. 2. As shown in FIG. 3, the method includes the following contents.
  • each round of distortion processing in at least one round of distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer includes multiple grids, and each grid includes multiple pixels; according to motion vector information, at least part of the grids in the full-screen grid layer are distorted; according to the distorted full-screen grid layer and the color image of the first real frame, the color image corresponding to each round of distortion processing is output.
  • the first real frame and the second real frame may include continuous real frames or discontinuous real frames.
  • the predicted frame may be inserted between, before or after the first real frame and the second real frame.
  • the first real frame may refer to a front frame image or a rear frame image of the two real frames.
  • the motion vector information between the first real frame and the second real frame can be obtained during the rendering process of the first real frame and the second real frame.
  • the motion vector information can include different types.
  • the motion vector information can include motion vector information of static objects and motion vector information of dynamic objects.
  • the motion vector information of dynamic objects can include motion vector information of ordinary dynamic objects and motion vector information of dynamic objects with skeletal animation.
  • the multiple rounds of distortion processing may correspond to multiple distortion types, and different distortion types perform distortion operations on different objects in the first real frame.
  • the distortion types include at least one of the following: a distortion operation performed on a static object; a distortion operation performed on a dynamic object; a distortion operation performed on a static object in a distant view; a distortion operation performed on a dynamic object in a distant view; a distortion operation performed on a static object in a close view; and a distortion operation performed on a dynamic object in a close view.
  • objects with the same attributes in the first real frame may be selected for distortion operation, for example, distortion operation may be performed on static objects, or distortion operation may be performed on dynamic objects.
  • in each round of warping, different objects in the image can be warped, and the predicted frame image is obtained based on the results of the multiple rounds of warping.
  • in each warping operation, only one type of object is processed and the remaining pixels are discarded, and the results of the multiple warping rounds are synthesized into a color image with blank areas. Warping operations based on different types of objects can improve the accuracy of the final color image, thereby further improving the image quality.
  • the above-mentioned full-screen grid layer can be used to cover the entire screen.
  • Each grid can be a square or a rectangle.
  • the density of the grids in the full-screen grid layer can be expressed by resolution. The more and denser the grids in the full-screen grid layer, the higher the resolution, and vice versa.
  • Each grid includes a number of pixels. The higher the resolution of the grid, the fewer pixels each grid includes.
  • the pixels included in each grid can be expressed as m×m, where m is an integer greater than 1. For example, the pixels included in each grid can be 6×6 or 8×8. That is, each grid includes 6 rows × 6 columns of pixels, or 8 rows × 8 columns of pixels.
  • multiple resolutions can be set for the full-screen grid layer, and the resolution of the full-screen grid layer generated in each round of distortion processing can be the same or different. This can be determined according to the object to be distorted in each round of distortion processing. For example, for distant images or static objects, a lower resolution of the full-screen grid layer can be set. For close-up images or dynamic objects, a higher resolution of the full-screen grid layer can be set. Since dynamic objects are in motion, close-up images are closer to the camera and are more affected by the prediction error of the motion vector. Therefore, a higher resolution is required for grid distortion to obtain a more accurate predicted image.
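  • An illustrative helper for this resolution choice follows; the cell sizes 16 and 8 (pixels per grid) follow the examples given later in this text, and the function itself is an assumption, not the patent's API.

```python
def grid_cell_size(is_static, is_distant):
    # distant static objects tolerate a coarser grid; near or dynamic
    # objects need a finer one for accurate prediction
    if is_static and is_distant:
        return 16   # first, lower resolution: 16x16 pixels per grid
    return 8        # second, higher resolution: 8x8 pixels per grid
```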
  • a close-up image refers to an object in the scene that is close to the camera
  • a distant image refers to an object in the scene that is far from the camera.
  • a camera refers to a virtual camera in the rendered scene, through which the three-dimensional scene can be observed from a certain angle and a certain direction to obtain a display image of the three-dimensional scene. By changing the angle and direction of the virtual camera, different display images of the three-dimensional scene can be seen.
  • the scene depth map of the first real frame can be used to determine whether the object of the distortion operation is a near-view image or a far-view image.
  • the scene depth map is used to indicate the depth information of the pixels in the first real frame. For example, if the depth of most pixels of the object of the distortion operation is greater than a preset threshold, the object is determined to be a far-view image. If the depth of most pixels of the object of the distortion operation is less than the preset threshold, the object is determined to be a near-view image.
  • it can be determined whether the object of the distortion operation is a dynamic object or a static object based on the full-screen semantic image of the first real frame.
  • the full-screen semantic image may also be referred to as a mask image, which refers to an image used to mark different types of areas of the first real frame, and can be obtained by masking the first real frame.
  • based on the full-screen semantic image, the type of any pixel in the first real frame can be determined. For example, in the full-screen semantic image, the pixel value of the area where a static object is located can be marked as 0, the pixel value of the area where an ordinary dynamic object is located can be marked as 1, and the pixel value of a dynamic object with skeletal animation can be marked as 2.
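  • A hedged sketch of this per-pixel classification from the semantic (mask) image and the scene depth map; the marker values 0/1/2 come from this text, while the threshold value is an assumed placeholder.

```python
STATIC, ORDINARY_DYNAMIC, SKELETAL_DYNAMIC = 0, 1, 2
DEPTH_THRESHOLD_D = 100.0   # preset depth threshold D, value assumed

def classify_pixel(semantic, depth, x, y):
    """semantic and depth are 2D arrays indexed [y][x]."""
    kind = semantic[y][x]        # 0 static, 1 ordinary dynamic, 2 skeletal
    is_static = kind == STATIC
    is_distant = depth[y][x] > DEPTH_THRESHOLD_D   # distant vs. near view
    return is_static, is_distant
```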
  • the full-screen mesh layer corresponding to multiple rounds of warping operations may use a fixed resolution or a variable resolution. For example, a higher-resolution mesh is used in multiple rounds of warping operations. Or in order to save computing power and power consumption, the resolution to be used may be determined based on the different objects of each round of warping operations. For example, if the object of a round of warping is a distant view and a static object, a lower resolution may be used, and a higher resolution may be used in other cases.
  • when the object of the i-th round of processing in the at least one round of distortion processing is a static object in the distance, the resolution of the full-screen grid layer used in the i-th round of processing is set to the first resolution, where i is an integer greater than or equal to 1; when the object of the i-th round of processing is a static object or a dynamic object in the foreground, the resolution of the full-screen grid layer used in the i-th round of processing is set to the second resolution, and the first resolution is lower than the second resolution.
  • a fixed pixel filling algorithm may be used, or a target pixel filling algorithm may be selected from a plurality of candidate pixel filling algorithms according to a threshold range to which the camera angle belongs, to fill the blank area. Since different pixel filling algorithms are suitable for different scenes, the pixel filling algorithm for filling the blank area may be flexibly selected according to the threshold of the camera angle, so as to improve the prediction accuracy and image quality of the displayed image of the prediction frame.
  • a first candidate pixel filling algorithm and a second candidate pixel filling algorithm may be set, which are respectively applicable to situations where the camera angle is large and the camera angle is small.
  • the first pixel filling algorithm is selected as the target pixel filling algorithm, and the first pixel filling algorithm implements pixel filling by sampling the pixels of static objects around the pixels to be filled;
  • the second pixel filling algorithm is selected as the target pixel filling algorithm, and the second pixel filling algorithm implements pixel filling by sampling the pixels of static objects and dynamic objects around the pixels to be filled.
  • the specific value of the first angle threshold can be determined according to practice, and will not be repeated here.
  • for the specific implementation of the pixel filling algorithm, please refer to the relevant description of FIG. 6 and FIG. 7 below.
  • each round of distortion processing in the at least one round of distortion processing targets different types of objects, and the number of distortion processing rounds can be determined according to the type of the object to be distorted.
  • FIG. 4 is a schematic diagram of a distortion process for a color image of a first real frame according to an embodiment of the present application.
  • the at least one round of distortion process includes two rounds of distortion process.
  • the first round of distortion process is a distortion process for static objects
  • the second round of distortion process is a distortion process for dynamic objects.
  • the embodiment of the present application does not limit the order of the objects to be distorted. It is also possible to first perform the distortion process for dynamic objects and then perform the distortion process for static objects.
  • the resolutions of the first full-screen grid layer and the second full-screen grid layer corresponding to the two rounds of distortion processing may be the same or different.
  • the resolution of the first full-screen grid layer and the resolution of the second full-screen grid layer are both 8×8.
  • the resolution of the first full-screen grid layer is lower than that of the second full-screen grid layer; for example, the resolution of the first full-screen grid layer is 16×16, and the resolution of the second full-screen grid layer is 8×8.
  • the following steps may be included:
  • the mesh vertices in the first full-screen mesh layer are offset to obtain a distorted first full-screen mesh layer.
  • the blank image may be a newly created blank image.
  • it may be determined whether each pixel in the color image of the first real frame belongs to a pixel of a static object according to the full-screen semantic image of the first real frame. If so, the pixel is output to the blank image. If not, the pixel is abandoned. That is, only the pixels of the static object are output to the blank image, while the pixels of other types of objects are abandoned.
  • the specific process of obtaining the first color image includes: judging whether the first pixel belongs to a static object or a dynamic object according to the full-screen semantic image of the first real frame, the first pixel being any pixel in the color image of the first real frame; when the first pixel belongs to a static object, outputting the first pixel to a first position in the blank image to obtain the first color image, the first position being a corresponding position of the first pixel in the distorted first full-screen grid layer; when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
  • the mesh vertices in the second full-screen mesh are offset to obtain a distorted second full-screen mesh layer.
  • the motion vector information of the dynamic object may include motion vector information of a common dynamic object and motion vector information of a dynamic object with skeletal animation.
  • it may be determined whether each pixel in the color image of the first real frame belongs to a dynamic object according to the full-screen semantic image of the first real frame; if so, the pixel is output to the first color image, and if not, the pixel is abandoned. That is, only the pixels of the dynamic object are output to the first color image, and pixels of other types of objects are abandoned.
  • the specific process of obtaining the first target color image includes: judging whether the second pixel belongs to a static object or a dynamic object according to the full-screen semantic image of the first real frame, the second pixel being any pixel in the color image of the first real frame; when the second pixel belongs to a dynamic object, outputting the second pixel to the second position in the first color image to obtain the first target color image, the second position being the corresponding position of the second pixel in the distorted second full-screen grid layer; when the second pixel belongs to a static object, not outputting the second pixel to the first color image.
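  • A per-pixel stand-in for the two rounds of FIG. 4 follows (the patent warps a mesh rather than individual pixels): the first pass writes only static pixels into a blank image, the second writes only dynamic pixels on top. The names and the boolean `filled` mask are illustrative assumptions.

```python
import numpy as np

def two_round_warp(color, semantic, mv_static, mv_dynamic):
    h, w = color.shape[:2]
    target = np.zeros_like(color)             # blank image
    filled = np.zeros((h, w), dtype=bool)     # False = blank area
    for want_static, motion in ((True, mv_static), (False, mv_dynamic)):
        for y in range(h):
            for x in range(w):
                if (semantic[y, x] == 0) != want_static:
                    continue                  # pixel belongs to the other round
                dx, dy = motion[y, x]
                tx, ty = int(round(x + dx)), int(round(y + dy))
                if 0 <= tx < w and 0 <= ty < h:
                    target[ty, tx] = color[y, x]
                    filled[ty, tx] = True
    return target, filled                     # ~filled marks the blank area
```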
  • FIG. 5 is a schematic diagram of warping a color image of a first real frame according to another embodiment of the present application.
  • the at least one round of warping includes four rounds of warping.
  • the first round of warping is for a static object in a distant view
  • the second round of warping is for a static object in a near view.
  • the third round of warping is for a dynamic object in a distant view
  • the fourth round of warping is for a dynamic object in a near view.
  • the embodiment of the present application does not limit the order of the objects to be warped.
  • the order of the objects to be warped by the above-mentioned multiple rounds of warping can also be changed.
  • the resolutions of the first to fourth full-screen grid layers corresponding to the four rounds of distortion processing may be the same or different.
  • the resolution of the first full-screen grid layer may be lower than that of the remaining three full-screen grid layers.
  • for example, the resolution of the first full-screen grid layer is 16×16, and the resolutions of the second to fourth full-screen grid layers are 8×8.
  • the first round of distortion processing is for the static object in the distant view.
  • the first round of distortion processing may include the following steps:
  • the mesh vertices in the first full-screen mesh layer are offset to obtain a distorted first full-screen mesh layer.
  • it is determined whether each pixel in the color image of the first real frame belongs to a static object based on the full-screen semantic image of the first real frame. If so, it is further determined whether the pixel belongs to a distant image based on the scene depth map. If it belongs to a distant image, the pixel is output to the blank image. If it does not belong to a static object or does not belong to a distant image, the pixel is abandoned. That is, only the pixels of static objects in the distant view are output to the blank image, while the pixels of other types of objects are abandoned.
  • for example, based on the full-screen semantic image of the first real frame, it is determined whether the first pixel belongs to a static object or a dynamic object, the first pixel being any pixel in the color image of the first real frame; when the first pixel belongs to a static object, it is determined whether the depth of the first pixel is greater than a preset depth threshold D based on the scene depth map of the first real frame; when the depth of the first pixel is greater than the preset depth threshold D, the first pixel is output to a first position in the blank image to obtain a first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the depth of the first pixel is less than the preset depth threshold D, the first pixel is not output to the blank image; when the first pixel belongs to a dynamic object, the first pixel is not output to the blank image.
  • the depth of the first pixel is greater than a preset depth threshold D, it indicates that the first pixel belongs to a distant image; if it is less than the preset depth threshold D, it indicates that the first pixel belongs to a near image.
  • the specific value of the preset depth threshold D can be determined according to practice.
  • the second round of distortion processing is for the static objects in the near scene.
  • the second round of distortion processing may include the following steps:
  • the mesh vertices in the second full-screen mesh layer are offset to obtain a distorted second full-screen mesh layer.
  • it is determined whether each pixel in the color image of the first real frame belongs to a static object based on the full-screen semantic image of the first real frame. If so, it is further determined whether the pixel belongs to a close-up image based on the scene depth map. If it belongs to a close-up image, the pixel is output to the first color image. A pixel that does not belong to a static object or does not belong to a close-up image is abandoned. That is, only the pixels of static objects in the close-up are output to the first color image, while the pixels of other types of objects are abandoned.
  • the third round of distortion processing is for the dynamic objects in the distant view.
  • the third round of distortion processing may include the following steps:
  • the mesh vertices in the third full-screen mesh layer are offset to obtain a distorted third full-screen mesh layer.
  • it is determined whether each pixel in the color image of the first real frame belongs to a dynamic object according to the full-screen semantic image of the first real frame. If so, it is further determined whether the pixel belongs to a distant view image according to the scene depth map. If so, the pixel is output to the second color image. If not, the pixel is abandoned. That is, only pixels of dynamic objects in the distant view are output to the second color image, while pixels of other types of objects are abandoned.
  • the fourth round of distortion processing is for dynamic objects in the near scene.
  • the fourth round of distortion processing may include the following steps:
  • the mesh vertices in the fourth full-screen mesh layer are offset to obtain a distorted fourth full-screen mesh layer.
  • it is determined whether each pixel in the color image of the first real frame belongs to a dynamic object according to the full-screen semantic image of the first real frame. If so, it is further determined whether the pixel belongs to a near-field image according to the scene depth map. If so, the pixel is output to the third color image. If not, the pixel is abandoned. That is, only pixels of dynamic objects in the near field are output to the third color image, while pixels of other types of objects are abandoned.
  • the pixel filling algorithm is introduced below in conjunction with FIG. 6 and FIG. 7. It should be noted that the principle of the pixel filling algorithm is as follows: taking the current pixel to be filled in the first target color image of the prediction frame as the starting point P_0, the color values of the pixels around the starting point P_0 are sampled multiple times according to a random sampling offset value offset, and the multiple sampling results are averaged to obtain the color value of the filled pixel.
  • in the first pixel filling algorithm, the blank area can be filled by sampling the pixels of static objects, and the pixels of dynamic objects are abandoned.
  • as the camera angle increases, the degree of change of the static objects also increases, and sampling only the pixels of static objects may fail to obtain valid pixels. Therefore, in the second pixel filling algorithm, the pixels of static objects and dynamic objects can be sampled at the same time to fill the blank area.
  • FIG. 6 is a schematic flow chart of a first pixel filling algorithm according to an embodiment of the present application.
  • the first pixel filling algorithm in FIG. 6 is applicable to the case where the camera angle is small.
  • the first pixel filling algorithm includes the following contents.
  • the first pixel is any pixel in a blank area in the first target color image.
  • the random sampling offset sequence is used to provide a random sampling offset value offset so as to provide a sampling offset step and a sampling direction for each pixel sampling.
  • the sampling offset value can be expressed as (Δx, Δy).
  • the sampling number N is used to specify the number of times the first pixel is sampled from the surrounding pixels, and N is an integer greater than 1.
  • i is an integer with 1 ≤ i ≤ N.
  • the first target color image and the full-screen semantic image are sampled at the sampling coordinates P_i. If the sampled full-screen semantic image indicates a static pixel, the color value of the sampled pixel of the first target color image is accumulated into color. If it indicates a dynamic pixel, no operation is performed.
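  • A minimal Python sketch of the first pixel filling algorithm described above, assuming a sample count N, a sampling radius, and a `filled` mask used to skip blank (unfilled) samples; all three are illustrative choices, not values given by the patent.

```python
import numpy as np

def fill_pixel_static_only(image, semantic, filled, x0, y0,
                           n=16, radius=8, seed=0):
    h, w = image.shape[:2]
    rng = np.random.default_rng(seed)
    color = np.zeros(3, dtype=np.float64)
    count = 0
    for _ in range(n):
        dx, dy = rng.integers(-radius, radius + 1, size=2)  # random offset
        x, y = x0 + int(dx), y0 + int(dy)                   # P_i = P_0 + offset
        # accumulate only valid static pixels; dynamic pixels are skipped
        if 0 <= x < w and 0 <= y < h and filled[y, x] and semantic[y, x] == 0:
            color += image[y, x]
            count += 1
    return color / count if count else color   # average of the samples
```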
  • FIG. 6 is an exemplary illustration of the first pixel filling algorithm.
  • the steps in FIG. 6 may be appropriately modified, added or deleted, and the resulting solution still falls within the protection scope of the embodiments of the present application.
  • FIG. 7 is a flow chart of a second pixel filling algorithm according to an embodiment of the present application.
  • the second pixel filling algorithm in FIG. 7 is suitable for the case where the camera angle is large.
  • the second pixel filling algorithm includes the following contents.
  • the first pixel is any pixel in a blank area in the first target color image.
  • the random sampling offset sequence is used to provide a random sampling offset value offset, so as to provide a sampling offset step and a sampling direction for each pixel sampling.
  • the sampling offset value can be expressed as (Δx, Δy).
  • the sampling number N is used to specify the number of times the first pixel is sampled from the surrounding pixels, and N is an integer greater than 1.
  • i is an integer with 1 ≤ i ≤ N.
  • the first target color image and the full-screen semantic image are sampled at the sampling coordinates P_i. Unlike the first pixel filling algorithm, the color value of the sampled pixel of the first target color image is accumulated into color whether the sampled full-screen semantic image indicates a static pixel or a dynamic pixel.
  • FIG. 7 is an exemplary illustration of the second pixel filling algorithm.
  • the steps in FIG. 7 may be appropriately modified, added or deleted, and the resulting solution still falls within the protection scope of the embodiments of the present application.
  • FIG. 8 is a flowchart of a rendering process according to an embodiment of the present application.
  • FIG. 8 takes rendering of a display interface in a game application as an example for illustration. As shown in FIG. 8, the method includes the following contents.
  • the game application sends a rendering instruction stream to the rendering module, where the rendering instruction stream is used to instruct the rendering of video frames in the game.
  • the rendering module can obtain user interface (UI) instructions during the process of processing the rendering instruction stream.
  • the rendering module can identify instructions for drawing the game UI based on the game rendering instruction characteristics.
  • UI instructions include but are not limited to: resource binding instructions, state setting instructions and drawing instructions, etc. These instructions are saved in the instruction stream buffer for subsequent use.
  • the rendering module calls the GPU to draw the real frame according to the rendering instruction stream.
  • each rendering instruction can be distinguished as static or dynamic according to the game rendering instruction characteristics, material information, and instruction calling rules. If it is the drawing of a static object, the stencil (template) value of the rendering instruction is set to 0. If it is the drawing of an ordinary dynamic object, the stencil value of the rendering instruction is set to 1. If it is the drawing of a dynamic object with skeletal animation, the three-dimensional space coordinate information of the model in two consecutive frames is saved in a buffer for subsequent motion vector calculation, and the stencil value of the rendering instruction is set to 2.
  • a full-screen semantic map can be generated, in which the pixel value covered by static objects is 0, the pixel value covered by ordinary dynamic objects is 1, and the pixel value of dynamic objects with skeletal animation is 2.
  • if the camera angle between the two real frames is less than the second preset angle, strategy 1 may be executed. If the camera angle is greater than or equal to the third preset angle, strategy 2 may be executed. If the camera angle is between the second preset angle and the third preset angle, strategy 3 may be executed.
  • the third preset angle is greater than the second preset angle, and the specific values of the second preset angle and the third preset angle can be determined according to practice and are not limited here.
  • the second preset angle can be 2° and the third preset angle can be 6°.
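  • A sketch of this three-way routing by camera angle, using the example thresholds above (2 and 6 degrees); the exact boundary conditions at the thresholds are assumptions.

```python
SECOND_PRESET_ANGLE = 2.0   # degrees
THIRD_PRESET_ANGLE = 6.0    # degrees

def choose_strategy(camera_angle_deg):
    if camera_angle_deg < SECOND_PRESET_ANGLE:
        return 1    # full prediction-frame generation (S805 to S808)
    if camera_angle_deg >= THIRD_PRESET_ANGLE:
        return 2    # image multiplexing or mixing (S809 and S810)
    return 3        # intermediate case (S811)
```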
  • Strategy 1 includes steps S805 to S808:
  • the embodiment of the present application does not limit the specific method of obtaining the motion vector map.
  • different calculation methods can be used to obtain the motion vectors of dynamic objects with skeletal animation, the motion vectors of ordinary dynamic objects, and the motion vectors of static objects respectively, and the above three types of motion vectors can be merged together to obtain a motion vector map.
  • the accuracy of the motion vector of each type of object can be improved, thereby improving the accuracy of the merged motion vector map.
  • the motion vector can be calculated using the scene depth map and the change matrix of the virtual camera.
  • the normalized device coordinates (NDC) can be constructed by sampling the depth value at the corresponding position of the scene depth map together with the image coordinates of the sampling position. After obtaining the NDC coordinates, the image coordinates of the world space coordinates of the current pixel on the predicted frame image are calculated through the camera space matrix and projection matrix between the two real frames, thereby obtaining the pixel-by-pixel motion vector data.
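  • A sketch of this reprojection for static objects, assuming combined 4×4 view-projection matrices for the two real frames; the names view_proj_1 / view_proj_2 and the NDC conventions are assumptions.

```python
import numpy as np

def static_motion_vector(depth_ndc, view_proj_1, view_proj_2, x, y, w, h):
    # NDC coordinates from the pixel position and the sampled depth value
    ndc = np.array([2.0 * x / w - 1.0, 2.0 * y / h - 1.0, depth_ndc, 1.0])
    world = np.linalg.inv(view_proj_1) @ ndc   # unproject with frame 1
    world = world / world[3]                   # homogeneous divide
    clip = view_proj_2 @ world                 # reproject with frame 2
    ndc2 = clip / clip[3]                      # perspective divide
    x2 = (ndc2[0] + 1.0) * 0.5 * w             # back to image coordinates
    y2 = (ndc2[1] + 1.0) * 0.5 * h
    return x2 - x, y2 - y                      # per-pixel motion vector
```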
  • for dynamic objects with skeletal animation, motion vectors can be calculated based on their models.
  • each model rendering instruction with skeletal animation can be marked from the rendering instruction stream of the real frame.
  • each model with skeletal animation has a corresponding transform feedback buffer. The rendering algorithm of the skeletal animation model of the real frame is modified to add the function of saving the NDC coordinates of each model vertex after coordinate transformation into the corresponding transform feedback buffer, so as to obtain the image space coordinates of each vertex of the skeletal animation model in the two real frames.
  • for ordinary dynamic objects, the motion vector can be estimated by sampling the color value at the corresponding position of the color image of the first real frame and the color value distribution of a certain area around it, and searching for the image data closest to that distribution on the color image of the second real frame through an iterative optimization matching algorithm.
  • the image space coordinates of each vertex in the two real frames are calculated respectively, the motion vector of each vertex is obtained by taking the difference, and the result is then rasterized and interpolated into pixel data and rendered onto the motion vector image.
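  • A sketch of the per-vertex difference for a skeletal-animation model, assuming the post-transform NDC x,y of every vertex in the two real frames were saved (e.g. via transform feedback) as (V, 2) arrays; the array and function names are illustrative.

```python
import numpy as np

def skeletal_vertex_motion(ndc_prev, ndc_curr, w, h):
    scale = np.array([w, h], dtype=np.float32)
    px_prev = (ndc_prev + 1.0) * 0.5 * scale   # NDC -> image coordinates
    px_curr = (ndc_curr + 1.0) * 0.5 * scale
    return px_curr - px_prev                   # (V, 2) vectors, to be rasterized
```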
  • the motion vector is calculated using the color image, the semantic image and the depth image. For each pixel on the motion vector image to be calculated, the data at the corresponding position on the semantic image is sampled to obtain the semantic information.
  • UI rendering includes: using the display image of the predicted frame as a rendering target, executing the UI rendering instruction stream stored in S801 on the rendering target, drawing UI information onto the display image of the predicted frame, and forming a complete video frame that can be used for display.
  • Strategy 2 includes steps S809 and S810:
  • Image multiplexing refers to outputting any one of the two real frames as the color image of the prediction frame.
  • Image mixing refers to mixing the images of the two real frames in a semi-transparent manner and outputting the mixed image as the color image of the prediction frame.
  • strategy 2 is applicable to situations where the camera angle between the two real frames is large. If the camera angle is greater than or equal to the third preset angle, it means that the current game screen is shaking significantly, so too much of the information required for the prediction frame calculation is missing and the result generated by the algorithm would be inaccurate. Therefore, image multiplexing or image mixing can be used to obtain the prediction frame.
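  • A sketch of strategy 2 under stated assumptions: image multiplexing reuses one real frame directly, and image mixing blends the two real frames semi-transparently; alpha = 0.5 is an assumed mixing weight.

```python
import numpy as np

def strategy2(frame_a, frame_b, mix=False, alpha=0.5):
    if not mix:
        return frame_b.copy()                        # image multiplexing
    blended = (alpha * frame_a.astype(np.float32)    # image mixing
               + (1.0 - alpha) * frame_b.astype(np.float32))
    return blended.astype(frame_a.dtype)
```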
  • UI rendering includes: using the display image of the predicted frame as a rendering target, executing the UI rendering instruction stream stored in S801 on the rendering target, drawing UI information onto the display image of the predicted frame, and forming a complete video frame that can be used for display.
  • Strategy 3 includes step S811:
  • the apparatus 900 for image processing may include a processor 110, an external memory interface 120, an internal memory 121, a sensor module 180, and a display screen 194.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the apparatus 900 for image processing.
  • the apparatus 900 for image processing may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a graphics processing unit (GPU), an image signal processor (ISP), a video codec, etc. Different processing units may be independent devices or integrated into one or more processors.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is a schematic illustration and does not constitute a structural limitation on the apparatus 900 for image processing.
  • the apparatus 900 for image processing may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.
  • the device 900 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, display videos, and receive sliding operations, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the device 900 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the apparatus 900 for image processing.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos are stored in the external memory card.
  • the internal memory 121 can be used to store computer executable program codes, and the executable program codes include instructions.
  • the internal memory 121 can include a program storage area and a data storage area.
  • the program storage area can store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the data storage area can store data created during the use of the device 900 for image processing (such as audio data, a phone book, etc.), etc.
  • the sensor module 180 may include a pressure sensor, a touch sensor, and the like.
  • Fig. 10 is a schematic diagram of a structure of an apparatus 1000 for image processing provided in an embodiment of the present application, wherein the apparatus 1000 may include an electronic device, or may also include a chip or chip system in an electronic device.
  • the apparatus 1000 includes a functional unit for executing the method for image processing in Figs. 3 to 8.
  • the apparatus 1000 includes a distortion unit 1010 and a filling unit 1020.
  • the distortion unit 1010 is used to perform at least one round of distortion processing on the color image of the first real frame according to the motion vector information between the first real frame and the second real frame to obtain the first target color image of the predicted frame, and the first target color image of the predicted frame includes a blank area caused by at least one round of distortion processing.
  • Each round of distortion processing in the at least one round of distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer includes a plurality of grids, and each grid includes a plurality of pixels; according to the motion vector information, at least part of the grids in the full-screen grid layer are distorted; according to the distorted full-screen grid layer and the color image of the first real frame, the color image corresponding to each round of distortion processing is output.
  • the filling unit 1020 is used to fill the blank areas in the first target color image of the predicted frame to obtain the second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
  • Fig. 11 is a schematic diagram of the hardware structure of another apparatus 1100 for image processing provided in an embodiment of the present application.
  • the apparatus 1100 includes a processor 1101, a communication line 1104 and at least one communication interface (communication interface 1103 is taken as an example in Fig. 11).
  • Processor 1101 may include a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a GPU, or one or more integrated circuits for controlling the execution of the program of the present application.
  • The communication line 1104 may include circuitry to transmit information between the above components.
  • the communication interface 1103 uses any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, wireless local area networks (WLAN), etc.
  • the apparatus for image processing may further include a memory 1102 .
  • the memory 1102 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may be independent and connected to the processor via a communication line 1104. The memory may also be integrated with the processor.
  • the memory 1102 is used to store computer-executable instructions for executing the solution of the present application, and the execution is controlled by the processor 1101.
  • the processor 1101 is used to execute the computer-executable instructions stored in the memory 1102, thereby implementing the method for image processing provided in the embodiments of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application code, and the embodiments of the present application do not specifically limit this.
  • the processor 1101 may include one or more CPUs, and may also include one or more GPUs.
  • An embodiment of the present application provides an electronic device, which includes: a processor and a memory; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory, so that the electronic device performs the above-mentioned method for image processing.
  • An embodiment of the present application provides a chip.
  • the chip includes a processor, and the processor is used to call a computer program in a memory to execute the method for image processing in the above embodiments. Its implementation principle and technical effects are similar to those of the above related embodiments and will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the above method is implemented when the computer program is executed by the processor.
  • the methods described in the above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code.
  • Computer-readable media can include computer storage media and communication media, and can also include any medium that can transfer a computer program from one place to another.
  • the storage medium can be any target medium that can be accessed by a computer.
  • a computer-readable medium may include RAM, ROM, compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • moreover, any connection is properly termed a computer-readable medium.
  • if the software is transmitted from a website, server or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL or the wireless technologies such as infrared, radio and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact discs, laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • An embodiment of the present application provides a computer program product, which includes a computer program.
  • when the computer program is executed, the computer performs the above method.
  • each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to the processing unit of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable device to produce a machine, so that the instructions executed by the processing unit of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

A method and apparatus for image processing, and a storage medium, capable of improving the picture quality of predicted frames. The method includes: performing at least one round of distortion processing on the color image of a first real frame according to motion vector information between the first real frame and a second real frame, to obtain a first target color image of a predicted frame; where each round of distortion processing includes: generating a full-screen grid layer covering the first real frame; performing distortion processing on at least some of the grids in the full-screen grid layer according to the motion vector information; and outputting the color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame; and filling the blank areas in the first target color image of the predicted frame to obtain a second target color image of the predicted frame, the second target color image being the display image of the predicted frame. Obtaining the color image of the predicted frame by distorting a full-screen grid layer improves the accuracy of the predicted frame.

Description

Method and apparatus for image processing, and storage medium
This application claims priority to Chinese patent application No. 202211319597.7, entitled "Method and apparatus for image processing, and storage medium", filed with the China National Intellectual Property Administration on October 26, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information technology, and in particular to a method, apparatus and storage medium for image processing.
Background
With the rapid development of image processing technology, in scenarios with high requirements on picture quality, such as game interfaces, virtual reality (VR) interfaces and augmented reality (AR) interfaces, the screen refresh rate and rendering resolution of video playback are gradually increasing. For example, the screen refresh rate has risen from 60 Hz to 120 Hz or even higher, and the rendering resolution from 720p to 2K or even higher. This confronts electronic devices with greater challenges in performance and power consumption; to reduce power consumption, frame prediction technology has been applied to the processing of video data.
Frame prediction refers to predicting and inserting predicted frames from data such as the motion vectors of real frames. Inserting predicted frames can raise the frame rate of the displayed picture, reduce the sense of stutter and lower video-rendering power consumption. When a predicted frame is generated, since two successive real frames share many similarities, the motion vector of a pixel between the two real frames can be obtained by computing the similarity between pixels; the position of the pixel in the predicted frame is then predicted from the motion vector, and the pixel is placed at the corresponding position. For pixel regions for which no matching relationship can be determined between the two real frames, the image data filling those regions must be predicted by a pixel filling algorithm. Video frames differ in scene, environment and picture style, so the distribution characteristics of their image information also differ; a fixed predicted-frame generation scheme may therefore produce predicted frames of poor image quality.
To solve the above problems, the industry has been working on a method that improves the picture quality and accuracy of predicted frames while saving power consumption and computing power.
Summary
Embodiments of this application provide a method, apparatus and storage medium for image processing, which can improve the image quality of the generated predicted frames.
In a first aspect, an embodiment of this application provides a method for image processing, the method including: performing at least one round of distortion processing on the color image of a first real frame according to motion vector information between the first real frame and a second real frame, to obtain a first target color image of a predicted frame, the first target color image of the predicted frame including blank areas caused by the at least one round of distortion processing; where each round of the at least one round of distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer including a plurality of grids, each grid including a plurality of pixels; performing distortion processing on at least some of the grids in the full-screen grid layer according to the motion vector information; and outputting the color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame; and filling the blank areas in the first target color image of the predicted frame to obtain a second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
In this way, during generation of the predicted frame, a full-screen grid layer covering the first real frame is generated, and one or more rounds of distortion processing of the full-screen grid layer, driven by the motion vectors between the two real frames, produce a color image that includes blank areas. Obtaining the color image of the predicted frame by distorting a full-screen grid layer, rather than displacing individual pixels, can streamline the computing resources and improve the accuracy of the predicted frame.
With reference to the first aspect, in a possible implementation, the at least one round of distortion processing includes multiple rounds of distortion processing corresponding respectively to multiple distortion types, the multiple distortion types including at least one of: a distortion operation performed on static objects; on dynamic objects; on static objects in the distant view; on dynamic objects in the near view; on dynamic objects in the distant view; on static objects in the near view.
In this way, by distinguishing in each round whether the objects being processed are static objects, dynamic objects, distant-view images or near-view images, and distorting the different types of objects in the image separately, the accuracy of the finally obtained color image can be improved, further improving image quality.
With reference to the first aspect, in a possible implementation, the at least one round of distortion processing includes multiple rounds of distortion processing, and the full-screen grid layers of at least two of those rounds have different resolutions.
In this way, the resolution of the full-screen grid layer can be chosen flexibly according to the objects of each round of distortion processing, saving computing power and energy and improving the efficiency of the predicted-frame computation.
With reference to the first aspect, in a possible implementation, the method further includes: when the objects of the i-th round of the at least one round of distortion processing are static objects in the distant view, setting the resolution of the full-screen grid layer used in the i-th round to a first resolution, i being an integer greater than or equal to 1; when the objects of the i-th round are static objects in the near view or dynamic objects, setting the resolution of the full-screen grid layer used in the i-th round to a second resolution, the first resolution being lower than the second resolution.
In this way, static objects in the distant view change little between two video frames, so a lower-resolution full-screen grid layer can be used for their distortion operation. Dynamic objects are in motion, and near-view images are closer to the camera and more strongly affected by the prediction error of the motion vectors, so a higher resolution must be used for the grid distortion to obtain a more accurate predicted image. By refining the objects of each round and flexibly choosing the resolution of the full-screen grid layer accordingly, computing power and energy are saved and the efficiency of the predicted-frame computation improved.
With reference to the first aspect, in a possible implementation, the at least one round of distortion processing includes multiple rounds of distortion processing, the first of which includes: generating a first full-screen grid layer covering the first real frame; offsetting the grid vertices in the first full-screen grid layer according to the motion vector information of static objects to obtain a distorted first full-screen grid layer; and outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the color image of the first real frame into a blank image to obtain a first color image.
With reference to the first aspect, in a possible implementation, outputting the pixels of static objects in the color image of the first real frame into a blank image to obtain a first color image includes: judging, according to the full-screen semantic image of the first real frame, whether a first pixel belongs to a static or a dynamic object, the first pixel being any pixel of the color image of the first real frame; when the first pixel belongs to a static object, outputting the first pixel to a first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
With reference to the first aspect, in a possible implementation, the second round of the multiple rounds of distortion processing includes: generating a second full-screen grid layer covering the first real frame; offsetting the grid vertices in the second full-screen grid layer according to the motion vector information of dynamic objects to obtain a distorted second full-screen grid layer; and outputting, according to the distorted second full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the first color image to obtain the first target color image.
With reference to the first aspect, in a possible implementation, outputting, according to the distorted second full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the first color image to obtain the first target color image includes: judging, according to the full-screen semantic image of the first real frame, whether a second pixel belongs to a static or a dynamic object, the second pixel being any pixel of the color image of the first real frame; when the second pixel belongs to a dynamic object, outputting the second pixel to a second position in the first color image to obtain the first target color image, the second position being the corresponding position of the second pixel in the distorted second full-screen grid layer; when the second pixel belongs to a static object, not outputting the second pixel to the first color image.
With reference to the first aspect, in a possible implementation, the at least one round of distortion processing includes multiple rounds of distortion processing, the first of which includes: generating a first full-screen grid layer covering the first real frame; offsetting the grid vertices in the first full-screen grid layer according to the motion vector information of static objects in the distant view to obtain a distorted first full-screen grid layer; and outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the distant view in the color image of the first real frame into a blank image to obtain a first color image.
With reference to the first aspect, in a possible implementation, outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the distant view in the color image of the first real frame into a blank image to obtain a first color image includes: judging, according to the full-screen semantic image of the first real frame, whether a first pixel belongs to a static or a dynamic object, the first pixel being any pixel of the color image of the first real frame; when the first pixel belongs to a static object, determining, according to the scene depth map of the first real frame, whether the depth of the first pixel is greater than a preset depth threshold; when the depth of the first pixel is greater than the preset depth threshold D, outputting the first pixel to a first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the depth of the first pixel is less than the preset depth threshold D, not outputting the first pixel to the blank image; when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
With reference to the first aspect, in a possible implementation, the second round of the multiple rounds of distortion processing includes: generating a second full-screen grid layer covering the first real frame; offsetting the grid vertices in the second full-screen grid layer according to the motion vector information of static objects in the near view to obtain a distorted second full-screen grid layer; and outputting, according to the distorted second full-screen grid layer, the pixels of static objects in the near view in the color image of the first real frame into the first color image to obtain a second color image.
With reference to the first aspect, in a possible implementation, the third round of the multiple rounds of distortion processing includes: generating a third full-screen grid layer covering the first real frame; offsetting the grid vertices in the third full-screen grid layer according to the motion vector information of dynamic objects in the distant view to obtain a distorted third full-screen grid layer; and outputting, according to the distorted third full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the second color image to obtain a third color image.
With reference to the first aspect, in a possible implementation, the fourth round of the multiple rounds of distortion processing includes: generating a fourth full-screen grid layer covering the first real frame; offsetting the grid vertices in the fourth full-screen grid layer according to the motion vector information of dynamic objects in the near view to obtain a distorted fourth full-screen grid layer; and outputting, according to the distorted fourth full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the third color image to obtain the first target color image.
With reference to the first aspect, in a possible implementation, filling the blank areas in the first target color image of the predicted frame to obtain the second target color image of the predicted frame includes: determining the threshold range in which the camera angle between the first real frame and the second real frame falls; selecting a target pixel filling algorithm from at least two candidate pixel filling algorithms according to that threshold range; and filling the blank areas in the first target color image according to the target pixel filling algorithm to obtain the second target color image.
In this way, when the blank areas in the first target color image are filled, the target pixel filling algorithm can be selected from several candidate pixel filling algorithms according to the threshold range in which the camera angle falls. Since different pixel filling algorithms suit different scenes, flexibly selecting the filling algorithm by the camera-angle threshold improves the prediction accuracy and image quality of the display image of the predicted frame.
With reference to the first aspect, in a possible implementation, selecting the target pixel filling algorithm from at least two candidate pixel filling algorithms according to the threshold range in which the camera angle falls includes: when the camera angle is smaller than a first angle threshold, selecting the first pixel filling algorithm as the target pixel filling algorithm, where the first pixel filling algorithm achieves pixel filling by sampling the pixels of static objects around the pixel to be filled; when the camera angle is larger than the first angle threshold, selecting the second pixel filling algorithm as the target pixel filling algorithm, where the second pixel filling algorithm achieves pixel filling by sampling the pixels of static objects and dynamic objects around the pixel to be filled.
In a second aspect, an embodiment of this application provides a method for image processing, including: obtaining motion vector information between a first real frame and a second real frame; and performing distortion processing on the color image of the first real frame according to the motion vector information to obtain a first target color image of a predicted frame, where the distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer including a plurality of grids, each grid including a plurality of pixels; performing distortion processing on the grids in the full-screen grid layer according to the motion vector information; and obtaining a distorted color image according to the distorted full-screen grid layer and the color image of the first real frame.
With reference to the second aspect, in a possible implementation, the distortion processing includes multiple rounds of distortion processing corresponding respectively to multiple distortion types, the multiple distortion types including at least one of: a distortion operation performed on static objects; on dynamic objects; on static objects in the distant view; on dynamic objects in the near view; on dynamic objects in the distant view; on static objects in the near view.
With reference to the second aspect, in a possible implementation, the distortion processing includes multiple rounds of distortion processing, and the full-screen grid layers of at least two of those rounds have different resolutions.
With reference to the second aspect, in a possible implementation, the method further includes: when the objects of the i-th round of the multiple rounds of distortion processing are static objects in the distant view, setting the resolution of the full-screen grid layer used in the i-th round to a first resolution, i being an integer greater than or equal to 1; when the objects of the i-th round are static objects in the near view or dynamic objects, setting the resolution of the full-screen grid layer used in the i-th round to a second resolution, the first resolution being lower than the second resolution.
With reference to the second aspect, in a possible implementation, the method further includes: filling the blank areas in the first target color image of the predicted frame to obtain a second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
In a third aspect, an embodiment of this application provides an apparatus for image processing, the apparatus including a processor and a memory; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory, causing the terminal device to perform the method of the first or second aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; the computer program, when executed by a processor, implements the method of the first or second aspect.
In a fifth aspect, an embodiment of this application provides a computer program product including a computer program; when the computer program is run, a computer is caused to perform the method of the first or second aspect.
In a sixth aspect, an embodiment of this application provides a chip including a processor, the processor being used to call a computer program in a memory to execute the method of the first or second aspect.
It should be understood that the second to sixth aspects of this application correspond to the technical solution of the first aspect of this application, and the beneficial effects obtained by each aspect and its corresponding feasible implementations are similar and are not repeated here.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the principle of generating a predicted frame according to an embodiment of this application;
Fig. 2 is a software architecture block diagram of an apparatus 100 for image processing according to an embodiment of this application;
Fig. 3 is a schematic flowchart of the method for image processing in an embodiment of this application;
Fig. 4 is a schematic diagram of distortion processing of the color image of the first real frame according to an embodiment of this application;
Fig. 5 is a schematic diagram of distortion processing of the color image of the first real frame according to another embodiment of this application;
Fig. 6 is a schematic flowchart of the first pixel filling algorithm according to an embodiment of this application;
Fig. 7 is a schematic flowchart of the second pixel filling algorithm according to an embodiment of this application;
Fig. 8 is a schematic flowchart of a rendering process according to an embodiment of this application;
Fig. 9 is a schematic diagram of the hardware structure of an apparatus 900 for image processing according to an embodiment of this application;
Fig. 10 is a schematic structural diagram of an apparatus 1000 for image processing according to an embodiment of this application;
Fig. 11 is a schematic diagram of the hardware structure of an apparatus 1100 for image processing according to another embodiment of this application.
Detailed Description
To describe the technical solutions of the embodiments of this application clearly, words such as "exemplary" or "for example" are used in the embodiments of this application to mean serving as an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as preferable or advantageous over other embodiments or designs; rather, the use of such words is intended to present the related concepts in a concrete manner.
In the embodiments of this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates three possible relationships; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b or c may denote: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
It should be noted that "when..." in the embodiments of this application may refer to the instant at which a situation occurs or to a period of time after it occurs, which is not specifically limited by the embodiments of this application. In addition, the display interfaces provided in the embodiments of this application are merely examples, and a display interface may include more or less content.
Fig. 1 is a schematic diagram of the principle of generating a predicted frame according to an embodiment of this application. As shown in Fig. 1, a predicted frame may refer to a predicted image generated from information such as the motion vectors of two real frames; the predicted frame may be inserted between, before or after the two real frames, which is not limited by the embodiments of this application. The two real frames may usually be two consecutive real frames. Fig. 1 takes a first real frame and a second real frame as the two real frames for illustration.
Before the predicted frame is generated, information about the two real frames must first be obtained, for example the color images of the real frames, the motion vector map between the two real frames, the scene depth information of the real frames, the camera angle between the two real frames, and so on. This information can be obtained during the rendering of the two real frames, for example through the rendering instruction stream.
According to the motion vector map between the two real frames, the positions in the predicted frame of the pixels of a real frame's color image can be predicted, and the pixels are then placed at the corresponding positions in the predicted frame to obtain the first target color image of the predicted frame. When pixels are moved according to the motion vector map, some blank areas may appear for which no matching relationship can be found between the two real frames; for these blank areas, the filling pixels can be computed with a pixel filling algorithm, yielding the finally displayed color image of the predicted frame. A pixel filling algorithm is a process that restores and reconstructs the pixels inside damaged regions of an image; by way of example and not limitation, pixel filling algorithms may include image inpainting algorithms and the like.
Fig. 2 is a software architecture block diagram of an apparatus 100 for image processing according to an embodiment of this application. The layered architecture divides the software system of the apparatus 100 into several layers, each with a clear role and division of labor; the layers communicate through software interfaces. In some embodiments, the software system may be divided into four layers: the application layer, the application framework layer, the system libraries and the hardware layer. It should be understood that Fig. 2 shows only the functional modules related to the method for image processing of the embodiments of this application; in practice, the apparatus 100 may include more or fewer functional modules, or some functional modules may be replaced by other functional modules.
The application layer may include a series of application packages. As shown in Fig. 2, the application layer includes game applications, AR applications, VR applications and the like.
The application framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer, and includes some predefined functions. As shown in Fig. 2, the application framework layer may include an application framework (FrameWork), which encapsulates numerous callable functional modules and API interfaces for the application layer.
As shown in Fig. 2, the system libraries may include a rendering module, which can be used for three-dimensional graphics drawing, image rendering, composition, layer processing and the like. By way of example and not limitation, the rendering module includes but is not limited to at least one of: the open graphics library (OpenGL), the open source computer vision library (OpenCV), and the open computing language library (OpenCL).
The hardware layer includes the memory and the graphics processing unit (GPU). The memory can be used to temporarily hold the operation data of the central processing unit (CPU) and the data exchanged with external storage such as hard disks.
The GPU is the processor that performs image- and graphics-related computation. In the embodiments of this application, the graphics processing performed by the apparatus 100 through the rendering module can be completed by the GPU.
When an application (for example a game application) needs to play video, it can send rendering instructions for the video frames to the application framework, and the application framework calls the rendering module for image rendering. During rendering, the rendering module can perform the rendering computation of the image through the GPU, producing rendered picture data that can be stored in the memory. When an image needs to be displayed, the application framework can read the picture data of that image from the memory and send it to the buffer area of the display screen, thereby displaying the image.
Optionally, in the embodiments of this application, the apparatus 100 may refer to an electronic device with an image processing function. By way of example, the apparatus 100 may include but is not limited to: mobile phones, tablets, palmtop computers, notebook computers, wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, etc., which is not limited by the embodiments of this application.
The technical solution of this application and how it solves the above technical problems are described in detail below with specific embodiments. The following specific embodiments may be implemented independently or combined with one another; identical or similar concepts or processes may not be repeated in some embodiments.
As stated above, to improve the quality of the displayed image of generated predicted frames, the embodiments of this application provide a method and apparatus for image processing in which, during generation of the predicted frame, a full-screen grid layer covering the real frame is generated and, driven by the motion vectors between the two real frames, one or more rounds of distortion processing of the full-screen grid layer produce a color image containing blank areas. Obtaining the color image of the predicted frame by distorting a full-screen grid layer improves the accuracy of the predicted frame. Further, in each round of distortion processing, different types of objects in the image can be distorted, and the image of the predicted frame obtained from the results of the multiple distortions, further improving the accuracy and image quality of the predicted frame.
Fig. 3 is a schematic flowchart of the method for image processing in an embodiment of this application. The method may be executed by the apparatus 100 in Fig. 2. As shown in Fig. 3, the method includes the following.
S301. Perform at least one round of distortion processing on the color image of the first real frame according to the motion vector information between the first real frame and the second real frame, to obtain the first target color image of the predicted frame, the first target color image of the predicted frame including blank areas caused by the at least one round of distortion processing.
Each round of the at least one round of distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer including a plurality of grids, each grid including a plurality of pixels; performing distortion processing on at least some of the grids in the full-screen grid layer according to the motion vector information; and outputting the color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame.
Optionally, the first real frame and the second real frame may be consecutive real frames or non-consecutive real frames. The predicted frame may be inserted between, before or after the first and the second real frame.
Optionally, the first real frame may be either the earlier or the later of the two real frames.
In some examples, the motion vector information between the first and the second real frame can be obtained during the rendering of the two frames. By way of example, the motion vector information may include different types; for example, it may include motion vector information of static objects and motion vector information of dynamic objects, and the motion vector information of dynamic objects may include that of ordinary dynamic objects and that of dynamic objects with skeletal animation.
Optionally, the multiple rounds of distortion processing may correspond to multiple distortion types, different distortion types performing the distortion operation on different objects of the first real frame. By way of example, the distortion types include at least one of: a distortion operation performed on static objects; on dynamic objects; on static objects in the distant view; on dynamic objects in the near view; on dynamic objects in the distant view; on static objects in the near view.
It should be understood that in each round of distortion processing, objects of the same attribute in the first real frame can be selected for the distortion operation, for example distorting static objects, or distorting dynamic objects, and so on.
In each round of distortion processing, different objects in the image can be distorted, and the image of the predicted frame obtained from the results of the multiple distortions; each distortion operation targets only one kind of object and discards the rest, and the results of the multiple distortions are composed into one color image with blank areas. Performing the distortion operation separately per kind of object improves the accuracy of the finally obtained color image and thus further improves image quality.
The full-screen grid layer can cover the whole screen picture. Each grid may be a square or a rectangle. The density of the grids in the full-screen grid layer can be expressed as a resolution: the more and denser the grids, the higher the resolution, and conversely the lower. Each grid includes a number of pixels; the higher the grid resolution, the fewer pixels each grid includes. The pixels of each grid can be expressed as m×m, m being an integer greater than 1; for example, each grid may include 6×6 or 8×8 pixels, that is, 6 rows × 6 columns or 8 rows × 8 columns of pixels.
Optionally, several resolutions can be set for the full-screen grid layer, and the resolutions of the grid layers generated in the individual rounds of distortion processing may be the same or different; this can be determined from the objects distorted in each round. For example, a lower grid-layer resolution can be set for distant-view images or static objects, and a higher one for near-view images or dynamic objects. Because dynamic objects are in motion, and near-view images are closer to the camera and more strongly affected by the prediction error of the motion vectors, a higher resolution must be used for the grid distortion to obtain a more accurate predicted image.
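By way of illustration, a minimal sketch of building a full-screen grid layer and offsetting its vertices is given below; the NumPy arrays and the CPU-side form are assumptions, since in practice this step would run as a GPU vertex operation over a per-pixel motion-vector map:

```python
import numpy as np

def build_grid_vertices(width, height, cell):
    """Vertex lattice of a full-screen grid layer whose grids are cell x cell pixels."""
    xs = np.arange(0.0, width + 1, cell)
    ys = np.arange(0.0, height + 1, cell)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)           # shape: (rows + 1, cols + 1, 2)

def distort_grid(vertices, motion_vectors):
    """Offset only the grid vertices (not every pixel) by the motion vector
    sampled at each vertex position; motion_vectors is an (H, W, 2) array."""
    h, w = motion_vectors.shape[:2]
    vx = np.clip(vertices[..., 0].astype(int), 0, w - 1)
    vy = np.clip(vertices[..., 1].astype(int), 0, h - 1)
    return vertices + motion_vectors[vy, vx]     # the distorted full-screen grid layer
```

Distorting the vertex lattice rather than individual pixels is what keeps the per-round cost proportional to the number of grids instead of the number of pixels.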
A near-view image means that the objects in the scene are close to the camera, and a distant-view image means that the objects in the scene are far from the camera. In the embodiments of this application, the camera refers to the virtual camera of the rendered scene, through which the three-dimensional scene can be observed from a certain angle and direction to obtain its display picture; changing the angle and orientation of the virtual camera shows different display pictures of the three-dimensional scene.
Optionally, whether the object of a distortion operation is a near-view or a distant-view image can be judged from the scene depth map of the first real frame, which indicates the depth information of the pixels in the first real frame. For example, if the depth of most of the object's pixels is greater than a preset threshold, the object is judged to be a distant-view image; if the depth of most of its pixels is smaller than the preset threshold, a near-view image.
Optionally, whether the object of a distortion operation is a dynamic or a static object can be judged from the full-screen semantic image of the first real frame. The full-screen semantic image, which may also be called a mask image, marks the different kinds of regions of the first real frame and can be obtained by masking the first real frame. By sampling the full-screen semantic image, the kind of any pixel in the first real frame can be determined. For example, in the full-screen semantic image, the pixel value of the regions covered by static objects may be marked 0, that of ordinary dynamic objects 1, and that of dynamic objects with skeletal animation 2.
In some examples, when the resolution of the full-screen grid layer is determined, the grid layers of the multiple rounds of distortion may use a fixed resolution or a variable one. For example, higher-resolution grids may be used in all rounds; or, to save computing power and energy, the resolution may be chosen from the objects of each round: if the objects of a round of distortion processing are static objects in the distant view, a lower resolution can be used, and a higher resolution in the other cases.
For example, when the objects of the i-th round of the at least one round of distortion processing are static objects in the distant view, the resolution of the full-screen grid layer used in the i-th round is set to a first resolution, i being an integer greater than or equal to 1; when the objects of the i-th round are static objects in the near view or dynamic objects, the resolution of the full-screen grid layer used in the i-th round is set to a second resolution, the first resolution being lower than the second resolution.
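A minimal sketch of this resolution policy follows; the cell sizes 16 and 8 are taken from the examples given elsewhere in this text and are otherwise assumptions:

```python
def grid_cell_size(is_static, is_distant, low_res_cell=16, high_res_cell=8):
    """Cell size of the full-screen grid layer for one distortion round:
    static objects in the distant view get the coarser first resolution,
    near-view static objects and all dynamic objects the finer second one."""
    return low_res_cell if (is_static and is_distant) else high_res_cell
```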
For the specific implementation of each round of distortion processing, refer to the related descriptions of Figs. 4 and 5 below.
S302. Fill the blank areas in the first target color image of the predicted frame to obtain the second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
Optionally, when the blank areas in the first target color image are filled, a fixed pixel filling algorithm may be used, or the target pixel filling algorithm may be selected from several candidate pixel filling algorithms according to the threshold range in which the camera angle falls. Since different pixel filling algorithms suit different scenes, flexibly selecting the filling algorithm by the camera-angle threshold improves the prediction accuracy and image quality of the display image of the predicted frame.
For example: determine the threshold range in which the camera angle between the first real frame and the second real frame falls; select the target pixel filling algorithm from at least two candidate pixel filling algorithms according to that threshold range; and fill the blank areas in the first target color image according to the target pixel filling algorithm to obtain the second target color image.
In some examples, a first and a second candidate pixel filling algorithm can be provided, suited respectively to a small and a large camera angle. For example, when the camera angle is smaller than the first angle threshold, the first pixel filling algorithm is selected as the target pixel filling algorithm, which achieves pixel filling by sampling the pixels of static objects around the pixel to be filled; when the camera angle is larger than the first angle threshold, the second pixel filling algorithm is selected, which achieves pixel filling by sampling the pixels of both static and dynamic objects around the pixel to be filled. The specific value of the first angle threshold can be determined in practice and is not detailed here. For the specific implementation of the pixel filling algorithms, refer to the related descriptions of Figs. 6 and 7 below.
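A minimal sketch of this selection; the threshold value is only an assumed example, since the text leaves it to be determined in practice:

```python
def choose_fill_algorithm(camera_angle_deg, first_angle_threshold_deg=2.0):
    """Pick the pixel filling algorithm from the camera angle between the two real frames."""
    if camera_angle_deg < first_angle_threshold_deg:
        return "first_algorithm"    # samples the pixels of static objects only
    return "second_algorithm"       # samples the pixels of static and dynamic objects
```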
As stated above, each round of the at least one round of distortion processing targets a different type of object, and the number of rounds of distortion processing can be determined from the types of objects to be processed. Figs. 4 and 5 are taken as examples below to describe the specific process of distorting the image in the embodiments of this application.
Fig. 4 is a schematic diagram of distortion processing of the color image of the first real frame according to an embodiment of this application. As shown in Fig. 4, the at least one round of distortion processing includes two rounds of distortion processing: the first round targets static objects and the second targets dynamic objects. Optionally, the embodiments of this application do not limit the order of the objects of distortion processing; the distortion processing for dynamic objects may also be performed first, followed by that for static objects.
Optionally, the resolutions of the first and second full-screen grid layers of the two rounds may be the same or different. For example, when the resolutions are the same, both the first and the second full-screen grid layer may have a resolution of 8×8; when they are different, the resolution of the first full-screen grid layer is lower than that of the second, for example 16×16 for the first full-screen grid layer and 8×8 for the second.
Specifically, the first round of distortion processing may include the following steps:
S401. Generate a first full-screen grid layer covering the first real frame.
S402. Offset the grid vertices in the first full-screen grid layer according to the motion vector information of static objects, obtaining a distorted first full-screen grid layer.
During the grid distortion, only the grid vertices are offset and stretched according to the motion vector information of static objects, rather than every pixel, which improves the computational efficiency of the distortion operation.
S403. According to the distorted first full-screen grid layer, output the pixels of static objects in the color image of the first real frame into a blank image to obtain a first color image.
The blank image may be a newly created blank image. While the first color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a static object; if it does, the pixel is output into the blank image, and if not, the pixel is discarded. That is, only the pixels of static objects are output into the blank image, and the pixels of the other types of objects are discarded.
For example, the specific process of obtaining the first color image includes: judging, from the full-screen semantic image of the first real frame, whether the first pixel belongs to a static or a dynamic object, the first pixel being any pixel of the color image of the first real frame; when the first pixel belongs to a static object, outputting it to the first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the first pixel belongs to a dynamic object, not outputting it to the blank image. A sketch of this per-pixel selection is given below.
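A minimal sketch of this per-pixel selection, assuming NumPy arrays and a per-pixel integer target-coordinate map that stands in for the GPU rasterization of the distorted grid:

```python
import numpy as np

STATIC = 0   # semantic label of static objects in the full-screen semantic image

def output_static_pixels(color, semantic, target_pos):
    """Copy only the static-object pixels of the first real frame into a blank
    image, at the positions given by the distorted first full-screen grid layer
    (flattened here into per-pixel target coordinates)."""
    h, w = semantic.shape
    blank = np.zeros_like(color)                         # the newly created blank image
    ys, xs = np.nonzero(semantic == STATIC)              # dynamic pixels are discarded
    tx = np.clip(target_pos[ys, xs, 0].astype(int), 0, w - 1)
    ty = np.clip(target_pos[ys, xs, 1].astype(int), 0, h - 1)
    blank[ty, tx] = color[ys, xs]
    return blank                                         # the first color image
```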
The second round of distortion processing may include the following steps:
S404. Generate a second full-screen grid layer covering the first real frame.
S405. Offset the grid vertices in the second full-screen grid layer according to the motion vector information of dynamic objects, obtaining a distorted second full-screen grid layer.
During the grid distortion, only the grid vertices are offset and stretched according to the motion vector information of dynamic objects, rather than every pixel, which improves the computational efficiency of the distortion operation.
Optionally, the motion vector information of dynamic objects may include the motion vector information of ordinary dynamic objects and that of dynamic objects with skeletal animation.
S406. According to the distorted second full-screen grid layer, output the pixels of dynamic objects in the color image of the first real frame into the first color image to obtain the first target color image.
While the first target color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a dynamic object; if it does, the pixel is output into the first color image, and if not, the pixel is discarded. That is, only the pixels of dynamic objects are output into the first color image, and the pixels of the other types of objects are discarded.
For example, the specific process of obtaining the first target color image includes: judging, from the full-screen semantic image of the first real frame, whether the second pixel belongs to a static or a dynamic object, the second pixel being any pixel of the color image of the first real frame; when the second pixel belongs to a dynamic object, outputting it to the second position in the first color image to obtain the first target color image, the second position being the corresponding position of the second pixel in the distorted second full-screen grid layer; when the second pixel belongs to a static object, not outputting it to the first color image.
Fig. 5 is a schematic diagram of distortion processing of the color image of the first real frame according to another embodiment of this application. As shown in Fig. 5, the at least one round of distortion processing includes four rounds of distortion processing: the first round targets static objects in the distant view, the second static objects in the near view, the third dynamic objects in the distant view, and the fourth dynamic objects in the near view. Optionally, the embodiments of this application do not limit the order of the objects of distortion processing; the order of the objects targeted by the multiple rounds may also be changed.
Optionally, the resolutions of the first to fourth full-screen grid layers of the four rounds may be the same or different. When they are different, the resolution of the first full-screen grid layer may be lower than that of the other three, for example 16×16 for the first full-screen grid layer and 8×8 for the second to fourth.
Specifically, the first round of distortion processing targets static objects in the distant view and may include the following steps:
S501. Generate a first full-screen grid layer covering the first real frame.
S502. Offset the grid vertices in the first full-screen grid layer according to the motion vector information of static objects in the distant view, obtaining a distorted first full-screen grid layer.
S503. According to the distorted first full-screen grid layer, output the pixels of static objects in the distant view of the color image of the first real frame into a blank image to obtain a first color image.
Optionally, while the first color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a static object; if it does, the scene depth map is further used to judge whether the pixel belongs to the distant view; if it belongs to the distant view, the pixel is output into the blank image. Pixels that do not belong to a static object or do not belong to the distant view are discarded. That is, only the pixels of static objects in the distant view are output into the blank image, and the pixels of the other types of objects are discarded.
For example: judge, from the full-screen semantic image of the first real frame, whether the first pixel belongs to a static or a dynamic object, the first pixel being any pixel of the color image of the first real frame; when the first pixel belongs to a static object, determine from the scene depth map of the first real frame whether the depth of the first pixel is greater than the preset depth threshold D; when the depth of the first pixel is greater than D, output it to the first position in the blank image to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer; when the depth of the first pixel is less than D, do not output it to the blank image; when the first pixel belongs to a dynamic object, do not output it to the blank image.
Optionally, a depth of the first pixel greater than the preset depth threshold D indicates that the first pixel belongs to the distant view; a depth smaller than D indicates the near view. The specific value of the preset depth threshold D can be determined in practice.
The second round of distortion processing targets static objects in the near view and may include the following steps:
S504. Generate a second full-screen grid layer covering the first real frame.
S505. Offset the grid vertices in the second full-screen grid layer according to the motion vector information of static objects in the near view, obtaining a distorted second full-screen grid layer.
S506. According to the distorted second full-screen grid layer, output the pixels of static objects in the near view of the color image of the first real frame into the first color image to obtain a second color image.
Optionally, while the second color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a static object; if it does, the scene depth map is further used to judge whether the pixel belongs to the near view; if it belongs to the near view, the pixel is output into the first color image. Pixels that do not belong to a static object or do not belong to the near view are discarded. That is, only the pixels of static objects in the near view are output into the first color image, and the pixels of the other types of objects are discarded.
The third round of distortion processing targets dynamic objects in the distant view and may include the following steps:
S507. Generate a third full-screen grid layer covering the first real frame.
S508. Offset the grid vertices in the third full-screen grid layer according to the motion vector information of dynamic objects in the distant view, obtaining a distorted third full-screen grid layer.
S509. According to the distorted third full-screen grid layer, output the pixels of dynamic objects in the distant view of the color image of the first real frame into the second color image to obtain a third color image.
While the third color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a dynamic object; if it does, the scene depth map is further used to judge whether the pixel belongs to the distant view; if it belongs to the distant view, the pixel is output into the second color image, and if not, the pixel is discarded. That is, only the pixels of dynamic objects in the distant view are output into the second color image, and the pixels of the other types of objects are discarded.
The fourth round of distortion processing targets dynamic objects in the near view and may include the following steps:
S510. Generate a fourth full-screen grid layer covering the first real frame.
S511. Offset the grid vertices in the fourth full-screen grid layer according to the motion vector information of dynamic objects in the near view, obtaining a distorted fourth full-screen grid layer.
S512. According to the distorted fourth full-screen grid layer, output the pixels of dynamic objects in the near view of the color image of the first real frame into the third color image to obtain the first target color image.
While the first target color image is obtained, the full-screen semantic image of the first real frame can be used to judge whether each pixel of the color image of the first real frame belongs to a dynamic object; if it does, the scene depth map is further used to judge whether the pixel belongs to the near view; if it belongs to the near view, the pixel is output into the third color image, and if not, the pixel is discarded. That is, only the pixels of dynamic objects in the near view are output into the third color image, and the pixels of the other types of objects are discarded.
The pixel filling algorithms are introduced next with reference to Figs. 6 and 7. It should be noted that the principle of the pixel filling algorithm is: taking the current pixel to be filled in the first target color image of the predicted frame as the start point P_0, the color values of the pixels around P_0 are sampled several times according to random sampling offsets (offset), and the samples are averaged to obtain the color value of the filled pixel. When the camera angle is small, dynamic objects change considerably between video frames while static objects change little; for the accuracy of the filled pixels, the blank areas can therefore be filled by sampling the pixels of static objects, discarding the pixels of dynamic objects. When the camera angle is large, static objects also change considerably, and sampling only the pixels of static objects might yield no valid samples; in the second pixel filling algorithm, the pixels of both static and dynamic objects can therefore be sampled to fill the blank areas.
Fig. 6 is a schematic flowchart of the first pixel filling algorithm according to an embodiment of this application; the first pixel filling algorithm in Fig. 6 applies when the camera angle is small. Referring to Fig. 6, the first pixel filling algorithm includes the following.
S601. Determine the first pixel to be filled and take the coordinate point of the first pixel as the start point P_0.
The first pixel is any pixel in the blank areas of the first target color image.
S602. Obtain the random sampling offset sequence and the sampling count N.
The random sampling offset sequence provides random sampling offsets (offset), giving each pixel sample its offset step and sampling direction; for example, a sampling offset can be expressed as (Δx, Δy). The sampling count N specifies how many times the pixels around the first pixel are sampled, N being an integer greater than 1.
S603. Sample the pixels around the first pixel N times according to the random sampling offset sequence, to obtain the sum color of the color values of the N sampled pixels.
For example, in the i-th of the N samples, the sampling offset offset_i is taken in turn from the random sampling sequence, and the sampling coordinate P_i = P_0 + offset_i is computed, i being an integer with 1 ≤ i ≤ N.
The first target color image and the full-screen semantic image are sampled at the sampling coordinate P_i; if the sampled pixel of the full-screen semantic image indicates a static pixel, the color value sampled from the first target color image is accumulated into color; if the sampled semantic image indicates a dynamic pixel, no operation is performed.
S604. Obtain the mean color' of the color values of the N sampled pixels.
The mean color' can be expressed as: color' = color / N.
S605. Output the mean color' of the color values of the N sampled pixels as the color value of the first pixel.
It should be noted that Fig. 6 is an exemplary illustration of the first pixel filling algorithm; in practice the steps in Fig. 6 may be appropriately modified, added to or removed, and the resulting solutions still fall within the protection scope of the embodiments of this application. A sketch of both filling algorithms follows.
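Both filling algorithms can be sketched as follows, assuming NumPy arrays; the second function adds the fallback resampling of steps S705 to S707 described next:

```python
import numpy as np

STATIC = 0   # semantic label of static objects

def fill_pixel(image, semantic, p0, offsets, static_only=True):
    """Average the colors sampled at p0 + offset over the random offset sequence.
    With static_only=True only static-object samples are accumulated (first
    algorithm); the mean is still taken over all N offsets, as color' = color / N."""
    h, w = semantic.shape
    total = np.zeros(3, dtype=np.float64)
    for dx, dy in offsets:
        x = int(np.clip(p0[0] + dx, 0, w - 1))
        y = int(np.clip(p0[1] + dy, 0, h - 1))
        if static_only and semantic[y, x] != STATIC:
            continue                         # dynamic samples contribute nothing
        total += image[y, x]
    return total / len(offsets)

def fill_pixel_second(image, semantic, p0, offsets_n, offsets_m):
    """Second algorithm: if no static sample was accumulated (color equals 0),
    resample M times over static and dynamic pixels alike."""
    color = fill_pixel(image, semantic, p0, offsets_n, static_only=True)
    if not np.any(color):
        color = fill_pixel(image, semantic, p0, offsets_m, static_only=False)
    return color
```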
Fig. 7 is a schematic flowchart of the second pixel filling algorithm according to an embodiment of this application; Fig. 7 applies when the camera angle is large. Referring to Fig. 7, the second pixel filling algorithm includes the following.
S701. Determine the first pixel to be filled and take the coordinate point of the first pixel as the start point P_0.
The first pixel is any pixel in the blank areas of the first target color image.
S702. Obtain the random sampling offset sequence and the sampling count N.
The random sampling offset sequence provides random sampling offsets (offset), giving each pixel sample its offset step and sampling direction; for example, a sampling offset can be expressed as (Δx, Δy). The sampling count N specifies how many times the pixels around the first pixel are sampled, N being an integer greater than 1.
S703. Sample the pixels around the first pixel N times according to the random sampling offset sequence, to obtain the sum color of the color values of the N sampled pixels; in the N samples, only the pixels of static objects are sampled.
For example, in the i-th of the N samples, the sampling offset offset_i is taken in turn from the random sampling sequence, and the sampling coordinate P_i = P_0 + offset_i is computed, i being an integer with 1 ≤ i ≤ N.
The first target color image and the full-screen semantic image are sampled at the sampling coordinate P_i; if the sampled pixel of the full-screen semantic image indicates a static pixel, the color value sampled from the first target color image is accumulated into color; if the sampled semantic image indicates a dynamic pixel, no operation is performed.
S704. If color is not equal to 0, output the mean color' of the color values of the N sampled pixels as the color value of the first pixel.
The mean color' can be expressed as: color' = color / N.
S705. If color equals 0, sample the pixels around the first pixel again M times to obtain the sum color of the M samples. In the M samples, the pixels of both static and dynamic objects may be sampled, M being an integer greater than 1.
S706. From the sum color of the color values obtained by the M samples, obtain the mean color' of the color values of the M sampled pixels.
S707. Output the mean color' of the color values of the M sampled pixels as the color value of the first pixel.
It should be noted that Fig. 7 is an exemplary illustration of the second pixel filling algorithm; in practice the steps in Fig. 7 may be appropriately modified, added to or removed, and the resulting solutions still fall within the protection scope of the embodiments of this application.
Fig. 8 is a schematic flowchart of a rendering process according to an embodiment of this application, taking the rendering of a display interface in a game application as an example. As shown in Fig. 8, the method includes the following.
S801. The game application sends a rendering instruction stream to the rendering module, the rendering instruction stream instructing the rendering of the video frames in the game.
While processing the rendering instruction stream, the rendering module can obtain user interface (UI) instructions. For example, from the characteristics of the game rendering instructions, the rendering module can identify the instructions used to draw the game UI, including but not limited to resource binding instructions, state setting instructions and draw instructions, and save these instructions in an instruction stream buffer for later use.
S802. The rendering module calls the GPU to draw the real frames according to the rendering instruction stream.
S803. While the real frames are drawn, obtain the full-screen semantic images, scene depth maps and color images of the two real frames.
While the real frames are drawn, each rendering instruction can be classified as static or dynamic according to the characteristics of the game rendering instructions, the material information, the instruction calling patterns and so on. If a static object is drawn, the stencil value (template value) of the rendering instruction is set to 0. If an ordinary dynamic object is drawn, the stencil value of the rendering instruction is set to 1. If a dynamic object with skeletal animation is drawn, the three-dimensional model coordinates of two consecutive frames are saved into a buffer for the subsequent motion vector computation, and the stencil value of the rendering instruction is set to 2.
After the rendering instructions complete, a full-screen semantic image can be generated, in which the pixels covered by static objects have value 0, the pixels covered by ordinary dynamic objects value 1, and the pixels of dynamic objects with skeletal animation value 2. A sketch of this classification follows.
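As an illustrative sketch, the per-instruction classification might look as follows; the two boolean inputs are assumptions standing in for the material information and instruction calling patterns mentioned above:

```python
STATIC, DYNAMIC, SKELETAL = 0, 1, 2   # stencil values written into the semantic image

def stencil_for_draw_call(is_static_object, has_skeletal_animation):
    """Map one rendering instruction to the stencil value it should write."""
    if is_static_object:
        return STATIC
    return SKELETAL if has_skeletal_animation else DYNAMIC
```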
S804. Determine the strategy for generating the predicted frame according to the camera angle of the two real frames.
If the camera angle is smaller than or equal to the second preset angle, strategy 1 can be executed. If the camera angle is greater than or equal to the third preset angle, strategy 2 can be executed. If the camera angle lies between the second and the third preset angle, strategy 3 is executed.
The third preset angle is greater than the second preset angle; their specific values can be determined in practice and are not limited here. By way of example, the second preset angle may be 2° and the third preset angle 6°. A sketch of this decision follows.
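A minimal sketch of this decision, using the example values above:

```python
SECOND_PRESET_DEG = 2.0   # example value of the second preset angle
THIRD_PRESET_DEG = 6.0    # example value of the third preset angle

def choose_strategy(camera_angle_deg):
    """S804: map the camera angle between the two real frames to a strategy."""
    if camera_angle_deg <= SECOND_PRESET_DEG:
        return "strategy_1"   # full prediction pipeline (S805 to S808)
    if camera_angle_deg >= THIRD_PRESET_DEG:
        return "strategy_2"   # image multiplexing or blending (S809 and S810)
    return "strategy_3"       # terminate prediction (S811)
```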
Strategy 1 includes steps S805 to S808:
S805. Obtain the motion vector map between the two real frames.
Optionally, the embodiments of this application do not limit the specific way in which the motion vector map is obtained.
By way of example, in the process of obtaining the motion vector map, different computation methods can be used to obtain the motion vectors of dynamic objects with skeletal animation, of ordinary dynamic objects and of static objects respectively, and the above three types of motion vectors are merged to obtain the motion vector map.
Computing the motion vectors of the different types of objects with different computation methods improves the accuracy of each type's motion vectors, and hence the accuracy of the merged motion vector map.
For example, for static objects, the motion vectors can be computed using the scene depth map and the change matrices of the virtual camera. Specifically, normalized device coordinates (NDC) can be constructed by sampling the depth value at the corresponding position of the scene depth map together with the image coordinates of the sampling position. After the NDC coordinates are obtained, the image coordinates, on the predicted frame image, of the current pixel's world-space position are computed through the camera-space matrices and projection matrices of the two real frames, yielding per-pixel motion vector data. A sketch of this reprojection is given below.
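A sketch of this depth reprojection follows; the combined view-projection matrices and the NDC conventions (y flip, depth range) are assumptions, as the text only names camera-space and projection matrices:

```python
import numpy as np

def static_motion_vector(px, py, depth_ndc, inv_view_proj_1, view_proj_2, width, height):
    """Per-pixel motion vector of a static pixel by reprojection through depth."""
    # image coordinates plus sampled depth -> NDC of the first real frame
    ndc = np.array([2.0 * px / width - 1.0, 1.0 - 2.0 * py / height, depth_ndc, 1.0])
    world = inv_view_proj_1 @ ndc
    world /= world[3]                       # perspective divide: world-space position
    # a static point keeps its world position; project it with the second frame
    clip2 = view_proj_2 @ world
    ndc2 = clip2[:3] / clip2[3]
    qx = (ndc2[0] + 1.0) * 0.5 * width      # NDC -> image coordinates
    qy = (1.0 - ndc2[1]) * 0.5 * height
    return np.array([qx - px, qy - py])     # per-pixel displacement
```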
For dynamic objects with skeletal animation, the motion vectors can be computed from their models. For example, each rendering instruction of a model with skeletal animation can be marked in the rendering instruction stream of the real frame, and each such model has a corresponding transform feedback buffer. The rendering algorithm of the real frame's skeletal-animation models is modified to additionally save the NDC coordinates of each model vertex, after coordinate transformation, into the corresponding transform feedback buffer, thereby obtaining the image-space coordinates of each vertex of the skeletal-animation model in the two real frames. A new rendering instruction is created to compute the motion vector of each vertex of the skeletal-animation model, and the rasterization function of the GPU is then used to obtain, for each pixel in the region covered by the vertices, its motion vector by linear interpolation of the vertex motion vectors. A minimal sketch of the per-vertex step follows.
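A minimal sketch of the per-vertex step, assuming the captured NDC xy coordinates are available as arrays; the per-pixel values would then come from the GPU's barycentric interpolation during rasterization, which is not reproduced here:

```python
import numpy as np

def skeletal_vertex_motion_vectors(ndc_prev, ndc_curr, width, height):
    """Per-vertex motion vectors from the transform feedback buffers of two frames.
    ndc_prev, ndc_curr: (V, 2) NDC xy of the V model vertices in the two real frames."""
    scale = np.array([0.5 * width, 0.5 * height])
    flip = np.array([1.0, -1.0])                       # assumed y-down image convention
    img_prev = (ndc_prev * flip + 1.0) * scale         # NDC -> image space
    img_curr = (ndc_curr * flip + 1.0) * scale
    return img_curr - img_prev                         # difference per vertex
```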
For ordinary dynamic objects, the motion vector can be estimated by sampling the color value at the corresponding position of the color image of the first real frame together with the color-value distribution of a surrounding area, and searching the color image of the second real frame, through an iteratively optimized matching algorithm, for the image data closest to that distribution. (For the skeletal-animation path, the image-space coordinates are first computed from the vertex data of the dynamic objects with skeletal animation, the per-vertex motion vectors are computed by difference, and the result is rasterized and interpolated into pixel data rendered onto the motion vector image.)
The motion vectors are computed using the color image, the semantic image and the depth image: for each pixel of the motion vector image to be computed, the data at the corresponding position of the semantic image is sampled to obtain the semantic information.
S806. Perform at least one round of distortion processing on the first real frame to obtain the first target color image of the predicted frame.
For the specific description of the at least one round of distortion processing, refer to the related content of Figs. 3 to 5; for brevity, it is not repeated here.
S807. Fill the blank areas in the first target color image to obtain the second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
For the specific process of filling the blank areas in the first target color image, refer to the related content of Figs. 3, 6 and 7; for brevity, it is not described again here.
S808. Perform UI rendering to form the image of the predicted frame for display.
By way of example, UI rendering includes: taking the display image of the predicted frame as the render target, executing on it the UI rendering instruction stream stored in S801, and drawing the UI information onto the display image of the predicted frame, forming a complete video frame ready for display.
Strategy 2 includes steps S809 and S810:
S809. Obtain the color image of the predicted frame by image multiplexing or image blending.
Image multiplexing means outputting either one of the two real frames as the color image of the predicted frame. Image blending means blending the images of the two real frames semi-transparently and outputting the blended image as the color image of the predicted frame.
It should be understood that strategy 2 applies to situations where the camera angle between the two real frames is large. A camera angle greater than or equal to the third preset angle indicates that the current game picture is shaking heavily, so too much of the information required for the predicted-frame computation is missing and the result generated by the algorithm would be inaccurate; therefore, image multiplexing or image blending can be used to obtain the predicted frame. A sketch of both options follows.
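A minimal sketch of both options, with a 50/50 blend assumed for the semi-transparent mixing:

```python
import numpy as np

def strategy_2_color_image(frame_a, frame_b, blend=True, alpha=0.5):
    """S809: image multiplexing reuses one real frame unchanged; image blending
    mixes the two real frames semi-transparently."""
    if not blend:
        return frame_a.copy()                                    # image multiplexing
    mixed = alpha * frame_a.astype(np.float32) + (1.0 - alpha) * frame_b.astype(np.float32)
    return mixed.astype(frame_a.dtype)                           # image blending
```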
S810. Perform UI rendering to form the image of the predicted frame for display.
By way of example, UI rendering includes: taking the display image of the predicted frame as the render target, executing on it the UI rendering instruction stream stored in S801, and drawing the UI information onto the display image of the predicted frame, forming a complete video frame ready for display.
Strategy 3 includes step S811:
S811. Terminate the process of generating the predicted frame.
It should be understood that a camera angle between the second and the third angle threshold indicates that, in the current scene, the algorithm cannot generate a predicted-frame image of acceptable quality; the frame prediction scheme is therefore exited and the normal rendering of the game resumed.
The method for image processing provided by the embodiments of this application has been described above with reference to Figs. 1 to 8; the apparatus provided by the embodiments of this application for executing the above method is described below.
For a better understanding of the embodiments of this application, the structure of the apparatus for image processing of the embodiments of this application is introduced below.
Fig. 9 shows a schematic diagram of the hardware structure of an apparatus 900 for image processing. The apparatus 900 for image processing may include a processor 110, an external memory interface 120, an internal memory 121, a sensor module 180 and a display screen 194.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the apparatus 900 for image processing. In other embodiments of this application, the apparatus 900 for image processing may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a graphics processing unit (GPU), an image signal processor (ISP), a video codec and so on. The different processing units may be independent devices or may be integrated in one or more processors.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of this application are schematic illustrations and do not constitute a structural limitation on the apparatus 900 for image processing. In other embodiments of this application, the apparatus 900 for image processing may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.
The apparatus 900 implements the display function through the GPU, the display screen 194, the application processor and so on. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor; the GPU is used to perform the mathematical and geometric computation for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images and video and to receive sliding operations and the like. The display screen 194 includes a display panel, which may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the apparatus 900 may include 1 or N display screens 194, N being a positive integer greater than 1.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the apparatus 900 for image processing. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and video on the external memory card.
The internal memory 121 can be used to store computer-executable program code, the executable program code including instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by at least one function (for example a sound playback function, an image playback function, etc.); the data storage area can store data created during the use of the apparatus 900 for image processing (for example audio data, a phone book, etc.).
The sensor module 180 may include a pressure sensor, a touch sensor and the like.
Fig. 10 is a schematic structural diagram of an apparatus 1000 for image processing provided by an embodiment of this application; the apparatus 1000 may include an electronic device, or may include a chip or chip system within an electronic device. The apparatus 1000 includes functional units for executing the method for image processing in Figs. 3 to 8.
As shown in Fig. 10, the apparatus 1000 includes a distortion unit 1010 and a filling unit 1020.
The distortion unit 1010 is used to perform at least one round of distortion processing on the color image of the first real frame according to the motion vector information between the first real frame and the second real frame, to obtain the first target color image of the predicted frame, the first target color image of the predicted frame including blank areas caused by the at least one round of distortion processing. Each round of the at least one round of distortion processing includes: generating a full-screen grid layer covering the first real frame, the full-screen grid layer including a plurality of grids, each grid including a plurality of pixels; performing distortion processing on at least some of the grids in the full-screen grid layer according to the motion vector information; and outputting the color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame.
The filling unit 1020 is used to fill the blank areas in the first target color image of the predicted frame to obtain the second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
Fig. 11 is a schematic diagram of the hardware structure of another apparatus 1100 for image processing provided by an embodiment of this application. As shown in Fig. 11, the apparatus 1100 includes a processor 1101, a communication line 1104 and at least one communication interface (communication interface 1103 is taken as an example for illustration in Fig. 11).
The processor 1101 may include a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a GPU, or one or more integrated circuits for controlling the execution of the program of this application.
The communication line 1104 may include circuitry that transmits information between the above components.
The communication interface 1103 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet or wireless local area networks (WLAN).
Possibly, the apparatus for image processing may further include a memory 1102.
The memory 1102 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 1104, or may be integrated with the processor.
The memory 1102 is used to store the computer-executable instructions for executing the solution of this application, and their execution is controlled by the processor 1101. The processor 1101 is used to execute the computer-executable instructions stored in the memory 1102, thereby implementing the method for image processing provided by the embodiments of this application.
Possibly, the computer-executable instructions in the embodiments of this application may also be called application program code, which is not specifically limited by the embodiments of this application.
As shown in Fig. 11, in a specific implementation, as an embodiment, the processor 1101 may include one or more CPUs and may also include one or more GPUs.
An embodiment of this application provides an electronic device, including a processor and a memory; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory, causing the electronic device to perform the above method for image processing.
An embodiment of this application provides a chip. The chip includes a processor, and the processor is used to call a computer program in a memory to execute the method for image processing of the above embodiments. Its implementation principle and technical effects are similar to those of the related embodiments above and are not repeated here.
Embodiments of this application also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the above method is implemented when the computer program is executed by a processor. The methods described in the above embodiments can be implemented in whole or in part by software, hardware, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media may include computer storage media and communication media, and also any medium that can transfer a computer program from one place to another; the storage medium may be any target medium accessible by a computer.
In a possible implementation, a computer-readable medium may include RAM, ROM, a compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL or those wireless technologies are included in the definition of medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
An embodiment of this application provides a computer program product including a computer program; when the computer program is run, the computer is caused to execute the above method.
The embodiments of this application are described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processing unit of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable device to produce a machine, so that the instructions executed by the processing unit of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above specific implementations further describe the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are merely specific implementations of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement and the like made on the basis of the technical solution of the present invention shall be included within the protection scope of the present invention.

Claims (18)

  1. A method for image processing, characterized by comprising:
    performing at least one round of distortion processing on the color image of a first real frame according to motion vector information between the first real frame and a second real frame, to obtain a first target color image of a predicted frame, the first target color image of the predicted frame comprising blank areas caused by the at least one round of distortion processing;
    wherein each round of the at least one round of distortion processing comprises: generating a full-screen grid layer covering the first real frame, the full-screen grid layer comprising a plurality of grids, each grid comprising a plurality of pixels; performing distortion processing on at least some of the grids in the full-screen grid layer according to the motion vector information; and outputting the color image corresponding to each round of distortion processing according to the distorted full-screen grid layer and the color image of the first real frame; and
    filling the blank areas in the first target color image of the predicted frame to obtain a second target color image of the predicted frame, the second target color image being the display image of the predicted frame.
  2. The method according to claim 1, characterized in that the at least one round of distortion processing comprises multiple rounds of distortion processing corresponding respectively to multiple distortion types, the multiple distortion types comprising at least one of the following:
    a distortion operation performed on static objects;
    a distortion operation performed on dynamic objects;
    a distortion operation performed on static objects in the distant view;
    a distortion operation performed on dynamic objects in the near view;
    a distortion operation performed on dynamic objects in the distant view;
    a distortion operation performed on static objects in the near view.
  3. The method according to claim 1 or 2, characterized in that the at least one round of distortion processing comprises multiple rounds of distortion processing, and the full-screen grid layers corresponding to at least two of the multiple rounds have different resolutions.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    when the objects of the i-th round of the at least one round of distortion processing are static objects in the distant view, setting the resolution of the full-screen grid layer used in the i-th round to a first resolution, i being an integer greater than or equal to 1;
    when the objects of the i-th round of the at least one round of distortion processing are static objects in the near view or dynamic objects, setting the resolution of the full-screen grid layer used in the i-th round to a second resolution, the first resolution being lower than the second resolution.
  5. The method according to any one of claims 1 to 4, characterized in that the at least one round of distortion processing comprises multiple rounds of distortion processing, the first of which comprises:
    generating a first full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the first full-screen grid layer according to motion vector information of static objects, to obtain a distorted first full-screen grid layer;
    outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the color image of the first real frame into a blank image, to obtain a first color image.
  6. The method according to claim 5, characterized in that outputting the pixels of static objects in the color image of the first real frame into a blank image to obtain a first color image comprises:
    judging, according to the full-screen semantic image of the first real frame, whether a first pixel belongs to a static object or a dynamic object, the first pixel being any pixel of the color image of the first real frame;
    when the first pixel belongs to a static object, outputting the first pixel to a first position in the blank image, to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer;
    when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
  7. The method according to claim 5 or 6, characterized in that the second round of the at least one round of distortion processing comprises:
    generating a second full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the second full-screen grid layer according to motion vector information of dynamic objects, to obtain a distorted second full-screen grid layer;
    outputting, according to the distorted second full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the first color image, to obtain the first target color image.
  8. The method according to claim 7, characterized in that outputting, according to the distorted second full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the first color image to obtain the first target color image comprises:
    judging, according to the full-screen semantic image of the first real frame, whether a second pixel belongs to a static object or a dynamic object, the second pixel being any pixel of the color image of the first real frame;
    when the second pixel belongs to a dynamic object, outputting the second pixel to a second position in the first color image, to obtain the first target color image, the second position being the corresponding position of the second pixel in the distorted second full-screen grid layer;
    when the second pixel belongs to a static object, not outputting the second pixel to the first color image.
  9. The method according to any one of claims 1 to 4, characterized in that the at least one round of distortion processing comprises multiple rounds of distortion processing, the first of which comprises:
    generating a first full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the first full-screen grid layer according to motion vector information of static objects in the distant view, to obtain a distorted first full-screen grid layer;
    outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the distant view in the color image of the first real frame into a blank image, to obtain a first color image.
  10. The method according to claim 9, characterized in that outputting, according to the distorted first full-screen grid layer, the pixels of static objects in the distant view in the color image of the first real frame into a blank image to obtain a first color image comprises:
    judging, according to the full-screen semantic image of the first real frame, whether a first pixel belongs to a static object or a dynamic object, the first pixel being any pixel of the color image of the first real frame;
    when the first pixel belongs to a static object, determining, according to the scene depth map of the first real frame, whether the depth of the first pixel is greater than a preset depth threshold;
    when the depth of the first pixel is greater than the preset depth threshold D, outputting the first pixel to a first position in the blank image, to obtain the first color image, the first position being the corresponding position of the first pixel in the distorted first full-screen grid layer;
    when the depth of the first pixel is less than the preset depth threshold D, not outputting the first pixel to the blank image;
    when the first pixel belongs to a dynamic object, not outputting the first pixel to the blank image.
  11. The method according to claim 9 or 10, characterized in that the second round of the multiple rounds of distortion processing comprises:
    generating a second full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the second full-screen grid layer according to motion vector information of static objects in the near view, to obtain a distorted second full-screen grid layer;
    outputting, according to the distorted second full-screen grid layer, the pixels of static objects in the near view in the color image of the first real frame into the first color image, to obtain a second color image.
  12. The method according to claim 11, characterized in that the third round of the multiple rounds of distortion processing comprises:
    generating a third full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the third full-screen grid layer according to motion vector information of dynamic objects in the distant view, to obtain a distorted third full-screen grid layer;
    outputting, according to the distorted third full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the second color image, to obtain a third color image.
  13. The method according to claim 12, characterized in that the fourth round of the multiple rounds of distortion processing comprises:
    generating a fourth full-screen grid layer covering the first real frame;
    offsetting the grid vertices in the fourth full-screen grid layer according to motion vector information of dynamic objects in the near view, to obtain a distorted fourth full-screen grid layer;
    outputting, according to the distorted fourth full-screen grid layer, the pixels of dynamic objects in the color image of the first real frame into the third color image, to obtain the first target color image.
  14. The method according to any one of claims 1 to 13, characterized in that filling the blank areas in the first target color image of the predicted frame to obtain the second target color image of the predicted frame comprises:
    determining the threshold range in which the camera angle between the first real frame and the second real frame falls;
    selecting a target pixel filling algorithm from at least two candidate pixel filling algorithms according to the threshold range in which the camera angle falls;
    filling the blank areas in the first target color image according to the target pixel filling algorithm, to obtain the second target color image.
  15. The method according to claim 14, characterized in that selecting a target pixel filling algorithm from at least two candidate pixel filling algorithms according to the threshold range in which the camera angle falls comprises:
    when the camera angle is smaller than a first angle threshold, selecting a first pixel filling algorithm as the target pixel filling algorithm, wherein the first pixel filling algorithm achieves pixel filling by sampling the pixels of static objects around the pixel to be filled;
    when the camera angle is larger than the first angle threshold, selecting a second pixel filling algorithm as the target pixel filling algorithm, wherein the second pixel filling algorithm achieves pixel filling by sampling the pixels of static objects and dynamic objects around the pixel to be filled.
  16. An apparatus for image processing, characterized in that the apparatus comprises:
    a processor and a memory;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory, causing the apparatus to perform the method according to any one of claims 1-15.
  17. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-15.
  18. A computer program product, characterized by comprising a computer program which, when run, causes a computer to execute the method according to any one of claims 1-15.
PCT/CN2023/120794 2022-10-26 2023-09-22 Method and apparatus for image processing, and storage medium WO2024087971A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211319597.7A 2022-10-26 2022-10-26 Method and apparatus for image processing, and storage medium
CN202211319597.7 2022-10-26

Publications (1)

Publication Number Publication Date
WO2024087971A1 true WO2024087971A1 (zh) 2024-05-02

Family

ID=90829983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/120794 Method and apparatus for image processing, and storage medium 2022-10-26 2023-09-22

Country Status (2)

Country Link
CN (1) CN117974814A (zh)
WO (1) WO2024087971A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109672886A (zh) * 2019-01-11 2019-04-23 京东方科技集团股份有限公司 一种图像帧预测方法、装置及头显设备
US20200021824A1 (en) * 2016-09-21 2020-01-16 Kakadu R & D Pty Ltd Base anchored models and inference for the compression and upsampling of video and multiview imagery
WO2022108472A1 (en) * 2020-11-20 2022-05-27 Huawei Technologies Co., Ltd Device and method for optimizing power consumption during frames rendering
US20220295095A1 (en) * 2021-03-11 2022-09-15 Qualcomm Incorporated Learned b-frame coding using p-frame coding system


Also Published As

Publication number Publication date
CN117974814A (zh) 2024-05-03

Similar Documents

Publication Publication Date Title
CN106611435B (zh) Animation processing method and apparatus
US11783522B2 (en) Animation rendering method and apparatus, computer-readable storage medium, and computer device
US20230053462A1 (en) Image rendering method and apparatus, device, medium, and computer program product
US8102428B2 (en) Content-aware video stabilization
US8913664B2 (en) Three-dimensional motion mapping for cloud gaming
US7911467B2 (en) Method and system for displaying animation with an embedded system graphics API
WO2022110903A1 (zh) Panoramic video rendering method and system
US20200410740A1 (en) Graphics processing systems
US11468629B2 (en) Methods and apparatus for handling occlusions in split rendering
US10237563B2 (en) System and method for controlling video encoding using content information
WO2022218042A1 (zh) Video processing method and apparatus, video player, electronic device, and readable medium
CN114570020A (zh) Data processing method and system
CN112700519A (zh) Animation display method and apparatus, electronic device, and computer-readable storage medium
WO2024087971A1 (zh) Method and apparatus for image processing, and storage medium
KR20210135859A (ko) Augmented reality remote rendering method for real-time mixed reality service of volumetric 3D video data
US20230245326A1 (en) Methods and systems for dual channel transfer of game
CN114222185B (zh) Video playback method, terminal device, and storage medium
CN113469930A (zh) Image processing method and apparatus, and computer device
CN115705668A (zh) View drawing method and apparatus, and storage medium
JP2002369076A (ja) Three-dimensional special effects device
WO2024045701A1 (zh) Data processing method, apparatus and device, and storage medium
Zhang et al. A Hybrid framework for mobile augmented reality
KR102561903B1 (ko) AI-based XR content service method using a cloud server
CN117472368A (zh) View debugging method and apparatus, computer device, and storage medium
CN115996302A (zh) Method, apparatus and device for smoothing strip-screen signal images on a digital video wall