WO2020192608A1 - Graphics rendering method, device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2020192608A1
Authority
WO
WIPO (PCT)
Prior art keywords
vertex data
cpu
processed
gpu
processing
Prior art date
Application number
PCT/CN2020/080582
Other languages
English (en)
French (fr)
Inventor
张璠
吴江铮
石鑫栋
王术
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP20778929.8A (patent EP3937118A4)
Publication of WO2020192608A1
Priority to US17/484,523 (patent US11908039B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2210/00 Indexing scheme for image generation or computer graphics
    • G06T 2210/52 Parallel processing

Definitions

  • This application relates to the field of image processing technology, and more specifically, to a graphics rendering method, device, and computer-readable storage medium.
  • a graphics processing unit is a microprocessor dedicated to image operations, and is often used for graphics rendering.
  • When performing graphics rendering in traditional solutions, the GPU generally performs the entire image rendering process. However, in some cases, for example, when graphics rendering requires a large amount of computation (such as heavy graphics display) or the GPU needs to handle many other operations (such as participating in large-scale scientific computation while rendering graphics), using traditional solutions for graphics rendering makes the load on the GPU too high, which affects the GPU's performance when rendering images.
  • the present application provides a graphics rendering method, device, and computer-readable storage medium to reduce the load when the GPU performs graphics rendering.
  • In a first aspect, a graphics rendering method is provided, which includes: a central processing unit (CPU) acquires to-be-processed vertex data, where the to-be-processed vertex data is vertex data for graphics rendering processing by a graphics processing unit (GPU); the CPU processes the to-be-processed vertex data to obtain vertex data within the user's perspective; and the CPU sends the vertex data within the user's perspective to the GPU for rendering processing.
  • the aforementioned vertex data to be processed may be all the vertex data or part of the vertex data required to draw a graph once.
  • the to-be-processed vertex data includes not only vertex data within the user's perspective, but also vertex data outside the user's perspective.
  • the CPU processes the vertex data to be processed to obtain the vertex data within the user's perspective, which is equivalent to removing the vertex data outside the user's perspective from the to-be-processed vertex data, so as to obtain the vertex data within the user's perspective.
  • the vertex data in the user's perspective may be the vertex position information of the object image visible in the user's perspective.
  • the vertex data to be processed obtained by the CPU may be vertex data located in the local coordinate system.
  • the aforementioned vertex data to be processed is vertex data captured by a draw call command used to render a frame of image.
  • the draw call instruction refers to a graphics program interface instruction.
  • the number of draw call instructions is the same as the number of graphics drawing times of a cross-platform graphics program interface.
  • the draw call instructions specifically include commands such as glDrawArrays and glDrawElements.
  • the draw call instruction can be used to achieve flexible capture of the vertex data, so that the CPU can flexibly process the vertex data.
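  • As an illustration of such capture, the following is a minimal sketch, not the patent's implementation, of interposing on the glDrawArrays entry point so a CPU-side layer sees a draw call before the GPU driver does; Linux/Android-style dynamic linking is assumed, and shouldCpuProcess and captureVertexData are hypothetical helpers:

    // Hypothetical interposition shim for glDrawArrays, built into a
    // preloaded shared library.
    #include <GLES3/gl3.h>
    #include <dlfcn.h>

    using PFN_glDrawArrays = void (*)(GLenum, GLint, GLsizei);

    static bool shouldCpuProcess(GLsizei vertexCount) {
        return vertexCount >= 10000;   // illustrative threshold only
    }

    static void captureVertexData(GLenum, GLint, GLsizei) {
        // Hypothetical hook: copy the currently bound vertex buffer so
        // the CPU can transform/clip/cull it before the GPU draws.
    }

    extern "C" void glDrawArrays(GLenum mode, GLint first, GLsizei count) {
        // Resolve the real driver entry point once.
        static auto real = reinterpret_cast<PFN_glDrawArrays>(
            dlsym(RTLD_NEXT, "glDrawArrays"));
        if (shouldCpuProcess(count))
            captureVertexData(mode, first, count);
        real(mode, first, count);      // forward to the real GPU driver
    }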
  • the to-be-processed vertex data acquired by the foregoing CPU is part or all of the vertex data stored in the storage module.
  • the storage module currently stores vertex data corresponding to multiple draw calls, so the CPU may obtain the vertex data corresponding to one draw call from the storage module as the vertex data to be processed when obtaining the vertex data.
  • the aforementioned storage module may specifically be a double-rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM) or video memory.
  • the storage module when the foregoing graphics rendering method is executed by a terminal device, the storage module may be a DDR SDRAM located inside the terminal device, and when the foregoing graphics rendering method is executed by a computer device, the storage module may be a video memory located inside the computer device.
  • the CPU acquiring the vertex data to be processed includes: the CPU acquiring the vertex data to be processed from a storage module.
  • the CPU can obtain the to-be-processed vertex data from the cache module, so that it can process the to-be-processed vertex data originally processed by the GPU, which can reduce the burden on the GPU.
  • the CPU sends the vertex data within the user's perspective to the GPU for rendering processing, including: the CPU stores the vertex data within the user's perspective to storage Module, so that the GPU obtains the vertex data within the user's perspective from the storage module and performs image rendering processing.
  • the CPU stores the processed vertex data in the user's view range into the memory, so that the GPU can obtain the vertex data in the user's view range from the cache module, and then complete the subsequent graphics rendering processing.
  • before processing the vertex data to be processed, the CPU can copy the vertex data to be processed from the memory, and after the processing is completed, replace the to-be-processed vertex data stored in the memory with the vertex data within the user's perspective.
  • acquiring the vertex data to be processed by the CPU includes: before the vertex data to be processed is processed by the GPU, the CPU intercepts the vertex data to be processed; sending the vertex data within the user's perspective to the GPU for graphics rendering processing includes: the CPU replaces the to-be-processed vertex data with the vertex data within the user's perspective.
  • the CPU intercepts the to-be-processed vertex data originally processed by the GPU and transfers part of the processing of the to-be-processed vertex data to the CPU for execution, which can reduce the burden on the GPU for graphics rendering, thereby improving the efficiency of graphics rendering.
  • the above method further includes: the CPU determines whether to process the vertex data to be processed according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU.
  • the CPU may determine to process the vertex data to be processed according to at least one of the number of vertex data to be processed, the size of the load of the CPU, and the size of the load of the GPU.
  • the number of vertex data to be processed may refer to the quantity of vertex data items, or it may refer to the number of vertices corresponding to the vertex data.
  • the CPU determining to process the vertex data to be processed according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU includes:
  • the CPU determines to process the vertex data to be processed when at least one of the following conditions is met (see the decision sketch after these condition lists):
  • the number of vertex data to be processed is greater than or equal to the first number threshold
  • the current load of the CPU is less than the first load threshold
  • the current load of the GPU is greater than or equal to the second load threshold.
  • the CPU determines whether to process the vertex data according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU, including:
  • the CPU determines not to process the vertex data to be processed when the following conditions occur:
  • the number of vertex data to be processed is less than the first number threshold
  • the current load of the CPU is greater than or equal to the first load threshold
  • the current load of the GPU is less than the second load threshold.
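  • A minimal sketch of this dispatch decision follows; the threshold values and the load representation are illustrative assumptions, not values from the patent:

    #include <cstddef>

    struct LoadInfo { double cpuLoad; double gpuLoad; };  // 0.0 .. 1.0

    // Decide whether the CPU should take over vertex processing for one
    // draw call, per the "at least one condition" rule described above.
    bool cpuShouldProcess(std::size_t vertexCount, const LoadInfo& load) {
        constexpr std::size_t kFirstCountThreshold = 10000; // first number threshold
        constexpr double kFirstLoadThreshold  = 0.60;       // CPU load threshold
        constexpr double kSecondLoadThreshold = 0.85;       // GPU load threshold

        return vertexCount >= kFirstCountThreshold    // case A
            || load.cpuLoad <  kFirstLoadThreshold    // case B
            || load.gpuLoad >= kSecondLoadThreshold;  // case C
    }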
  • the current load of the foregoing CPU is the current total load of the CPU.
  • the current total load of the CPU may be the sum of the load of each core in the current CPU.
  • the current load of the CPU is the current load of the CPU core.
  • the current load of the CPU core may be the average value of the current load of the cores in the CPU, or the current load of any core in the CPU.
  • the current load of the CPU is the current load of a certain core in the CPU
  • the current load of the CPU being less than the first load threshold may mean that the current load of each core in the CPU is less than the first load threshold.
  • the vertex data to be processed is vertex data located in the local coordinate system
  • the CPU processing the vertex data to be processed to obtain vertex data within the user's perspective includes: the CPU performs coordinate transformation on the vertex data located in the local coordinate system according to auxiliary data to obtain vertex data located in the clipping coordinate system, where the auxiliary data includes a transformation matrix for performing coordinate transformation on the vertex data located in the local coordinate system; and the CPU clips and culls the vertex data located in the clipping coordinate system to obtain the vertex data within the user's perspective.
  • the CPU may also obtain the auxiliary data first.
  • the vertex data can also be handed over to the GPU for processing.
  • the auxiliary data includes an MVP matrix
  • the CPU performing coordinate transformation on the vertex data located in the local coordinate system according to the auxiliary data to obtain the vertex data located in the clipping coordinate system includes:
  • the CPU performs coordinate transformation on the vertex data in the local coordinate system according to the MVP matrix to obtain the vertex data in the clipping coordinate system, where the MVP matrix is the product of the model matrix, the view matrix, and the projection matrix.
  • the foregoing MVP matrix is obtained by the CPU before performing coordinate transformation on the vertex data located in the local coordinate system.
  • the CPU can transform the vertex data from the local coordinate system to the clipping coordinate system through one coordinate transformation according to the MVP matrix, which can improve the efficiency of coordinate transformation.
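  • A minimal GLM-based sketch of this one-step transformation (GLM is used here only as a convenient illustration; the column-vector convention gives MVP = projection * view * model):

    #include <glm/glm.hpp>

    // Transform one vertex from the local coordinate system directly to
    // the clipping coordinate system with a precomputed MVP matrix.
    glm::vec4 localToClip(const glm::vec4& localPos, const glm::mat4& model,
                          const glm::mat4& view, const glm::mat4& projection) {
        glm::mat4 mvp = projection * view * model;  // computed once per draw call
        return mvp * localPos;                      // one multiply instead of three
    }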
  • the CPU includes M cores, and the CPU processing the vertex data to be processed includes: when the number of vertex data to be processed is less than the second number threshold, the CPU allocates the vertex data to be processed to a single core in the CPU for processing; when the number of vertex data to be processed is greater than or equal to the second number threshold, the CPU allocates the vertex data to be processed to N cores in the CPU for processing.
  • the foregoing second number threshold is greater than the first number threshold, M and N are both integers greater than 1, and N is less than or equal to M.
  • optionally, a separate thread may be started and assigned to designated cores through an interface to perform the processing of the vertex data to be processed.
  • in this way, the vertex data to be processed can be reasonably distributed to a single core or multiple cores in the CPU for processing, so that the load of each core in the CPU stays as balanced as possible.
  • the CPU allocating the vertex data to be processed to N cores in the CPU for processing includes: the CPU evenly allocates the vertex data to be processed to the N cores in the CPU for processing.
  • optionally, the current average load of the N cores is less than the current average load of the remaining M-N cores, where the M-N cores are the cores in the CPU other than the N cores.
  • the foregoing CPU distributing the to-be-processed vertex data to a single core in the CPU for processing includes: the CPU distributing the to-be-processed vertex data to the core with the least current core load in the CPU for processing.
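  • A minimal sketch of the even multi-core split described above, using standard C++ threads; pinning threads to particular cores is platform specific and omitted, and N would be chosen from the per-core loads:

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>
    #include <glm/glm.hpp>

    // Transform one contiguous chunk of vertices (local -> clip).
    static void processChunk(glm::vec4* verts, std::size_t count,
                             const glm::mat4& mvp) {
        for (std::size_t i = 0; i < count; ++i)
            verts[i] = mvp * verts[i];
    }

    // Evenly distribute one draw call's vertices over n worker threads.
    void processOnNCores(std::vector<glm::vec4>& verts,
                         const glm::mat4& mvp, unsigned n) {
        std::vector<std::thread> workers;
        std::size_t chunk = (verts.size() + n - 1) / n;  // even split
        for (unsigned i = 0; i < n; ++i) {
            std::size_t begin = i * chunk;
            if (begin >= verts.size()) break;
            std::size_t len = std::min(chunk, verts.size() - begin);
            workers.emplace_back(processChunk, verts.data() + begin, len,
                                 std::cref(mvp));
        }
        for (auto& t : workers) t.join();
    }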
  • In a second aspect, a graphics rendering device is provided, which includes modules for performing the methods/operations/steps/actions described in the first aspect.
  • the foregoing device may be an electronic device, or a device for performing graphics rendering in an electronic device (for example, a chip, or a device that can be used with the electronic device).
  • the modules included in the above-mentioned graphics rendering apparatus may be hardware circuits, software, or hardware circuits combined with software.
  • In a third aspect, a graphics rendering device is provided, which includes a processor, and the processor is configured to call program code stored in a memory to perform part or all of the operations in any implementation of the first aspect.
  • the memory storing the program code may be located inside the graphics rendering device (in addition to the processor, the graphics rendering device may also include a memory), or may be located outside the graphics rendering device (for example, it may be the memory of another device).
  • the aforementioned memory is a non-volatile memory.
  • the graphics rendering apparatus includes a processor and a memory
  • the processor and the memory may be coupled together.
  • In a fourth aspect, a graphics rendering device is provided, which includes a central processing unit (CPU), an input/output interface, and a memory.
  • the CPU can obtain the to-be-processed vertex data through the input/output interface. After obtaining the to-be-processed vertex data, the CPU processes it to obtain the vertex data within the user's perspective, and sends the vertex data within the user's perspective to the GPU for graphics rendering processing.
  • the aforementioned vertex data to be processed is vertex data for the GPU to perform graphics rendering processing.
  • the aforementioned GPU may be located inside the graphics rendering device, or in a device other than the graphics rendering device.
  • In a fifth aspect, a computer-readable storage medium is provided, which stores program code, where the program code includes instructions for executing part or all of the operations in the method described in the first aspect.
  • the aforementioned computer-readable storage medium is located in an electronic device, and the electronic device may be a device capable of performing graphics rendering.
  • In a sixth aspect, the embodiments of the present application provide a computer program product; when the computer program product runs on a communication device, the communication device is caused to perform some or all of the operations in the method described in the first aspect.
  • In a seventh aspect, a chip is provided, which includes a processor configured to perform part or all of the operations in the method described in the first aspect.
  • FIG. 1 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application
  • Figure 2 is a schematic view of a frustum
  • Figure 3 is a schematic diagram of the perspective effect
  • Figure 4 is a schematic diagram of a clipping volume
  • Figure 5 is a schematic diagram of cropping
  • Fig. 6 is a schematic diagram of two triangles with clockwise and counterclockwise winding orders;
  • FIG. 7 is a schematic flowchart of graphics rendering in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of processing vertex data to obtain vertex data within the user's perspective
  • FIG. 9 is a schematic diagram of a process of performing coordinate transformation on vertex data
  • FIG. 10 is a schematic diagram of a process of performing coordinate transformation on vertex data
  • FIG. 11 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a process in which the CPU allocates vertex data to cores in the CPU for processing
  • FIG. 14 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application.
  • FIG. 16 is a schematic block diagram of a graphics rendering device according to an embodiment of the present application.
  • FIG. 17 is a schematic block diagram of an electronic device according to an embodiment of the present application.
  • the graphics rendering method in the embodiment of the present application may be executed by an electronic device.
  • the electronic device may be a mobile terminal (for example, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device or other devices capable of image rendering processing.
  • the electronic device may be a device running the Android system, the iOS system, the Windows system, or another system.
  • the various thresholds mentioned in this application can be set based on experience, or can be determined comprehensively based on the amount of data processed during graphics rendering.
  • the graphics rendering method of the embodiment of the present application may be executed by an electronic device.
  • the specific structure of the electronic device may be as shown in FIG. 1.
  • the specific structure of the electronic device will be described in detail below in conjunction with FIG. 1.
  • the electronic device 1000 may include: a central processing unit (CPU) 1001, a graphics processing unit (GPU) 1002, a display device 1003, and a memory 1004.
  • the electronic device 1000 may further include at least one communication bus 110 (not shown in FIG. 1) for implementing connection and communication between the various components.
  • each component in the electronic device 1000 may also be coupled through other connectors, and other connectors may include various interfaces, transmission lines, or buses.
  • Each component in the electronic device 1000 may also be connected in a radial pattern centered on the processor 1001.
  • coupling refers to being electrically connected or connected to each other, including direct connection or indirect connection through other devices.
  • the central processing unit 1001 and the graphics processing unit 1002 in the electronic device 1000 may be located on the same chip, or may be separate chips.
  • the functions of the central processing unit 1001, the graphics processing unit 1002, the display device 1003, and the memory 1004 are briefly introduced below.
  • The central processing unit 1001 is used to run the operating system 1005 and application programs 1007.
  • the application program 1007 may be a graphics application program, such as a game, a video player, and so on.
  • the operating system 1005 provides a system graphics library interface.
  • the application program 1007 uses the system graphics library interface and the drivers provided by the operating system 1005, such as graphics library user-mode drivers and/or graphics library kernel-mode drivers, to generate the instruction stream for graphics or image frames and the required related rendering data.
  • the system graphics library includes but is not limited to: the embedded open graphics library (open graphics library for embedded system, OpenGL ES), the Khronos platform graphics interface, or Vulkan (a cross-platform graphics application program interface), among other system graphics libraries.
  • the instruction stream contains a series of instructions, which are usually calls to the system graphics library interface.
  • the central processing unit 1001 may include at least one of the following types of processors: an application processor, one or more microprocessors, a digital signal processor (DSP), a microcontroller unit (MCU), an artificial intelligence processor, and so on.
  • the central processing unit 1001 may further include necessary hardware accelerators, such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or an integrated circuit for implementing logic operations.
  • the processor 1001 may be coupled to one or more data buses for transmitting data and instructions between the various components of the electronic device 1000.
  • The graphics processor 1002 is used to receive the graphics instruction stream sent by the processor 1001, generate a rendering target through a rendering pipeline, and display the rendering target on the display device 1003 through the layer composition display module of the operating system.
  • the graphics processor 1002 may include a general graphics processor that executes software, such as a GPU or other types of dedicated graphics processing units.
  • The display device 1003 is used to display various images generated by the electronic device 1000.
  • the image may be a graphical user interface (GUI) of the operating system or image data (including still images and video data) processed by the graphics processor 1002.
  • the display device 1003 may include any suitable type of display screen, such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
  • the memory 1004 is a transmission channel between the central processing unit 1001 and the graphics processing unit 1002, and may be a double data rate synchronous dynamic random access memory (DDR SDRAM) or another type of cache.
  • the rendering pipeline is a sequence of operations performed by the graphics processor 1002 in the process of rendering graphics or image frames. Typical operations include: vertex processing (Vertex Processing), primitive processing (Primitive Processing), rasterization (Rasterization), fragment processing (Fragment Processing), and so on.
  • the coordinate spaces involved include local space (Local Space, or Object Space), among others;
  • in order to transform (vertex) coordinates from one coordinate system to another, several transformation matrices are generally needed; the most important are the model (Model), view (View), and projection (Projection) matrices.
  • the coordinates of the vertex data generally start in local space (Local Space); here, the local-space coordinates are called local coordinates (Local Coordinate). After transformation, the local coordinates become, in turn, world coordinates (World Coordinate), view coordinates (View Coordinate), and clip coordinates (Clip Coordinate), and finally end in the form of screen coordinates (Screen Coordinate).
  • the local coordinates are the coordinates of the object relative to its local origin; they are also the coordinates in which the object begins.
  • the next step is to transform the local coordinates into world space coordinates. World space coordinates live in a larger space; these coordinates are relative to the global origin of the world, and the object is placed relative to the world origin together with other objects.
  • the world coordinates are transformed into observation space coordinates, so that each coordinate is observed from the angle of the camera or observer.
  • once the coordinates reach clip space, they are processed into the range of -1.0 to 1.0, which determines which vertices will appear on the screen.
  • finally, the clip coordinates are transformed into screen coordinates through a process called viewport transform (Viewport Transform).
  • the viewport transform maps the coordinates in the range of -1.0 to 1.0 to the coordinate range defined by the glViewport function.
  • the final transformed coordinates will be sent to the rasterizer and converted into fragments (after converting into fragments, video images can be displayed according to the fragments).
  • the reason the vertices are transformed into different spaces is that some operations are meaningful and more convenient in a specific coordinate system. For example, when you need to modify an object, it makes more sense to operate in local space; if you want to perform an operation on an object relative to the positions of other objects, it makes more sense to do it in the world coordinate system, and so on. If we wanted, we could also define a transformation matrix that transforms directly from local space to clip space, but that would lose a lot of flexibility.
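  • A minimal GLM sketch of this chain, one matrix per stage; the camera position, object translation, and aspect ratio are placeholder values:

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    glm::vec4 localToClipStepwise(const glm::vec3& localPos) {
        glm::mat4 model = glm::translate(glm::mat4(1.0f),
                                         glm::vec3(5.0f, 0.0f, -3.0f));
        glm::mat4 view  = glm::lookAt(glm::vec3(0.0f, 0.0f, 3.0f),  // camera
                                      glm::vec3(0.0f),              // target
                                      glm::vec3(0.0f, 1.0f, 0.0f)); // up
        glm::mat4 proj  = glm::perspective(glm::radians(45.0f),
                                           16.0f / 9.0f, 0.1f, 100.0f);
        glm::vec4 world = model * glm::vec4(localPos, 1.0f); // local -> world
        glm::vec4 eye   = view  * world;                     // world -> view
        return proj * eye;                                   // view  -> clip
    }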
  • the local space refers to the coordinate space where the object is located, that is, the place where the object is initially located.
  • you create a cube in a modeling software such as Blender.
  • the origin of the cube you created may be at (0,0,0), even though it may end up in a completely different position in the program. It is even possible to create all models with (0,0,0) as their initial position (however they will end up in different positions in the world). Therefore, all the vertices of the created model are in local space: they are all local to your object.
  • the model matrix is a transformation matrix that places an object where it should be, or orients it, by moving, scaling, and rotating it. For example, to transform a house, you would first shrink it (it is too large in local space), move it to a small town in the suburbs, and then rotate it slightly to the left on the y-axis to match the nearby houses.
  • the view space is often referred to as the camera of the cross-platform graphics program interface OpenGL (open graphics library); it is sometimes also called camera space (Camera Space) or eye space (Eye Space).
  • the observation space is the result of transforming world space coordinates into coordinates in front of the user's field of view. Therefore, the observation space is the space observed from the perspective of the camera. This is usually done by a series of combinations of displacement and rotation, panning/rotating the scene so that a specific object is transformed to the front of the camera. These combined transformations are usually stored in a View Matrix, which is used to transform the world coordinates into the observation space.
  • At the end of a vertex shader run, OpenGL expects all coordinates to fall within a specific range; any point outside this range is clipped. Clipped coordinates are discarded, and the remaining coordinates become visible fragments on the screen. This is the origin of the name clip space (Clip Space).
  • In order to transform vertex coordinates from view space to clip space, we need to define a projection matrix (Projection Matrix), which specifies a range of coordinates, such as -1000 to 1000 in each dimension. The projection matrix then transforms the coordinates within this specified range into the range of normalized device coordinates (-1.0, 1.0). All coordinates outside the range will not be mapped into the range between -1.0 and 1.0, and so will be clipped. With the range specified by the projection matrix above, the coordinate (1250, 500, 750) would not be visible, because its x coordinate is out of range: it is converted to a normalized device coordinate greater than 1.0, and is therefore clipped.
  • if only part of a triangle falls outside the clipping range, OpenGL will reconstruct the triangle as one or more triangles that fit within the clipping range.
  • the coordinate transformation process may involve orthographic projection and perspective projection.
  • the two projection methods are described in detail below.
  • the orthographic projection matrix defines a cube-like frustum box that delimits a clipping space; vertices outside this space will be clipped. Creating an orthographic projection matrix requires specifying the width, height, and length of the visible frustum. After transforming into clip space with the orthographic projection matrix, none of the coordinates inside the frustum will be clipped. The frustum looks like a container:
  • the frustum defines the visible coordinates, which are specified by the width, height, near (Near) plane, and far (Far) plane. Any coordinates that appear before the near plane or behind the far plane will be clipped.
  • the orthographic frustum directly maps all the coordinates inside the frustum to the standardized device coordinates, because the w component of each vector is not changed; if the w component is equal to 1.0, the perspective division will not change this coordinate.
  • an orthographic projection matrix can be created, for example, with GLM's glm::ortho function. The first two parameters specify the left and right coordinates of the frustum, and the third and fourth parameters specify the bottom and top of the frustum; these four values define the size of the near and far planes. The fifth and sixth parameters then define the distances of the near plane and the far plane. This projection matrix transforms the coordinates within these x, y, and z ranges into normalized device coordinates.
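  • For example, with GLM (all values here are illustrative):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // The six frustum parameters described above:
    // left, right, bottom, top, near, far.
    glm::mat4 ortho = glm::ortho(0.0f, 800.0f, 0.0f, 600.0f, 0.1f, 100.0f);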
  • the orthographic projection matrix maps coordinates directly onto the 2D plane of your screen, but such a direct projection produces unrealistic results because it does not take perspective (Perspective) into account; a perspective projection matrix is needed to solve this problem.
  • with a perspective projection, each component of the vertex coordinate is divided by its w component; the farther a vertex is from the observer, the smaller its transformed coordinate becomes. This is another reason why the w component is important: it enables perspective projection.
  • the resulting coordinates are in normalized device space.
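  • A minimal sketch of this perspective division (GLM types used for illustration):

    #include <glm/glm.hpp>

    // Divide a clip-space position by its w component to obtain
    // normalized device coordinates; vertices farther from the observer
    // have larger w, so they shrink more.
    glm::vec3 toNdc(const glm::vec4& clip) {
        return glm::vec3(clip) / clip.w;
    }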
  • a perspective projection matrix can be created in GLM like this:
  • glm::mat4 proj = glm::perspective(glm::radians(45.0f), (float)width / (float)height, 0.1f, 100.0f);
  • the first parameter of the above perspective projection matrix defines the value of fov, which represents the field of view (Field of View), and sets the size of the observation space. If you want a real observation effect, its value is usually set to 45.0f, but if you want a doomsday style result, you can set it to a larger value.
  • the second parameter sets the aspect ratio, which is the width of the viewport divided by the height.
  • the third and fourth parameters set the near and far planes of the frustum. We usually set the near distance to 0.1f and the far distance to 100.0f. All vertices between the near plane and the far plane and inside the frustum will be rendered.
  • a perspective frustum can be viewed as an unevenly shaped box, and each coordinate inside the box will be mapped to a point in the clipping space.
  • the vertex data can be subjected to the following coordinate transformations: local coordinate system -> world coordinate system -> observer coordinate system -> clipping coordinate system, after which the clipping operation can be performed.
  • the CPU side can perform a simplified clipping operation:
  • the vertex coordinates defined in the clipping space (x, y, z, w) are clipped according to the frustum (clipping volume).
  • the clipping volume is defined by 6 clipping planes, which are called near, far, left, right, upper, and lower clipping planes.
  • the range of the clipping volume's coordinates is as follows: -w ≤ x ≤ w, -w ≤ y ≤ w, -w ≤ z ≤ w.
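  • A minimal sketch of the containment test against these six planes, performed directly on homogeneous clip coordinates:

    #include <glm/glm.hpp>

    // A vertex is inside the clipping volume iff each of x, y, z lies in
    // [-w, w]; the six comparisons correspond to the six clipping planes.
    bool insideClipVolume(const glm::vec4& v) {
        return -v.w <= v.x && v.x <= v.w &&
               -v.w <= v.y && v.y <= v.w &&
               -v.w <= v.z && v.z <= v.w;
    }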
  • after clipping is completed, the clipped vertex data can be written back as the updated vertex data; specifically, the corresponding vertex data and index data are updated, and these data are used as the data input of the draw call instruction and sent to the rendering pipeline of the GPU.
  • a conservative clipping method (for example, a simplified Cohen-Sutherland algorithm) may be used.
  • in that case, a partially visible line segment such as AE will not be clipped or truncated to generate new vertices.
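  • A minimal sketch of such a conservative test (the outcode bit layout is an illustrative choice): a primitive is discarded only when all of its vertices lie outside the same clipping plane, and is otherwise kept whole, so the CPU never generates new vertices; the GPU performs the exact clipping later:

    #include <cstdint>
    #include <glm/glm.hpp>

    // One bit per clipping plane: left/right, bottom/top, near/far.
    std::uint8_t outcode(const glm::vec4& v) {
        std::uint8_t code = 0;
        if (v.x < -v.w) code |= 1;   if (v.x > v.w) code |= 2;
        if (v.y < -v.w) code |= 4;   if (v.y > v.w) code |= 8;
        if (v.z < -v.w) code |= 16;  if (v.z > v.w) code |= 32;
        return code;
    }

    // Both endpoints share an "outside" bit => the segment lies entirely
    // beyond one plane and can be discarded; otherwise keep it untouched.
    bool conservativeDiscard(const glm::vec4& a, const glm::vec4& b) {
        return (outcode(a) & outcode(b)) != 0;
    }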
  • the culling operation mainly discards the triangle that is facing away from the viewer. To determine whether the triangle is front or back, you first need to know its direction.
  • the direction of a triangle specifies its winding order: the path that starts at the first vertex, passes through the second and third vertices, and finally returns to the first vertex.
  • Figure 6 shows two examples of triangles with clockwise and counterclockwise winding orders.
  • the counterclockwise triangle is the triangle facing the observer
  • the clockwise triangle is the triangle facing away from the observer.
  • however, which winding direction faces the observer can be flexibly set in advance through program instructions (for example, OpenGL instructions); if the convention is reversed, the clockwise triangles can be retained during the culling operation and the counterclockwise triangles removed.
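  • A minimal sketch (an illustrative assumption, not the patent's code) of determining a projected triangle's winding from the sign of its 2D signed area after perspective division:

    #include <glm/glm.hpp>

    // Positive signed area => counterclockwise in normalized device
    // coordinates; negative => clockwise.
    bool isCounterClockwise(const glm::vec2& a, const glm::vec2& b,
                            const glm::vec2& c) {
        float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        return area > 0.0f;
    }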
  • FIG. 7 is a schematic flowchart of a graphics rendering method according to an embodiment of the present application.
  • the method shown in FIG. 7 may be executed by an electronic device.
  • the method shown in FIG. 7 includes steps 101 to 103, and these steps are respectively described in detail below.
  • a central processing unit obtains vertex data to be processed.
  • the aforementioned CPU may be located inside an electronic device.
  • the aforementioned vertex data to be processed may be vertex data obtained by the CPU from a cache module (cache unit) of the electronic device, and the vertex data to be processed is vertex data for the GPU to perform graphics rendering processing.
  • the aforementioned vertex data to be processed may be all the vertex data or part of the vertex data required to draw a graph once.
  • the to-be-processed vertex data includes not only vertex data within the user's perspective, but also vertex data outside the user's perspective.
  • the CPU processes the vertex data to be processed to obtain the vertex data within the user's perspective, which is equivalent to removing the vertex data outside the user's perspective from the to-be-processed vertex data, so as to obtain the vertex data within the user's perspective.
  • the vertex data in the user's perspective may be the vertex position information of the object image visible in the user's perspective.
  • acquiring the vertex data to be processed by the CPU includes: the CPU acquiring the vertex data to be processed from a storage module.
  • the foregoing storage module caches the vertex data to be processed for the GPU to perform graphics rendering processing.
  • the storage module may be a DDR SDRAM located inside the terminal device.
  • the storage module may be a video memory located inside the computer device.
  • the CPU can also obtain the vertex data to be processed from the storage module, so that the CPU can also implement the processing of the vertex data to be processed, thereby reducing the burden on the GPU.
  • the to-be-processed vertex data obtained by the CPU may be vertex data obtained by a draw call, where the vertex data obtained by the draw call may refer to the vertex data required for drawing a graph once.
  • the draw call instruction refers to a graphics program interface instruction.
  • the number of draw call instructions is the same as the number of graphics drawing times of a cross-platform graphics program interface.
  • the draw call instructions specifically include commands such as glDrawArrays and glDrawElements.
  • the draw call instruction can be used to achieve flexible capture of the vertex data, so that the CPU can flexibly process the vertex data.
  • the CPU processes the aforementioned vertex data to be processed to obtain vertex data within the user's perspective.
  • the vertex data to be processed obtained in the foregoing step 101 may be vertex data located in a local coordinate system (also referred to as local space). Therefore, in step 102, the CPU actually processes the vertex data located in the local coordinate system to obtain the vertex data within the user's perspective.
  • the local coordinates are the coordinates of the rendered object relative to the object's origin; they are the coordinates the object begins in. When you need to modify an object, it makes more sense to operate in local space.
  • processing the vertex data to be processed in step 102 to obtain vertex data within the user's perspective includes: the CPU performs coordinate transformation on the vertex data located in the local coordinate system according to the auxiliary data to obtain the vertex data located in the clipping coordinate system; the CPU then clips and culls the vertex data located in the clipping coordinate system to obtain the vertex data within the user's perspective.
  • auxiliary data includes a transformation matrix for performing coordinate transformation on vertex data located in a local coordinate system.
  • through coordinate transformation, the vertex data of the local coordinate system can be transformed into clipping coordinates in the range of [-1.0, 1.0], which makes it convenient to subsequently judge which vertices will appear on the screen.
  • through clipping, primitives outside the clipping volume can be deleted.
  • the culling operation is mainly to discard the primitives facing away from the viewer.
  • the direction of a triangle is its winding order: the path that starts at the first vertex, passes through the second and third vertices, and finally returns to the first vertex.
  • for example, a triangular primitive with a counterclockwise winding direction may be a primitive facing the viewer, while a triangular primitive with a clockwise winding direction may be a primitive facing away from the viewer; such back-facing primitives need to be culled.
  • the vertex data within the user's perspective can be obtained through steps 201 to 203.
  • the processing procedure shown in FIG. 8 is briefly introduced below.
  • the vertex data to be processed just acquired by the CPU is data located in the local coordinate system, and the vertex data to be processed needs to be transformed from the local coordinate system to the clipping coordinate system to perform subsequent clipping and culling.
  • one coordinate transformation can be used to transform the vertex data from the local coordinate system to the clipping coordinate system, or multiple coordinate transformations can be used to transform the vertex data from the local coordinate system to the clipping coordinate system.
  • The first processing method: according to the auxiliary data, the vertex data located in the local coordinate system is transformed in sequence to obtain the vertex data located in the clipping coordinate system.
  • the specific process of the CPU performing coordinate transformation on the vertices in the local coordinate system includes:
  • the auxiliary data can include a model matrix, a view matrix (or can also be called an observer matrix), and a projection matrix.
  • These matrices are matrices that match the vertex data obtained by the CPU.
  • the acquired vertex data located in the local coordinate system is transformed into the clipping coordinate system.
  • the coordinates of the vertex data in the world coordinate system can also be called world space coordinates
  • the world space coordinates of the vertex data are the spatial coordinates of the vertex data relative to the world origin, which are coordinates in a larger spatial range.
  • the vertex data in the world coordinate system will be placed relative to the origin of the world together with other objects. If you want to make an operation on the vertex data relative to the position of other objects, it makes more sense to do it in the world coordinate system.
  • the coordinates of the vertex data in the observer coordinate system can be referred to as observer space coordinates, and the observer space coordinates are the coordinates obtained by observing from the angle of the camera or the observer.
  • the vertex data within the user's perspective can be obtained through steps 301 to 303.
  • the processing procedure shown in FIG. 9 is briefly introduced below.
  • in step 301, the vertex data in the local coordinate system can be multiplied by the model matrix to obtain the vertex data in the world coordinate system.
  • in step 302, the vertex data in the world coordinate system can be multiplied by the view matrix to obtain the vertex data in the observer coordinate system.
  • in step 303, the vertex data in the observer coordinate system can be multiplied by the projection matrix to obtain the vertex data in the clipping coordinate system.
  • The second processing method: transform the vertex data in the local coordinate system in a single step according to the auxiliary data to obtain the vertex data in the clipping coordinate system.
  • the specific process of the CPU performing coordinate transformation on the vertex data in the local coordinate system includes:
  • the auxiliary data may include MVP, where MVP is a matrix obtained by multiplying a model matrix, a view matrix, and a projection matrix in sequence.
  • the model matrix, view matrix, and projection matrix of the MVP obtained are matrices that match the vertex data obtained by the CPU.
  • the vertex data can be directly transformed from the local coordinate system to the clipping coordinate system through step 401.
  • the processing procedure shown in Fig. 10 is briefly introduced below.
  • when the second processing method is used for coordinate transformation, the vertex data can be transformed from the local coordinate system to the clipping coordinate system through one coordinate transformation according to the MVP matrix, which can improve the efficiency of coordinate transformation.
  • the MVP matrix can be obtained in advance by the CPU. Specifically, the MVP matrix may be obtained by the CPU before performing coordinate transformation on the vertex data. In this way, when performing coordinate transformation based on the MVP matrix, the time required for coordinate transformation can be saved.
  • vertex processing may also be performed on the vertex data located in the local coordinate system.
  • Performing vertex processing on the vertex data located in the local coordinate system may specifically include: combining the vertex data into primitives according to the specified primitive type and index data (indices data) of the vertex data to obtain primitive data.
  • the vertex processing is completed, and then the primitive processing can be continued on the primitive data to obtain the vertex data within the user's perspective.
  • optionally, a vertex shader can also be used to perform lighting transformation on the positions of the vertices in the primitive data, and primitive processing is then performed on the processed primitive data.
  • primitive data may be regarded as vertex data combined according to a certain shape; therefore, primitive data can essentially also be regarded as vertex data.
  • accordingly, when primitive processing is performed on the primitive data, it is essentially processing of the vertex data that has undergone vertex processing.
  • auxiliary data is required to be used when performing vertex processing on vertex data and performing subsequent coordinate transformations.
  • the auxiliary data used when performing vertex processing on vertex data is index data of the vertex data.
  • the auxiliary data used in coordinate transformation can be called coordinate transformation matrix (uniform data).
  • the CPU sends the vertex data within the user's perspective to the GPU for rendering processing.
  • Each of the aforementioned CPU and GPU may include multiple cores.
  • the foregoing CPU and GPU may be located in the same electronic device, or may be located in different electronic devices.
  • the CPU and GPU are both located in the same electronic device, and graphics can be rendered through the cooperation of the CPU and GPU.
  • in another example, the CPU is located in a client device (for example, a terminal device), and the GPU is located in a cloud device (for example, a server in the cloud).
  • in this case, graphics can be rendered through the cooperation of the client CPU and the cloud GPU.
  • the CPU in the client device can first obtain the vertex data and process the vertex data, and then send the finally obtained vertex data within the user's perspective to the GPU for rendering processing.
  • the client device can then obtain the rendered graphics from the cloud device for display.
  • the CPU sends the vertex data in the user's view range to the GPU for rendering processing, including: the CPU stores the vertex data in the user's view range to the storage module, so that the GPU obtains the user's view range from the storage module Vertex data within and perform image rendering processing.
  • the CPU stores the processed vertex data in the user's view range into the memory, so that the GPU can obtain the vertex data in the user's view range from the cache module, and then complete the subsequent graphics rendering processing.
  • before processing the vertex data to be processed, the CPU can copy the vertex data to be processed from the memory, and after processing, replace the to-be-processed vertex data stored in the memory with the vertex data within the user's perspective (see the sketch below).
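  • Putting the pieces together, a minimal sketch of this copy / process / write-back flow; insideClipVolume is the containment test sketched earlier, and all names are illustrative:

    #include <vector>
    #include <glm/glm.hpp>

    bool insideClipVolume(const glm::vec4& v);  // sketched earlier

    // Copy one draw call's vertices out of the shared memory module,
    // transform them to clip space, keep only those inside the clipping
    // volume, and return the result for writing back in place of the
    // original to-be-processed vertex data.
    std::vector<glm::vec4> processDrawCall(const std::vector<glm::vec4>& stored,
                                           const glm::mat4& mvp) {
        std::vector<glm::vec4> copy = stored;   // copy from memory
        std::vector<glm::vec4> visible;
        for (const glm::vec4& v : copy) {
            glm::vec4 clip = mvp * v;
            if (insideClipVolume(clip)) visible.push_back(clip);
        }
        return visible;                         // replaces the original
    }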
  • optionally, the method shown in FIG. 7 further includes step 102a, which is described below.
  • the CPU determines whether to process the vertex data to be processed.
  • in step 102a, the CPU determines whether it needs to process the vertex data to be processed.
  • the CPU may determine whether to process the vertex data according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU.
  • when the CPU determines in step 102a to process the vertex data to be processed, steps 102 and 103 are performed; when the CPU determines in step 102a not to process the vertex data to be processed, the GPU may continue to process the vertex data to be processed.
  • the CPU may determine whether to process the vertex data to be processed according to at least one of the number of vertex data to be processed, the load of the CPU, and the load of the GPU.
  • the number of vertex data to be processed may refer to the quantity of vertex data items, or it may refer to the number of vertices corresponding to the vertex data.
  • the CPU determining whether to process the vertex data to be processed according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU includes:
  • the CPU determines to process the vertex data to be processed when at least one of the following cases occurs:
  • Case A The number of vertex data to be processed is greater than or equal to the first number threshold
  • Case B The current load of the CPU is less than the first load threshold
  • Case C The current load of the GPU is greater than or equal to the second load threshold.
  • Case A The number of vertex data to be processed is greater than or equal to the first number threshold
  • when the number of vertex data to be processed is large, directly handing the vertex data to the GPU may impose a heavy burden on it (the larger the number of vertex data, the larger the corresponding amount of computation generally is); therefore, when the number of vertex data to be processed is large, transferring the processing to the CPU can greatly reduce the burden on the GPU, and the effect of reducing the burden (or load) on the GPU is more obvious.
  • when the number of vertex data to be processed is greater than or equal to the first number threshold, the number of vertex data can be considered (relatively) large; at this time, in order to reduce the burden on the GPU, the vertex data to be processed can be handed over to the CPU for processing.
  • when the number of vertex data to be processed is less than the first number threshold, the number of vertex data can be considered (relatively) small; directly handing the vertex data to the GPU then generally does not impose much burden on it, and the GPU can perform the processing of the vertex data to be processed itself.
  • Case B The current load of the CPU is less than the first load threshold
  • when the CPU determines that its current load is less than the first load threshold, the vertex data to be processed can be handed over to the CPU for processing. Conversely, when the CPU determines that its current load is greater than or equal to the first load threshold, the CPU's current load can be considered relatively high, and processing the vertex data to be processed on the CPU would add a significant burden. Therefore, to comprehensively balance the load between the CPU and the GPU, the vertex data to be processed can be handed over to the GPU in that case, so that the CPU's load does not become too high and the load between the CPU and the GPU stays as balanced as possible.
  • the current load of the foregoing CPU is the current total load of the CPU.
  • the current total load of the CPU may be the sum of the load of each core in the current CPU.
  • the current load of the CPU is the current load of the CPU core.
  • the current load of the CPU core may be the average value of the current load of the cores in the CPU, or the current load of any core in the CPU.
  • the load of the CPU core can be calculated according to the user mode execution time of the CPU core, the execution time of the system core, and the system idle time.
  • for example, the current load of a CPU core can be determined according to formula (1): P = (X + Y) / Z (1)
  • where X is the user-mode execution time of the CPU core, Y is the system-kernel execution time of the CPU core, Z is the sum of the user-mode execution time, the system-kernel execution time, and the system idle time of the CPU core, and P is the current load of the CPU core.
  • the load of the CPU core is less than the first load threshold, it can be considered that the load of the CPU core is small. At this time, the to-be-processed vertex data can be handed over to the CPU for processing.
  • the user mode execution time of the CPU core, the execution time of the system core, and the system idle time can be referred to as time allocation information of the CPU.
  • the time allocation information of the CPU core is stored in the /proc/stat file node, and the current time allocation information of the CPU core can be obtained by querying /proc/stat.
  • the time allocation information of the CPU core can also be obtained by searching the corresponding file node.
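  • A minimal sketch of reading one core's time allocation from /proc/stat and applying formula (1); a production implementation would sample twice and compute the load from the deltas, which is omitted here for brevity:

    #include <fstream>
    #include <sstream>
    #include <string>

    double coreLoad(int core) {
        std::ifstream stat("/proc/stat");
        std::string tag = "cpu" + std::to_string(core) + " ";
        for (std::string line; std::getline(stat, line); ) {
            if (line.compare(0, tag.size(), tag) != 0) continue;
            std::istringstream in(line);
            std::string name;
            long user = 0, nice = 0, sys = 0, idle = 0;
            in >> name >> user >> nice >> sys >> idle;
            double x = user + nice;   // user-mode execution time X
            double y = sys;           // system-kernel execution time Y
            double z = x + y + idle;  // total time Z
            return z > 0.0 ? (x + y) / z : 0.0;  // formula (1)
        }
        return 0.0;
    }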
  • Case C The current load of the GPU is greater than or equal to the second load threshold.
  • the to-be-processed vertex data can be processed by the CPU to reduce the burden on the GPU.
  • FIG. 12 shows the main processing procedure of the graphics rendering method of the embodiment of the present application.
  • the process shown in Figure 12 can be divided into the process executed by the CPU and the process executed by the GPU.
  • the process executed by the GPU mainly includes steps 501 to 503, and the process executed by the CPU mainly includes steps 601 to 604. These steps are introduced separately below.
  • in step 501, the GPU performs vertex processing on the vertex data to be processed.
  • the vertex data can be combined into primitives according to the designated primitive type and the index data (indices data) of the vertex data to obtain primitive data.
  • the GPU performs primitive processing on the vertex data to be processed after the vertex processing.
  • the primitive processing in step 502 is mainly to perform coordinate transformation on the vertex data to be processed, and perform cropping and deletion operations on the vertex data in the crop coordinate system, so as to obtain vertex data within the user's perspective.
  • the vertex processing and primitive processing in the above steps 501 and 502 are equivalent to the processing of the vertex data to be processed in the above step 102; the difference is that step 102 is executed by the CPU, while steps 501 and 502 are executed by the GPU.
  • the GPU performs other processing on the vertex data after the primitive processing.
  • for example, the GPU may continue to perform rasterization, fragment processing, and per-fragment operations on the vertex data within the user's perspective.
  • steps 501 to 503 are the main process of GPU graphics rendering. If steps 501 to 503 were performed for all vertex data, the GPU could be heavily burdened; therefore, part of the vertex data can be handed over to the CPU for processing to reduce the burden on the GPU.
  • the CPU obtains (collects) vertex data to be processed.
  • the process of obtaining the to-be-processed vertex data by the CPU in step 601 is similar to the process of obtaining the to-be-processed vertex data in step 101 above, and will not be described in detail here.
  • the CPU determines whether the acquired vertex data to be processed is transferred to the CPU for processing.
  • the CPU may determine whether the vertex data is transferred to the CPU for processing according to one or more of the number of vertex data to be processed, the current load of the CPU, and the current load of the GPU.
  • the CPU may determine to transfer the acquired vertex data to be processed to the CPU for processing when any one of the following cases occurs (case A, case B, or case C).
  • Case A The number of vertex data to be processed is greater than or equal to the first number threshold
  • Case B The current load of the CPU is less than the first load threshold
  • Case C The current load of the GPU is greater than or equal to the second load threshold.
  • optionally, the CPU may instead determine to transfer the acquired vertex data to be processed to the CPU for processing only when several of the above cases occur together. For example, the CPU may determine to do so only when case A, case B, and case C all occur; alternatively, it may decide to transfer the vertex data to the CPU when both case A and case B occur (or when both case B and case C occur, or when both case A and case C occur).
  • the CPU performs vertex processing on the acquired vertex data to be processed.
  • the CPU performs primitive processing on the vertex data to be processed after the vertex processing.
  • step 603 and step 604 are equivalent to the processing procedure in step 102 above.
  • the specific procedures of vertex processing and primitive processing have been described above and will not be described in detail here.
  • the CPU transfers the vertex data within the user's perspective to the GPU for processing.
  • the CPU when the CPU is used to process the vertex data to be processed, the CPU may allocate the vertex data to be processed to different cores for processing.
  • M M is a positive integer
  • Case 1: the CPU allocates the vertex data to be processed to a single core for processing.
  • In this case, the CPU may allocate the vertex data to be processed to the core with the smallest current load among the M cores of the CPU.
  • In this way, the load of the cores in the CPU can be balanced so that no single core's load becomes too high.
  • In addition, the vertex data to be processed may be allocated to a single core only when the number of vertex data to be processed is less than the second quantity threshold; in that case the amount of data is moderate, and a single core in the CPU is sufficient to process the to-be-processed vertex data.
  • Case 2: the CPU allocates the vertex data to be processed to N of the M cores for processing, where N is a positive integer greater than 1 and less than or equal to M.
  • In this case, the vertex data to be processed is distributed across multiple cores in the CPU, which balances the load of each core and avoids, as far as possible, overloading any single core.
  • The current average load of the N cores is less than the current average load of the remaining M-N cores, where the M-N cores are the cores of the CPU other than the N cores.
  • Alternatively, the current load of any one of the N cores is less than the current load of any one of the remaining M-N cores.
  • That is, the vertex data to be processed can be allocated to the cores with the smallest current load, balancing the load among the cores in the CPU so that no subset of cores becomes overloaded. A sketch of this core selection follows.
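  • A C++ sketch of this core selection, assuming the per-core loads have already been measured (the function and names are illustrative, not from the patent; n must not exceed the number of cores):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Pick the n least-loaded of the CPU's cores. loads[i] is the current
// load of core i; how the load is measured is platform-specific.
std::vector<int> pickLeastLoadedCores(const std::vector<float>& loads,
                                      int n) {
    std::vector<int> ids(loads.size());
    std::iota(ids.begin(), ids.end(), 0);  // core indices 0, 1, ..., M-1
    // Order the first n indices by ascending load; the rest are unordered.
    std::partial_sort(ids.begin(), ids.begin() + n, ids.end(),
                      [&](int a, int b) { return loads[a] < loads[b]; });
    ids.resize(n);
    return ids;
}
```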
  • As shown in FIG. 13, the CPU obtains the vertex data needed to draw a graph once (the vertex data corresponding to the draw call), then makes a pre-judgment to determine whether the vertex data should be processed by the CPU. When it determines that the CPU should process the vertex data, it performs load distribution, distributing the vertex data to core 1 and core 2 in the CPU; core 1 and core 2 then each process the vertex data distributed to them.
  • The pre-judgment performed by the CPU determines whether the CPU processes the vertex data; for the specific judgment process, refer to the related content of step 102a above.
  • Core 1 and core 2 may be the two cores with the smallest current load in the CPU (the current load of every other core in the CPU is greater than or equal to the current loads of core 1 and core 2).
  • When the CPU assigns the vertex data to core 1 and core 2, the vertex data can be evenly distributed between them, as in the threaded sketch below.
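  • A threaded C++ sketch of this even split (illustrative only; processRange is a placeholder for the vertex and primitive processing described above, and pinning a thread to a particular core would additionally require a platform call such as sched_setaffinity on Linux/Android):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the per-vertex work (coordinate transformation etc.).
void processRange(std::vector<float>& verts,
                  std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        verts[i] *= 1.0f;  // real code would transform/clip the vertex here
    }
}

// Evenly split one draw call's vertex data between two workers, as in
// the core-1/core-2 example above. The two ranges are disjoint, so the
// workers need no synchronization beyond the final joins.
void processOnTwoCores(std::vector<float>& verts) {
    const std::size_t mid = verts.size() / 2;
    std::thread t1([&] { processRange(verts, 0, mid); });
    std::thread t2([&] { processRange(verts, mid, verts.size()); });
    t1.join();
    t2.join();
}
```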
  • The graphics rendering method of the embodiments of the present application can be applied in a game scene (rendering the video frames of a game). FIG. 14 shows the processing procedure of the graphics rendering method in a game scene.
  • The method shown in FIG. 14 may be executed by an electronic device (or another electronic device capable of presenting a game screen).
  • The process shown in FIG. 14 includes steps 701 to 707, which are described in detail below.
  • Step 701: the game application calls the cross-platform graphics program interface designed for embedded systems (open graphics library for embedded systems, OPENGL ES). Specifically, while the game is running (while the game screen is being drawn), the game application continually calls API interfaces in the OPENGL ES graphics library to draw the frames the game needs to display.
  • Step 702: the command stream dynamic reconstruction (CSDR) module caches the GLES graphics commands and related data of the current frame.
  • The related data in step 702 may include the vertex data to be rendered. While the game is running, calls to graphics library for embedded systems (GLES) graphics instructions are cached by the CSDR module. The CPU can obtain the cached GLES graphics instructions and vertex data from the CSDR module for analysis, so as to determine whether the CPU should process the vertex data.
  • Step 703: the CPU collects the vertex data and the auxiliary data of the vertex data.
  • In step 703, the CPU may obtain the vertex data and its auxiliary data from the CSDR module, where the auxiliary data includes the index data of the vertex data and the transformation matrix used to perform coordinate transformation on the vertex data.
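  • For illustration, the data collected for one draw call might be grouped as in the following hypothetical C++ structure (the field names are not from the patent; Vec4 and Mat4 are the helper types from the earlier sketch):

```cpp
#include <cstdint>
#include <vector>

// One draw call's worth of collected data, as in step 703: the vertex
// data itself, the index (indices) data used to assemble primitives, and
// the transformation matrix (e.g. an MVP matrix) used for coordinate
// transformation.
struct DrawCallData {
    std::vector<Vec4>          vertices;  // local-space vertex data
    std::vector<std::uint32_t> indices;   // index data of the vertex data
    Mat4                       mvp;       // transformation matrix
};
```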
  • Compared with the existing solution, the improvement of this embodiment is the added interface between the CPU and the CSDR module. With this interface, the CPU can process the vertex data and resend the resulting vertex data within the user's view range back into the CSDR module, so that the GPU can subsequently process that vertex data.
  • Step 704: the CPU determines whether load transfer is performed for the vertex data corresponding to the current draw call.
  • In step 704, the vertex data corresponding to the draw call is the vertex data required to draw a graph once. Determining whether to perform load transfer is essentially determining whether the vertex data is processed by the CPU (when the CPU processes the vertex data, load transfer is required; when it does not, load transfer is not required).
  • When it is determined in step 704 not to perform load transfer, the vertex data is still processed by the GPU, that is, step 705 is executed. When it is determined in step 704 that load transfer is required, the vertex data is processed by the CPU, that is, steps 706 and 707 are executed.
  • Step 705: the GPU processes the vertex data. For this process, refer to steps 501, 502, and 503 above.
  • Step 706: the CPU processes the vertex data to obtain the vertex data within the user's viewing angle range.
  • Step 707: the CPU sends the vertex data within the user's perspective to the GPU for rendering processing.
  • For the specific processing procedures of steps 706 and 707, refer to the related content of steps 102 and 103 above.
  • FIG. 15 also shows a processing procedure of the graphics rendering method of an embodiment of the present application in a game scene. The method shown in FIG. 15 may be executed by an electronic device (or another electronic device capable of presenting a game screen).
  • The process shown in FIG. 15 includes steps 801 to 804, which are introduced below.
  • Step 801: the CPU obtains the vertex data from the GLES instruction stream.
  • The GLES instruction stream contains the instructions for performing graphics rendering and the parameters carried in those instructions; the parameters include the vertex data corresponding to the graphics rendering instruction. The CPU can therefore obtain the vertex data from the GLES instruction stream.
  • Step 802: the CPU makes a pre-determination to decide whether the CPU will process the vertex data.
  • The pre-determination mainly decides whether the CPU processes the acquired vertex data; for the specific judgment process, refer to the related content of step 102a above, which is not repeated here.
  • When the CPU determines in step 802 that the GPU processes the vertex data, the CPU does not process the acquired vertex data; it can continue to acquire vertex data and perform step 802 again the next time vertex data is acquired. If, in step 802, the CPU determines that it will process the vertex data itself, the CPU continues with steps 803 and 804.
  • Step 803: the CPU performs coordinate transformation, clipping, and culling on the vertex data to obtain the vertex data within the user's perspective.
  • For the specific process by which the CPU obtains the vertex data within the user's perspective in step 803, refer to the related content in step 102 above.
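  • As an illustration of the culling part of step 803, the following C++ sketch tests the winding order of one triangle after the perspective divide. Which winding counts as front-facing is configurable in practice (compare glFrontFace); this sketch assumes counter-clockwise triangles face the viewer. Vec4 is the type from the earlier sketch.

```cpp
// Winding-order back-face culling for one triangle: after dividing by w
// to reach normalized device coordinates, the sign of the 2D signed area
// gives the winding direction (counter-clockwise > 0).
bool isFrontFacing(const Vec4& a, const Vec4& b, const Vec4& c) {
    const float ax = a.x / a.w, ay = a.y / a.w;
    const float bx = b.x / b.w, by = b.y / b.w;
    const float cx = c.x / c.w, cy = c.y / c.w;
    const float signedArea = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
    return signedArea > 0.0f;  // counter-clockwise: keep; otherwise cull
}
```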
  • Step 804: the CPU sends the vertex data within the user's viewing angle range into the graphics program interface (graphics library, GL) instruction group.
  • Specifically, in step 804, after obtaining the vertex data within the user's view range, the CPU may send that vertex data into the GL instruction group, replacing the vertex data in the GLES instruction stream.
  • The GPU can then be driven through the GL user driver layer, so that the GPU obtains the vertex data within the user's perspective and performs subsequent rendering processing on it.
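  • The patent performs this replacement at the level of the GLES instruction stream via the CSDR module. As a loose buffer-level analogy only, a CPU could overwrite a draw call's vertex buffer with the processed, in-view vertices before the GPU consumes it; the sketch below assumes vbo is the buffer object the draw call sources from, and is not the mechanism the patent describes.

```cpp
#include <GLES2/gl2.h>
#include <vector>

// Overwrite the start of the bound vertex buffer with the CPU-processed,
// in-view vertices (Vec4 is the type from the earlier sketch). The new
// data must fit within the buffer's existing allocation.
void uploadProcessedVertices(GLuint vbo, const std::vector<Vec4>& visible) {
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0,
                    static_cast<GLsizeiptr>(visible.size() * sizeof(Vec4)),
                    visible.data());
}
```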
  • The graphics rendering method of the embodiments of the present application has been described in detail above in conjunction with FIGS. 7 to 15. The graphics rendering apparatus of an embodiment of the present application is described in detail below in conjunction with FIG. 16.
  • It should be understood that the graphics rendering apparatus in FIG. 16 can execute each step of the graphics rendering method of the embodiments of the present application; repeated descriptions are omitted below where appropriate.
  • FIG. 16 is a schematic block diagram of a graphics rendering apparatus according to an embodiment of the present application.
  • The device 1000 shown in FIG. 16 includes an input/output interface, a memory, and a CPU.
  • The memory is used to store a program, and when the program stored in the memory is executed by the CPU, the CPU is specifically configured to:
  • obtain, through the input/output interface, the vertex data to be processed, where the to-be-processed vertex data is vertex data for graphics rendering processing by a GPU (the GPU can be located in the device 1000 or in another device);
  • process the to-be-processed vertex data to obtain the vertex data within the user's perspective; and send the vertex data within the user's perspective to the GPU for graphics rendering processing.
  • The device 1000 may also include a GPU. In that case, the CPU in the device 1000 can obtain the to-be-processed vertex data in the device 1000 that was originally to be processed by the GPU, process it to obtain the vertex data within the user's perspective, and then send the vertex data within the user's perspective to the GPU in the device 1000 for processing.
  • FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The electronic device in FIG. 17 includes a communication module 3010, a sensor 3020, a user input module 3030, an output module 3040, a processor 3050, a memory 3070, and a power supply 3080. The processor 3050 may include one or more CPUs.
  • The electronic device shown in FIG. 17 can execute the steps of the graphics rendering method of the embodiments of the present application; specifically, one or more CPUs in the processor 3050 can execute those steps.
  • The communication module 3010 may include at least one module that enables communication between the electronic device and other electronic devices, for example one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless Internet module, a local area communication module, and a location (or positioning) information module.
  • For example, the communication module 3010 can obtain the game screen from a game server in real time.
  • The sensor 3020 can sense some operations of the user and may include a distance sensor, a touch sensor, and so on. It can sense operations such as the user touching the screen or approaching the screen, for example the user's operations on the game interface.
  • The user input module 3030 is used to receive input digital information, character information, contact touch operations or non-contact gestures, and signal input related to user settings and function control of the system. The user input module 3030 includes a touch panel and/or other input devices; for example, the user can control the game through the user input module 3030.
  • The output module 3040 includes a display panel for displaying information input by the user, information provided to the user, or the various menu interfaces of the system.
  • Optionally, the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display. In some other embodiments, the touch panel can cover the display panel to form a touch display screen.
  • In addition, the output module 3040 may also include a video output module, an alarm, and a haptic module. The video output module can display the game screen after graphics rendering.
  • The power supply 3080 can receive external and internal power under the control of the processor 3050 and provide the power required by the modules of the entire electronic device during operation.
  • The processor 3050 may include one or more CPUs, and may also include one or more GPUs.
  • When the processor 3050 includes multiple CPUs, the multiple CPUs may be integrated on the same chip or integrated on different chips.
  • When the processor 3050 includes multiple GPUs, the multiple GPUs may likewise be integrated on the same chip or on different chips.
  • When the processor 3050 includes both a CPU and a GPU, the CPU and the GPU may be integrated on the same chip.
  • For example, when the electronic device is a smartphone, the parts of the smartphone's processor related to image processing are generally one CPU and one GPU, both of which can contain multiple cores.
  • The memory 3070 may store a computer program, the computer program including an operating system program 3072 and an application program 3071.
  • Typical operating systems include Microsoft's Windows and Apple's MacOS for desktop or notebook systems, and systems such as Google's Android for mobile terminals.
  • The memory 3070 may be one or more of the following types: flash memory, hard disk type memory, micro multimedia card type memory, card type memory (such as SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc.
  • In some other embodiments, the memory 3070 may also be a network storage device on the Internet, and the system may perform operations such as updating or reading the memory 3070 over the Internet.
  • For example, the memory 3070 may store a computer program (the program corresponding to the graphics rendering method of the embodiments of the present application), and when the processor 3050 executes that computer program, the processor 3050 can execute the graphics rendering method of the embodiments of the present application.
  • The memory 3070 also stores data 3073 other than the computer program; for example, it may store data produced during the processing of the graphics rendering method of the present application.
  • The connection relationship of the modules in FIG. 17 is only an example; the electronic device provided by any embodiment of the present application may also use other connection modes, for example, with all modules connected through a bus.
  • In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. The device embodiments described above are only illustrative.
  • For example, the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Abstract

This application relates to the field of graphics rendering technology, and specifically discloses a graphics rendering method and a related apparatus. A central processing unit (CPU) captures the instruction stream of a graphics processing unit (GPU) to obtain the vertex data the GPU needs to render a graph; the CPU performs primitive processing, such as coordinate transformation and clipping, on the obtained vertex data to obtain the vertex data within the user's perspective; the CPU then sends the vertex data within the user's perspective to the GPU, and the GPU performs graphics rendering processing based on the CPU-processed vertex data. The technical solution provided by this application can reduce the burden on the GPU during graphics rendering.


Claims (21)

  1. A graphics rendering method, characterized in that the method comprises:
    acquiring, by a central processing unit CPU, vertex data to be processed, where the vertex data to be processed is vertex data for graphics rendering processing by a graphics processing unit GPU;
    processing, by the CPU, the vertex data to be processed to obtain vertex data within a user's perspective; and
    sending, by the CPU, the vertex data within the user's perspective to the GPU for graphics rendering processing.
  2. The method according to claim 1, characterized in that the acquiring, by the CPU, of the vertex data to be processed comprises:
    acquiring, by the CPU, the vertex data to be processed from a storage module; and
    the sending, by the CPU, of the vertex data within the user's perspective to the GPU for rendering processing comprises:
    storing, by the CPU, the vertex data within the user's perspective into the storage module, so that the GPU acquires the vertex data within the user's perspective from the storage module and performs image rendering processing.
  3. The method according to claim 1 or 2, characterized in that the vertex data to be processed is vertex data captured by one draw call instruction used for rendering one frame of image.
  4. The method according to any one of claims 1-3, characterized in that before the CPU processes the vertex data to be processed, the method further comprises:
    determining, by the CPU, to process the vertex data according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU.
  5. The method according to claim 4, characterized in that the determining comprises:
    determining, by the CPU, to process the vertex data to be processed when at least one of the following situations occurs:
    the number of the vertex data to be processed is greater than or equal to a first quantity threshold;
    the current load of the CPU is less than a first load threshold;
    the current load of the GPU is greater than or equal to a second load threshold.
  6. The method according to any one of claims 1-5, characterized in that the vertex data to be processed is vertex data in a local coordinate system, and the processing, by the CPU, of the vertex data to be processed to obtain the vertex data within the user's perspective comprises:
    performing, by the CPU according to auxiliary data, coordinate transformation on the vertex data in the local coordinate system to obtain vertex data in a clip coordinate system, where the auxiliary data comprises a transformation matrix for performing coordinate transformation on the vertex data in the local coordinate system; and
    performing, by the CPU, clipping and culling operations on the vertex data in the clip coordinate system to obtain the vertex data within the user's perspective.
  7. The method according to claim 6, characterized in that the auxiliary data comprises an MVP matrix, and the coordinate transformation comprises:
    performing, by the CPU according to the MVP matrix, coordinate transformation on the vertex data in the local coordinate system to obtain the vertex data in the clip coordinate system, where the MVP matrix is the product of a model matrix, a view matrix, and a projection matrix.
  8. The method according to any one of claims 1-7, characterized in that the CPU comprises M cores, and the processing, by the CPU, of the vertex data to be processed comprises:
    when the number of the vertex data to be processed is less than a second quantity threshold, allocating, by the CPU, the vertex data to be processed to a single core of the CPU for processing; and
    when the number of the vertex data to be processed is greater than or equal to the second quantity threshold, allocating, by the CPU, the vertex data to be processed to N cores of the CPU for processing, where M and N are both integers greater than 1, and N is less than or equal to M.
  9. The method according to claim 8, characterized in that the current average load of the N cores is less than the current average load of the remaining M-N cores, where the M-N cores are the cores of the CPU other than the N cores.
  10. The method according to claim 8, characterized in that the allocating of the vertex data to be processed to a single core of the CPU comprises:
    allocating, by the CPU, the vertex data to be processed to the core of the CPU with the smallest current load for processing.
  11. A graphics rendering apparatus, characterized by comprising a central processing unit CPU and a graphics processing unit GPU, where the CPU is configured to:
    acquire vertex data to be processed, the vertex data to be processed being vertex data for graphics rendering processing by the GPU;
    process the vertex data to be processed to obtain vertex data within a user's perspective; and
    send the vertex data within the user's perspective to the GPU for graphics rendering processing.
  12. The apparatus according to claim 11, characterized in that the CPU is configured to:
    acquire the vertex data to be processed from a storage module; and
    store the vertex data within the user's perspective into the storage module, so that the GPU acquires the vertex data within the user's perspective from the storage module and performs image rendering processing.
  13. The apparatus according to claim 11 or 12, characterized in that the vertex data to be processed is vertex data captured by one draw call instruction used for rendering one frame of image.
  14. The apparatus according to any one of claims 11-13, characterized in that the CPU is further configured to:
    determine, according to at least one of the vertex data to be processed, the load of the CPU, and the load of the GPU, whether to process the vertex data.
  15. The apparatus according to claim 14, characterized in that the CPU is configured to:
    determine to process the vertex data to be processed when at least one of the following situations occurs:
    the number of the vertex data to be processed is greater than or equal to a first quantity threshold;
    the current load of the CPU is less than a first load threshold;
    the current load of the GPU is greater than or equal to a second load threshold.
  16. The apparatus according to any one of claims 11-15, characterized in that the vertex data to be processed is vertex data in a local coordinate system, and the CPU is configured to:
    perform, according to auxiliary data, coordinate transformation on the vertex data in the local coordinate system to obtain vertex data in a clip coordinate system, where the auxiliary data comprises a transformation matrix for performing coordinate transformation on the vertex data in the local coordinate system; and
    perform clipping and culling operations on the vertex data in the clip coordinate system to obtain the vertex data within the user's perspective.
  17. The apparatus according to claim 16, characterized in that the auxiliary data comprises an MVP matrix, and the CPU is configured to:
    perform, according to the MVP matrix, coordinate transformation on the vertex data in the local coordinate system to obtain the vertex data in the clip coordinate system, where the MVP matrix is the product of a model matrix, a view matrix, and a projection matrix.
  18. The apparatus according to any one of claims 11-17, characterized in that the CPU comprises M cores and is configured to:
    when the number of the vertex data to be processed is less than a second quantity threshold, allocate the vertex data to be processed to a single core of the CPU for processing; and
    when the number of the vertex data to be processed is greater than or equal to the second quantity threshold, allocate the vertex data to be processed to N cores of the CPU for processing, where M and N are both integers greater than 1, and N is less than or equal to M.
  19. The apparatus according to claim 18, characterized in that the current average load of the N cores is less than the current average load of the remaining M-N cores, where the M-N cores are the cores of the CPU other than the N cores.
  20. The apparatus according to claim 18, characterized in that the allocating of the vertex data to be processed to a single core of the CPU comprises:
    allocating the vertex data to be processed to the core of the CPU with the smallest current load for processing.
  21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code, and the program code comprises instructions for performing some or all of the steps of the method according to any one of claims 1-10.
Effective date: 20211008