US20140292773A1 - Virtualization method of vertical-synchronization in graphics systems - Google Patents

Virtualization method of vertical-synchronization in graphics systems

Info

Publication number
US20140292773A1
Authority
US
United States
Prior art keywords
frame
frames
cpu
display
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/302,439
Inventor
Natalya Segal
Yoel Shoshan
Guy Sela
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Lucidlogix Software Solutions Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucidlogix Software Solutions Ltd filed Critical Lucidlogix Software Solutions Ltd
Priority to US14/302,439
Assigned to LUCIDLOGIX SOFTWARE SOLUTIONS, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SELA, GUY; SHOSHAN, YOEL; SEGAL, NATALYA
Publication of US20140292773A1
Assigned to LUCIDLOGIX TECHNOLOGIES LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LUCIDLOGIX SOFTWARE SOLUTIONS, LTD.
Assigned to GOOGLE LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LUCIDLOGIX TECHNOLOGY LTD.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/426 Internal components of the client; Characteristics thereof
    • H04N21/42653 Internal components of the client; Characteristics thereof for processing graphics
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/18 Timing circuits for raster scan displays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2330/00 Aspects of power supply; Aspects of display protection and defect management
    • G09G2330/02 Details of power systems and of start or stop of display operation
    • G09G2330/021 Power management, e.g. power saving

Abstract

A method for rendering frames in graphic systems includes not displaying at least one frame in a sequence of frames.

Description

    CROSS-REFERENCE TO RELATED CASES
  • This application is a continuation application claiming benefit from U.S. patent application Ser. No. 13/437,869 filed 2 Apr. 2012, which claimed priority from U.S. Provisional Application No. 61/471,154 filed 3 Apr. 2011 and which is hereby incorporated in its entirety by reference.
  • FIELD
  • The present invention relates generally to the field of computer graphics rendering, and more particularly, ways of and means for improving the performance of rendering processes supported on GPU-based 3D graphics platforms associated with diverse types of computing machinery.
  • BACKGROUND
  • Real-time 3D graphics applications such as video games have two contradictory needs: on the one hand, high photorealism; on the other, a high frame rate. In the video game industry the trend is to push the frame rate to high FPS values. However, when the frame rate overtakes the screen refresh rate (typically 60 Hz) a tearing artifact occurs, badly degrading image quality. The higher the frame rate, the worse the tearing effect. Although tearing occurs whenever the frame feed is not synchronized with the screen refresh, and may therefore also occur when the FPS is less than the screen refresh rate, it is statistically more likely to be seen at higher FPS.
  • Tearing is a visual artifact in video or 3D rendered frames (typically in, but not limited to, 3D games) where information from two or more different frames is shown in a display device simultaneously in a single screen draw. FIG. 1 a shows a series of application-generated back buffer frames with no v-sync, related to a series of displayed frames. In this example the application generates frames at a high rate of 120 FPS, while actual frames are displayed at a lower rate of 60 FPS, limited by the screen refresh rate. The back buffer generated frame (BF) is sent to the display as soon as it is created. When it is delivered in the middle of an ongoing displayed frame, the current scan line is discontinued, while the newly created frame goes on from the discontinued point. If the frame-to-frame data is different, a tearing effect may happen, distorting the image.
  • Tearing can occur with most common display technologies and video cards, and is most noticeable in situations where horizontally moving visuals are common. FIG. 1 b illustrates the tearing artifact in a graphics display. This artifact occurs when the frame feed is not synchronized with the screen refresh. The common solution adopted by 3D game developers is v-sync (vertical synchronization), an option that synchronizes the displayed frame with the screen refresh rate. V-sync is found in most computing systems; the video card is prevented from doing anything visible to the display memory until the monitor has finished its current refresh cycle.
  • The method of prior art v-sync is illustrated in FIG. 1 c. In graphics display technology the generated image (frame) is stored first in the back buffer, and then, by the mechanism of double buffering, it is switched to the front buffer for display on screen. In order to eliminate the tearing effect, a newly generated back buffer frame (BF) is prevented from being displayed over the ongoing front buffer displayed frame (FF). Consequently, the application frame rate is slowed down to the screen refresh rate. When vertical synchronization is in use, the frame rate of the rendering engine will be equal to or less than the monitor's refresh rate if the frame rate was originally higher than the refresh rate. Although this feature normally results in improved video quality, it is not without trade-offs. First, vertical synchronization is known to cause input lag, which is most noticeable when playing video games. Second, when one wishes to benchmark a video card or rendering engine, it is generally implied that the hardware and software render the display as fast as possible, with no regard to the monitor's capabilities or the resulting video tearing; otherwise, the monitor and video card would throttle the benchmarking program, causing it to generate invalid results.
  • Video games, which have a wide variety of rendering engines, tend to benefit well from vertical synchronization, as the rendering engine is normally expected to build each frame in real time, based on whatever the engine's variables specify at the moment a frame is requested. However, because vertical synchronization causes input lag, it can interfere with games which require precise timing or fast reaction times. 3D CAD applications benefit as well from vertical synchronization. These applications are known for their slower frame rate due to large amounts of data. Their tearing effect is typically caused by the screen refresh mechanism, unsynchronized with the slower displayed frames.
  • A graphics system without v-sync has the best responsiveness, as demonstrated in FIG. 1 d with two extreme user input cases. The response is between 0.5 and 1 display frame. In input case 1, the worst-case delay, the response is one display frame, whereas in input case 2, the best-case delay, the delay is 0.5 display frame. The v-sync input lag occurs due to the blocked generation of back buffer frames, as shown in FIG. 1 e. The back buffer generated frame (BF) enters a waiting state until the screen becomes available upon completing the currently displayed frame. The worst case is shown with user input 1, which comes at the beginning of displayed frame 1 and affects the display only in displayed frame 3, causing a lag of 2 frames. The best case is exemplified by user input 2, initiated just before the start of displayed frame 3 and affecting the image in displayed frame 4, causing a single-frame lag. A single-frame lag is considered normal.
  • Therefore, the prior art v-sync solves the tearing artifact; however, it suffers from two major drawbacks: (i) a performance penalty, binding FPS to the screen refresh rate, and (ii) input lag that reduces the application's responsiveness. Both shortfalls are critical in real-time graphics applications.
  • SUMMARY
  • Vertical synchronization (v-sync) in the prior art prevents video tearing artifacts by capping the frame rate of the rendering engine at the monitor's refresh rate when the application frame rate would otherwise be higher. However, this technique suffers from two substantial shortcomings, performance limitation and input lag, both of which are critical drawbacks in real-time applications such as video games.
  • The virtual vertical-synchronization (Virtual V-sync) of the present invention removes the performance shortfall by virtually allowing any frame-per-second rate, independent of the monitor refresh rate, and eliminates the input lag by removing frame blocking. The method is based on preventing excessive application-generated frames from being displayed; instead, the unpresented frames are dropped, or shortened first and then dropped. In order to eliminate artifacts caused by missing frames, inter-frame dependency is resolved.
  • The virtual vertical-synchronization method of some embodiments of the invention can work with any off-the-shelf GPU and computing system, independently of GPU make, model or size. The virtual vertical-synchronization of the present invention is the basis for two additional aspects of the invention: power consumption control of graphics systems and improved GPU utilization in cloud-based real-time graphics applications, such as cloud gaming.
  • There is provided, in accordance with an embodiment of the present invention, a method for rendering frames in graphic systems including not displaying at least one frame in a sequence of frames.
  • According to an embodiment of the present invention, the method further includes determining an amount of time required to finish displaying a rendered frame being currently displayed in the sequence of frames.
  • According to an embodiment of the present invention, the method further includes determining an amount of time required to render the at least one frame.
  • According to an embodiment of the present invention, the method further includes not displaying the at least one frame when a time difference between the amount of time required to render the at least one frame and the amount of time required to finish displaying the rendered frame being currently displayed exceeds a predetermined time.
  • According to an embodiment of the present invention, the method further includes evaluating inter-frame dependency between the at least one frame and a successive one or more frames in the sequence of frames.
  • According to an embodiment of the present invention, the method further includes shortening the at least one frame if inter-frame dependency exists with the successive one or more frames.
  • According to an embodiment of the present invention, the shortening includes removing from the at least one frame some rendering commands.
  • According to an embodiment of the present invention, the method further includes shortening the at least one frame prior to the not displaying.
  • According to an embodiment of the present invention, the not displaying includes removing from the at least one frame all rendering commands.
  • According to an embodiment of the present invention, the not displaying includes discarding the at least one frame following its rendering.
  • There is provided, in accordance with an embodiment of the present invention, a computing device including a CPU (central processing unit) to manage rendering of frames associated with graphics context and to issue an instruction to not display at least one frame in a sequence of frames; and a GPU (graphics processing unit) to render frames in the sequence of frames.
  • According to an embodiment of the present invention, the CPU determines an amount of time required to finish displaying a rendered frame being currently displayed.
  • According to an embodiment of the present invention, the CPU further determines an amount of time required to render the at least one frame.
  • According to an embodiment of the present invention, the CPU issues the instruction to not display responsive to a time difference between the amount of time required to render the at least one frame and the amount of time required to finish displaying the rendered frame being currently displayed exceeding a predetermined time.
  • According to an embodiment of the present invention, the CPU evaluates inter-frame dependency between the at least one frame and one or more successive frames in said sequence of frames.
  • According to an embodiment of the present invention, the CPU shortens the at least one frame if inter-frame dependency exists with the one or more successive frames.
  • According to an embodiment of the present invention, the shortening includes the CPU removing from the at least one frame some rendering commands.
  • According to an embodiment of the present invention, the CPU shortens the at least one frame prior to issuing the instruction to not display.
  • According to an embodiment of the present invention, the instruction to not display includes the CPU removing from the at least one frame all rendering commands.
  • According to an embodiment of the present invention, the instruction to not display includes the CPU discarding the at least one frame following its rendering by the GPU.
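The timing test recited above, in which a frame is not displayed when the render-time estimate exceeds the time left on the currently displayed frame by more than a predetermined amount, can be sketched as follows. All names and the threshold value are illustrative assumptions, not taken from the patent.

```python
DROP_THRESHOLD_MS = 4.0  # assumed "predetermined time"

def should_drop(render_time_ms: float, remaining_display_ms: float,
                threshold_ms: float = DROP_THRESHOLD_MS) -> bool:
    """Return True when the frame should not be displayed: the estimated
    time to render it exceeds the time required to finish displaying the
    currently displayed frame by more than the predetermined threshold."""
    return (render_time_ms - remaining_display_ms) > threshold_ms
```

For example, a frame estimated at 10 ms against 2 ms of remaining display time would be dropped, while one estimated at 5 ms against 4 ms remaining would be displayed.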
  • BRIEF DESCRIPTION OF DRAWINGS
  • For a more complete understanding of practical applications of the embodiments of the present invention, the following detailed description of the illustrative embodiments can be read in conjunction with the accompanying drawings, briefly described below:
  • FIG. 1 a. Prior art. A series of application-generated back buffer frames with no v-sync.
  • FIG. 1 b. Prior art. The effect of tearing.
  • FIG. 1 c. Prior art. The method of v-sync.
  • FIG. 1 d. Prior art. Responsiveness of graphics system without v-sync mechanism.
  • FIG. 1 e. Prior art. Deteriorated responsiveness to user input, due to frame blocking by the application.
  • FIG. 2 a. Flowchart of the ‘basic’ mode of Virtual Vsync.
  • FIG. 2 b. Frame sequence of Virtual Vsync.
  • FIG. 3 a. Flowchart of the ‘hybrid’ mode of Virtual Vsync.
  • FIG. 3 b. Frame sequence of the hybrid mode of the Virtual Vsync.
  • FIG. 4 a. Flowchart of the ‘concise’ mode of Virtual Vsync.
  • FIG. 4 b. The ‘concise’ mode of Virtual Vsync.
  • FIG. 4 c. The principle of inter-frame dependency.
  • FIG. 4 d. Various cases of inter-frame dependency.
  • FIG. 5 a. Responsiveness of the basic mode of the Virtual Vsync method.
  • FIG. 5 b. Comparison chart of responsiveness: prior art's standard v-sync vs. one embodiment of present invention's Virtual Vsync.
  • FIG. 6. Responsiveness of the ‘concise’ mode.
  • FIG. 7. Stuttering and its solution.
  • FIG. 8. Frame sequence of the cloud mode of Virtual Vsync.
  • FIG. 9. Comparison of power consumption between native mode and concise mode.
  • FIG. 10 a. Implementation on a discrete GPU system.
  • FIG. 10 b. Implementation on a dual GPU system.
  • DETAILED DESCRIPTION
  • Modes of Virtual Vertical-Synchronization
  • The virtual vertical-synchronization (Virtual V-sync) of the different embodiments of the present invention removes performance shortfalls by virtually allowing any high frame-per-second rate, independent of the monitor refresh rate, and eliminates input lag by removing the frame blocking mechanism. The term "monitor refresh rate" is the number of times per second that the display hardware draws the data. This is distinct from the "application frame rate", the measure of how often the application driving the graphics system can feed an entire frame of new data to a display. When the application frame rate is higher than the refresh rate, the "actual frame rate" of the graphics system is the monitor refresh rate. In an embodiment of the present invention the excess application frames, above the refresh rate, are designated "to-be-dropped" frames. These frames are dropped without rendering, or rendered only partly in case of inter-frame dependency, as explained hereinafter. "Frame blocking" refers to keeping the rendered frame on hold until the display hardware completes displaying the previous frame. Frame blocking causes input lag, deteriorating graphics system responsiveness.
  • There are three embodiments of the present invention: (i) the basic mode, in which the subsequent frame is generated by the application and then, at display time, is displayed or dropped depending on screen availability; (ii) the hybrid mode, in which the subsequent frame is generated but its display depends on the time remaining for the currently displayed frame; and (iii) the concise-frame mode, in which the time remaining for the currently displayed frame is assessed in advance, and the dropping of a fully generated frame is replaced by creating a concise frame with a reduced number of draw calls, which is then dropped. In the following description the term BF refers to subsequent back buffer generated frames, and FF stands for front buffer frames displayed at the restricted refresh rate.
  • FIG. 2 a shows a flowchart of the Virtual V-sync basic mode. In this mode all the BFs are unconditionally generated, as if all were going to be displayed. A frame, when completed, is either sent to display or dropped without being displayed, depending on screen availability. No frames are blocked by the application. Tearing is eliminated because the undropped frames are never presented to display in the middle of the current FF; they always start a new screen scan upon termination of the previous one, beginning from the starting point of the screen. Consequently, the FPS performance is high, at the level of a non-v-sync unlimited frame rate, but without the tearing artifacts. This is clearly illustrated in FIG. 2 b, for the case of an application rendering rate of 120 FPS with a screen refresh rate of 60 Hz.
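The basic mode described above can be sketched as a simple loop; the function and parameter names are illustrative, not from the patent:

```python
def basic_mode(back_buffer_frames, screen_available):
    """Sketch of the basic mode: every BF is fully generated; a finished
    frame is displayed only if the screen is free, otherwise it is
    dropped. No frame is ever blocked."""
    displayed = []
    for frame, free in zip(back_buffer_frames, screen_available):
        if free:                     # screen finished the previous refresh
            displayed.append(frame)
        # else: the frame is dropped without being displayed
    return displayed
```

At 120 FPS against a 60 Hz screen, roughly every second finished frame would find the screen busy and be dropped.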
  • The hybrid mode, based on controlled frame blocking, allows higher FPS than the prior art v-sync. It is flowcharted in FIG. 3 a. The subsequent frame (BF) is unconditionally generated, but its display depends on the required blocking period. If the screen is available upon completion of BF, the back buffer is sent for display, otherwise the time remaining for the currently displayed FF is assessed. If the time remaining is below a given threshold, the next BF is blocked until the current BF goes to display. If the time remaining is above the given threshold, the BF is dropped and a new BF starts. In this way the blocking stage is controlled, allowing higher FPS than the prior art v-sync. FIG. 3 b shows the relation between the BF and FF sequences. For example, the generation of BF 4 is blocked until BF 3 is presented to display. Then, after completion of BF 4, the time remaining for its display is determined to be too long, above the threshold, and consequently BF 4 is dropped. BF 5 starts right away without blocking, and is switched to the front buffer for display immediately upon completion.
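The hybrid mode's per-frame decision can be sketched as follows (names and units are assumptions for illustration):

```python
def hybrid_decision(screen_available, remaining_ms, threshold_ms):
    """Sketch of the hybrid mode: display immediately if the screen is
    free; block briefly if the currently displayed FF is almost done
    (time remaining below the threshold); otherwise drop the BF and
    start generating the next one."""
    if screen_available:
        return "display"
    if remaining_ms <= threshold_ms:
        return "block_then_display"
    return "drop"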
  • The concise-frame mode is based on shortening undisplayed frames before dropping them, or dropping them without shortening, allowing higher FPS. Screen availability upon completion of the BF is assessed in advance. As shown in the flowchart of FIG. 4 a, the BF is generated in its entirety if it has a chance of being displayed; otherwise it is shortened by turning redundant tasks, i.e. those not required by subsequent frames because no inter-frame dependency exists, into non-operational ones. Frame dependency is a critical issue in this context and is discussed hereinafter. For each newly started BF a timing assessment is made. If the time remaining for the currently displayed FF is too long, then in the event of inter-frame dependency the BF is generated with a reduced number of draw calls, creating a concise frame, and dropped; in the event of no inter-frame dependency the frame is dropped in its entirety. If a timing match for display is positively assessed, a full BF is generated and displayed as soon as the screen becomes available. There is no frame blocking in this mode: in a practical graphics-pipeline implementation the "Present" command can be issued and generation of the next frame can continue immediately. The present is queued in the pipeline and is not blocked; actual blocking usually occurs a few draw calls after "Present", when the pipeline fills up. There is therefore no need to restrict issuing "Present" to cases where the display is guaranteed to take the BF and switch it to the front buffer almost immediately after "Present" is sent. FIG. 4 b shows the relation between the BF and FF sequences. For example, BF 5 and BF 6 are created as concise frames and dropped; BF 7, on the other hand, is fully generated, including all draw calls, and sent to display.
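The concise-frame decision tree can be sketched as a small function; all names are illustrative assumptions, not from the patent:

```python
def concise_decision(expected_wait_ms, threshold_ms, has_dependency):
    """Sketch of the concise-frame mode: if the frame cannot make the
    next refresh, either render a concise frame (keeping only the
    dependency-updating draw calls) and drop it, or drop it outright
    when no successive frame depends on it."""
    if expected_wait_ms <= threshold_ms:
        return "render_full_and_display"
    if has_dependency:
        return "render_concise_and_drop"
    return "drop_without_rendering"
```

In the FIG. 4 b example, BF 5 and BF 6 would take the "render_concise_and_drop" branch, and BF 7 the "render_full_and_display" branch.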
  • Resolving Inter-Frame Dependency
  • A frame becomes subject to inter-frame dependency if a graphics entity (e.g. a texture resource) created as part of the frame by a render-target task (herein termed simply a 'task'), evoked by a draw call, becomes a source of reference for successive frames. Inter-frame dependency is illustrated in FIG. 4 c. A task in frame k creates a render target for repeated use as a texture in successive frames, by a task in frame k+1 and by a task in frame k+2. If that task is purged as part of the reduction of frame k into a concise frame, the texture resource will be missing in subsequent frames, causing an artifact. For example, an image of a mountain reflected in a fishpond is created only once per multiple frames, but is incorporated in all consecutive frames. The reflected mountain image is stored as an intermediate render target (RT). This RT becomes an input resource, a texture, for succeeding frames. If the draw call that creates it is dropped from the referenced frame, the image of the reflected mountain disappears from successive frames as well, causing an artifact.
  • A frame can be seen as a flow of tasks (T1-T2-T3- . . . -TN), where each task has its input and output resources: vertex buffer (VB), index buffer (IB), texture, render target (RT), shaders, and states. An output B of task Tk in frame N may be used as an input to task T1 of frame N+1. If that input B is missing, the result is an artifact. For that reason, inter-frame dependencies between tasks must be revealed and resolved at frame-formation time in order to prevent artifacts.
  • Practically speaking, there are two different methods to deal with the inter-frame dependency issue. The simple one is a “per application” method based on an individual investigation of each application, making a list of all resources that ought to be provided by one of the preceding frames. The tasks that generate those resources shouldn't be dropped. However, this is a customization method; it is manual and expensive. It requires a human learning curve for each application. Consequently, an automatic method for solving inter-frame dependency is needed.
  • In one embodiment of the present invention the automatic method for resolving inter-frame dependency is based on a Dependency Handler software module, responsible for preventing artifacts caused by frame dependency. For every resource, the module must identify the updating task. Whenever a dependency exists, it must make sure that the successive frames receive all the required resources. This is done by keeping the updating task as part of the concise frame, while other draw calls can be removed. The resource is then generated, and from that point on it is available to all successive frames.
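The core filtering step of such a Dependency Handler can be sketched as follows, assuming a frame is represented as a list of (draw call, output resource) pairs; the representation and all names are hypothetical:

```python
def build_concise_frame(draw_calls, needed_by_successors):
    """Reduce a to-be-dropped frame to a concise frame: keep only the
    draw calls whose output resource is referenced by a successive
    frame; every other draw call may be removed."""
    return [call for call, output in draw_calls
            if output in needed_by_successors]
```

With the fishpond example, a frame of [("sky", None), ("reflection", "rtMountain"), ("hud", None)] and a successor set {"rtMountain"} reduces to the single "reflection" draw call, so the reflected-mountain render target still gets updated.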
  • FIG. 4 d shows different cases of inter-frame dependency. Resources of successive frames are shown. A resource in frame 1 is set by the command Set Render Target. In frame 2 this resource is called up by the command Set Texture. It is essential to verify in frame 2 whether the called up resource is dependent on the previous frame or not. Case 1 is a simple example of dependency when the final result in frame 2 depends on the drawn element in the preceding frame. In case 1 a small rectangle was created by the first frame. In the next frame the dependency disappears only if the rectangle is completely overdrawn. In case 1 the original rectangle appears in the final result as well, which makes the second frame dependent on the first one. In case 2 the triangle overwrites the rectangle, removing the dependency.
  • The difficulty stems from the need to recognize in real time whether the overwriting was complete or not. In case 3 the answer is simple because the overwrite is a full square quad, and in case 6 a Clear command, removing any chance of dependency. In case 4 the full quad is assembled from a puzzle of smaller polygons, which raises uncertainty. If the polygons fully cover the texture, no dependency exists; the occlusion query command, which counts the number of drawn pixels, can help decide. However, if the texture is not completely covered, the dependency is questionable: both options still exist. Case 5 shows an example of an incomplete overdraw, leaving the dependency in place. In the case of uncertainty, a "false positive" approach must be taken, i.e. dependency must be assumed, in order to eliminate any chance of artifacts.
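The "false positive" rule above can be sketched as a small predicate. The inputs are assumptions for illustration: a pixel count from an occlusion query (None when unavailable), the target's pixel count, and a flag for a full Clear:

```python
def dependency_after_overdraw(drawn_pixels, target_pixels, cleared):
    """False-positive policy: report 'no dependency' only when the
    overwrite is provably complete; when coverage is uncertain, assume
    dependency to eliminate any chance of artifacts."""
    if cleared:                          # Clear command wiped the target (case 6)
        return False
    if drawn_pixels is not None and drawn_pixels >= target_pixels:
        return False                     # coverage proven complete (cases 2, 3)
    return True                          # incomplete or unknown coverage
```

A partial overdraw (say 60 of 100 pixels) or a missing occlusion-query result both yield "dependency assumed", matching cases 4 and 5.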
  • Responsiveness
  • Some embodiments of the present invention minimize input lag to the level of graphics systems without any v-sync solution. As mentioned before, input lag deteriorates the responsiveness of real-time graphics systems, interfering with games that require precise timing or fast reaction times. The high responsiveness of the Virtual V-sync method of an embodiment of the present invention is illustrated in FIG. 5 a. The basic mode is analyzed in terms of worst and best cases. The worst case is shown for user input 1, which comes at the beginning of BF 2 and can therefore be reflected only in BF 3. However, BF 3 is dropped, so its response is shown only in BF 4, which is displayed as FF 4; the lag is 1.5 frames. The best case is exemplified by user input 2, initiated just before the start of BF 6 and coming into effect at the end of BF 6, hence visible in displayed frame 6, causing a delay of only 0.5 display frame, equal to the best case of non-v-sync graphics systems (see FIG. 1 d). This is significantly better than the prior art v-sync (FIG. 1 e), in which the worst-case lag is 2 display frames and the best case is 1 display frame. The responsiveness comparison (FIG. 5 b) between the prior art v-sync method and the Virtual V-sync of the embodiment of the present invention is based on real benchmarking, performed on video game applications; FPS (frames per second) reflects responsiveness. The tests reported in FIG. 5 b indicate improvements on the order of 100% to 250%, but these results are representative only, and the actual improvement may be less than 100% or greater than 250%. The concise mode embodiment of the present invention is even more responsive, due to shortening BFs by dropping draw calls, as shown in FIG. 6. In the worst case the delay is a single display frame, whereas in the best case it is only a fraction of a display frame, depending on the difference between the FPS and the screen refresh rate.
  • An additional way to improve responsiveness in some embodiments is to shorten the queue of driver-formed frames in the CPU. Frames are queued prior to being sent to the GPU. The typical queue length in a CPU is three frames, with no blocking, causing a constant input lag. This lag can be shortened by decreasing the queue to one or two driver-formed frames.
  • In summary, the different embodiments prevent video tearing artifacts, performance limitations and input lag in graphics systems, all of which are critical in real-time applications.
  • Eliminating Micro Stuttering
  • Micro stuttering is inherent in every technique that drops frames in a non-uniform way. The term is typically used in computing to describe a quality defect inherent in multi-GPU configurations using time division (alternate frame rendering). It manifests as irregular delays between frames rendered by the multiple GPUs. The effect is apparent when the flow of frames appears to stutter, degrading the gameplay experience in video games even though the frame rate seems high enough to provide a smooth experience.
  • In different embodiments, when shortening and dropping of frames is practiced, micro stuttering may appear. It causes two deteriorating effects: (i) a non-fluent image (stuttering image), when the animated content does not develop smoothly, and (ii) a non-uniform pace of displayed frames (stuttering display). Image stuttering stems from the discrepancy introduced into the application's virtual timeline by frames missing from the timed sequence. The virtual time must be compensated accordingly to eliminate image stuttering.
  • FIG. 7 shows a stuttering case, including both stuttering image and stuttering display, and the way to fix the stuttering effect. The original sequence of frames is shown in row 80: four frames with time-sensitive content, submitted to the GPU at times I1-I4 and displayed by the GPU due to the "present" commands P1-P4. Row 81 shows the concise mode, in which the third frame has been dropped. This drop would result in stuttering: after the drop, only frames 1, 2, and 4 remain, with presents P1, P2, and P4 respectively. The present command P3 is missing, resulting in stuttering display due to non-uniformly spaced present commands. The remedy is to change the times of frames 2 and 4 so that their durations compensate for the missing frame; both frames are appended with an additional time of ΔT. As shown, the presenting time of frame 2 is shifted from P2 to P′2, delayed by T+ΔT, where ΔT in this example equals T/2. However, frame 2 submitted by the application (CPU) to the GPU at the original time I2 but presented with a ΔT delay at P′2 would be incorrect at the time of display, causing a stuttering image. To fix this, the application should send each frame for rendering on time, according to its internal clock: frame 2 must be submitted to the GPU at the new time I′2 and sent to display at P′2, as shown in FIG. 7.
  • In summary, in order to prevent stuttering of both display and image, the application clock must be controlled by timely submissions of frames to the GPU and timely presents to the display. The same method should be applied to mouse and keyboard input: mouse and keyboard movements should be manipulated to fit the actually presented frames, in the same way the application's clock is controlled.
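  • The re-spacing of present times after a drop, as in FIG. 7, can be sketched as follows. This is an illustrative sketch with a hypothetical helper name, assuming uniformly spaced original presents; the application would also have to re-time its submissions (I′2 in the figure) so each frame's content matches its new present time:

```python
def respace_presents(present_times, dropped_index):
    """Re-space present times uniformly after dropping one frame (sketch).

    present_times: original, uniformly spaced present times P1..Pn.
    dropped_index: 0-based index of the dropped frame.
    Returns new present times for the surviving frames, spreading the
    dropped frame's interval T evenly over the remaining gaps, as in
    the ΔT = T/2 compensation of FIG. 7.
    """
    survivors = [t for i, t in enumerate(present_times) if i != dropped_index]
    t0 = survivors[0]
    total = present_times[-1] - t0
    step = total / (len(survivors) - 1)   # stretch gaps to fill the hole
    return [t0 + k * step for k in range(len(survivors))]

# Four frames at t = 0, 1, 2, 3 (so T = 1); drop the third frame (index 2).
new_p = respace_presents([0.0, 1.0, 2.0, 3.0], dropped_index=2)
assert new_p == [0.0, 1.5, 3.0]   # frame 2 is delayed by ΔT = T/2
```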
  • Cloud Gaming
  • Another embodiment of the present invention suits the cloud gaming application. Cloud gaming is a type of online gaming that allows direct, on-demand streaming of games onto a computer through a thin client: the actual game is stored on the operator's or game company's server and is streamed directly to computers accessing the server through the client. This makes the capability of the user's computer unimportant, as the server handles the processing needs. The controls and button presses from the user are transmitted directly to the server, where they are recorded, and the server then sends back the game's response to the input controls.
  • High utilization of the GPU is of significant importance in cloud gaming: the more applications a GPU can run simultaneously, the higher its utilization. This is gained by using the concise mode along with the solution for inter-frame dependency, as described above. FIG. 8 depicts the cloud mode of Virtual Vsync. In the given example, a single GPU simultaneously generates three independent streams of frames for three remote clients. Two different types of frames are generated: a full displayable frame (70) and a shortened frame with dropped draw calls (71). By cutting down frames, entirely or partly, without raising the frame rate, more applications can run simultaneously on a GPU, increasing its utilization.
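  • One simple way to model how one GPU serves several client streams is round-robin time-slotting, interleaving full and shortened frames per client. This is a hypothetical sketch, not the patent's scheduler; the function name, the string frame labels, and the round-robin policy are all illustrative assumptions:

```python
def schedule_round_robin(streams, slots):
    """Interleave frames from several client streams onto one GPU (sketch).

    streams: dict mapping client id -> cyclic list of frame kinds, where a
             "full" displayable frame costs more GPU time than a "short"
             frame whose draw calls were dropped (labels are illustrative).
    slots:   number of GPU time slots to fill, round-robin over clients.
    """
    clients = list(streams)
    cursors = {c: 0 for c in clients}
    order = []
    for s in range(slots):
        c = clients[s % len(clients)]          # next client in rotation
        frames = streams[c]
        order.append((c, frames[cursors[c] % len(frames)]))
        cursors[c] += 1
    return order

# Three remote clients sharing one GPU, as in FIG. 8.
sched = schedule_round_robin(
    {"A": ["full", "short"], "B": ["full", "short"], "C": ["full", "short"]},
    slots=6)
assert sched[0] == ("A", "full") and sched[3] == ("A", "short")
```

The shortened frames occupy less of each slot, which is what lets additional client applications fit on the same GPU without raising the frame rate.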
  • Power Consumption Control
  • The graphics subsystem of a computing system is typically its largest power consumer. The power dissipated by a graphics subsystem is proportional to the frame rate: P=C*FPS, where P is the dissipated power and C is the heat capacitance. As the FPS changes, the power follows the change linearly, so lowering the FPS decreases the power consumption. Unfortunately, this decrease comes at the price of degraded responsiveness, due to the slower FPS. For that reason a real-time power-performance tradeoff must be maintained. The capability of controlling FPS suggests a dynamic way of doing this: a dynamic FPS scaling mechanism, whereby the FPS of a graphics subsystem can be automatically adjusted "on the fly," either lowered to conserve power and reduce the amount of heat generated at the cost of responsiveness, or raised to improve responsiveness. Such dynamic FPS scaling is important in laptops, tablets and other mobile devices, where energy comes from a battery and thus is limited. It can also be used in quiet computing settings that need low noise levels, such as video editing, sound mixing, home servers, and home theater PCs. A typical quiet PC uses quiet cooling and storage devices and energy-efficient parts. Less heat output, in turn, allows the system cooling fans to be throttled down or turned off, reducing noise levels and further decreasing power consumption.
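  • Under the linear model P=C*FPS above, a dynamic FPS scaler can simply invert the relation to pick the highest frame rate that fits a power budget. The sketch below is illustrative only; the function name, parameters, and the numeric values of C and the budget are assumptions, not figures from the specification:

```python
def fps_for_power_budget(power_budget_w, heat_capacitance, fps_min, fps_max):
    """Dynamic FPS scaling under the linear model P = C * FPS (sketch).

    Inverts the model (FPS = P / C) to find the highest frame rate whose
    predicted power stays within the budget, clamped to the device's
    supported range [fps_min, fps_max]. All names are hypothetical.
    """
    fps = power_budget_w / heat_capacitance   # invert P = C * FPS
    return max(fps_min, min(fps_max, fps))

# With C = 0.5 W per FPS: a 40 W budget allows 80 FPS, while a 10 W
# budget clamps down to the 30 FPS floor (trading responsiveness for power).
assert fps_for_power_budget(40.0, 0.5, 30, 120) == 80.0
assert fps_for_power_budget(10.0, 0.5, 30, 120) == 30
```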
  • In some embodiments of the present invention, the capability of altering the FPS is applied to controlling the power consumption of the system. In concise mode the FPS is raised by dropping some frames or cutting parts thereof. As a result, for a given workload, the GPU power consumption in concise mode is lower than in native mode, saving power. This is evident from the table of FIG. 9, for the graphics application of the video game Call of Duty 4: the frame rate in concise mode grows from 138 FPS to 310 FPS (a growth of over 124%), while the GPU power drops from 84.6 W to 62.4 W, a reduction of over 26%. Such a reduction in GPU power is significant for the overall computing system, as the GPU typically is its main power consumer.
  • Unfortunately, the total power consumption does not drop by the same ratio, because of the second-largest power consumer in the system, the CPU. Following the increased FPS, the CPU must work harder, preparing more frames per unit time for the GPU, resulting in intensified power consumption. This is evident from FIG. 9: the CPU power increases from 19.19 W in native mode to 25.7 W in concise mode, a growth of 31.7%. On the whole, the power loss at the CPU partly offsets the power gain at the GPU, and the resulting overall power drop is only 15.6%.
  • The way to preserve the power gain of the GPU is to artificially reduce the power consumption of the CPU, without interfering with the CPU's work on behalf of graphics. Usually, each frame processed by the GPU has to be pre-processed by the CPU, transferred to the GPU for rendering, and finally sent from the GPU to the display by the Present call. The frame rendering period at the GPU overlaps with the CPU pre-processing of the successive frame. Typically, the pre-processing time at the CPU is shorter, terminating some time before the present call and leaving a CPU idle period. According to an embodiment of the present invention, the CPU is shut down during that idle period by an issued Sleep(X ms) command (also called a CPU bubble). This is shown in FIG. 9, in the "Concise, sleep (3 ms)" row. In the given example, before every present command the CPU was sent to sleep for 3 msec. As a result, the saving of CPU power improved dramatically: its power consumption dropped ~28% below the native mode, and ~50% below the concise no-sleep mode. Simultaneously the frame rate dropped to 165 FPS, still better than native mode. The aggregated power savings of GPU and CPU is ~29% of the native power.
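  • The CPU-bubble idea can be sketched as sleeping through the idle gap between the end of pre-processing and the next Present call. This is an illustrative model, not the patent's driver code; the function name, parameters, and the fixed safety margin are assumptions:

```python
import time

def cpu_bubble(preprocess_done_t, present_t, margin=0.001):
    """Sleep through the CPU idle gap before the Present call (sketch).

    preprocess_done_t: time (seconds) the CPU finished preparing the frame.
    present_t:         scheduled time of the Present call.
    margin:            safety margin so the CPU wakes slightly early.
    Returns the duration actually slept. Names are hypothetical.
    """
    idle = present_t - preprocess_done_t - margin
    if idle > 0:
        time.sleep(idle)          # the Sleep(X ms) "CPU bubble"
    return max(idle, 0.0)

# A 3 ms gap yields a 2 ms sleep after the 1 ms safety margin;
# a gap smaller than the margin yields no sleep at all.
assert abs(cpu_bubble(0.0, 0.003) - 0.002) < 1e-9
assert cpu_bubble(0.0, 0.0005) == 0.0
```

In practice the gap would be measured per frame rather than fixed, since sleeping past the Present deadline would drop the frame rate further.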
  • Implementation
  • The preferred embodiment of Virtual V-sync of the present invention comprises GPU-related graphics contexts, and CPU-related tasks to manage the graphics contexts. There are two graphics contexts:
      • (i) The Rendering Context, for rendering the input data and storing the resulting frame image in the back buffer, and
      • (ii) The Display Context, for transferring the back buffer to the display device, while the transfer is synchronized with display refresh rate.
  • The Rendering Context is managed by a series of CPU tasks: (i) deciding on dropping or shortening frames, (ii) testing inter-frame dependencies, (iii) modifying frames accordingly, (iv) feeding the GPU with data and commands, and (v) transferring the final back buffers for presenting frames. A series of tasks is likewise required to manage the Display Context: (i) receiving rendered frames from the Rendering Context, (ii) managing the back-buffer swap chain, and (iii) controlling the display sync.
  • FIGS. 10 a and 10 b demonstrate two preferred system embodiments of the present invention, based on off-the-shelf components such as multicore chips, CPU-GPU fusion chips, discrete GPUs, etc. FIG. 10 a illustrates a graphics system comprising a CPU, a discrete GPU and a display; the display is connected to the GPU. Both graphics contexts run on the single GPU, managed by two CPU threads. Rendering is always the primary context on a GPU, as rendering performance is the main concern in real-time graphics applications. However, in this embodiment the GPU is underutilized with regard to rendering, due to the time spent on the Display Context. FIG. 10 b illustrates a more efficient, dual-GPU system: a hybrid chip having at least one CPU and an integrated GPU, plus a separate discrete GPU. The display connects to the integrated GPU. The discrete GPU runs the Rendering Context, undisturbed by the Display Context, which runs on the integrated GPU. Both contexts are managed by two CPU threads.
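  • The two-thread structure managing the Rendering and Display contexts can be sketched as a producer-consumer pair sharing a small swap chain. This is a hypothetical model of the control flow only (thread names, the two-slot queue, and the sentinel are illustrative), not GPU code:

```python
import queue
import threading

def run_contexts(n_frames):
    """Two CPU threads managing the Rendering and Display contexts (sketch).

    The rendering thread fills back buffers; the display thread receives
    them, manages the swap chain, and would align each present with the
    display refresh. The two-slot queue stands in for the swap chain.
    """
    back_buffers = queue.Queue(maxsize=2)   # swap chain of two back buffers
    presented = []

    def rendering_context():
        for i in range(n_frames):
            back_buffers.put(f"frame-{i}")  # render into a free back buffer
        back_buffers.put(None)              # end-of-stream sentinel

    def display_context():
        while True:
            buf = back_buffers.get()        # receive a rendered frame
            if buf is None:
                break
            presented.append(buf)           # present, synced to refresh

    r = threading.Thread(target=rendering_context)
    d = threading.Thread(target=display_context)
    r.start(); d.start()
    r.join(); d.join()
    return presented

assert run_contexts(3) == ["frame-0", "frame-1", "frame-2"]
```

The bounded queue also reproduces the back-pressure of a real swap chain: the rendering thread blocks when both back buffers are full, just as the GPU's Rendering Context must wait for the Display Context to release a buffer.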

Claims (20)

What is claimed is:
1. A method for rendering frames in graphic systems comprising not displaying at least one frame in a sequence of frames.
2. A method according to claim 1 further comprising determining an amount of time required to finish displaying a rendered frame being currently displayed in said sequence of frames.
3. A method according to claim 2 further comprising determining an amount of time required to render the at least one frame.
4. A method according to claim 3 further comprising not displaying the at least one frame when a time difference between the amount of time required to render the at least one frame and the amount of time required to finish displaying the rendered frame being currently displayed exceeds a predetermined time.
5. A method according to claim 1 further comprising evaluating inter-frame dependency between the at least one frame and a successive one or more frames in said sequence of frames.
6. A method according to claim 5 further comprising shortening the at least one frame if inter-frame dependency exists with the successive one or more frames.
7. A method according to claim 6 wherein said shortening comprises removing from said at least one frame some rendering commands.
8. A method according to claim 7 further comprising shortening the at least one frame prior to said not displaying.
9. A method according to claim 1 wherein said not displaying comprises removing from said at least one frame all rendering commands.
10. A method according to claim 1 wherein said not displaying comprises discarding said at least one frame following its rendering.
11. A computing device comprising:
a CPU (central processing unit) to manage rendering of frames associated with graphics context and to issue an instruction to not display at least one frame in a sequence of frames; and
a GPU (graphics processing unit) to render frames in said sequence of frames.
12. A computing device according to claim 11 wherein said CPU determines an amount of time required to finish displaying a rendered frame being currently displayed.
13. A computing device according to claim 12 wherein said CPU further determines an amount of time required to render said at least one frame.
14. A computing device according to claim 13 wherein said CPU issues said instruction to not display responsive to a time difference between the amount of time required to render said at least one frame and the amount of time required to finish displaying the rendered frame being currently displayed exceeding a predetermined time.
15. A computing device according to claim 11 wherein said CPU evaluates inter-frame dependency between said at least one frame and one or more successive frames in said sequence of frames.
16. A computing device according to claim 15 wherein said CPU shortens said at least one frame if inter-frame dependency exists with said one or more successive frames.
17. A computing device according to claim 16 wherein said shortening comprises said CPU removing from said at least one frame some rendering commands.
18. A computing device according to claim 17 wherein said CPU shortens said at least one frame prior to issuing said instruction to not display.
19. A computing device according to claim 11 wherein said instruction to not display comprises said CPU removing from said at least one frame all rendering commands.
20. A computing device according to claim 11 wherein said instruction to not display comprises said CPU discarding said at least one frame following its rendering by said GPU.
US14/302,439 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems Abandoned US20140292773A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/302,439 US20140292773A1 (en) 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201161471154P 2011-04-03 2011-04-03
US13/437,869 US8754904B2 (en) 2011-04-03 2012-04-02 Virtualization method of vertical-synchronization in graphics systems
US14/302,439 US20140292773A1 (en) 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/437,869 Continuation US8754904B2 (en) 2008-08-20 2012-04-02 Virtualization method of vertical-synchronization in graphics systems

Publications (1)

Publication Number Publication Date
US20140292773A1 true US20140292773A1 (en) 2014-10-02

Family

ID=47006083

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/437,869 Active 2032-12-07 US8754904B2 (en) 2008-08-20 2012-04-02 Virtualization method of vertical-synchronization in graphics systems
US14/302,441 Abandoned US20140292785A1 (en) 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems
US14/302,439 Abandoned US20140292773A1 (en) 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US13/437,869 Active 2032-12-07 US8754904B2 (en) 2008-08-20 2012-04-02 Virtualization method of vertical-synchronization in graphics systems
US14/302,441 Abandoned US20140292785A1 (en) 2011-04-03 2014-06-12 Virtualization method of vertical-synchronization in graphics systems

Country Status (1)

Country Link
US (3) US8754904B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9728166B2 (en) * 2015-08-20 2017-08-08 Qualcomm Incorporated Refresh rate matching with predictive time-shift compensation
WO2019001077A1 (en) * 2017-06-30 2019-01-03 武汉斗鱼网络科技有限公司 Method and apparatus for controlling synchronization of cpu threads and gpu threads

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342181B2 (en) 2012-01-09 2016-05-17 Nvidia Corporation Touch-screen input/output device touch sensing techniques
US11406906B2 (en) * 2012-03-13 2022-08-09 Sony Interactive Entertainment LLC Network connected controller for direct to cloud gaming
US9345966B2 (en) 2012-03-13 2016-05-24 Sony Interactive Entertainment America Llc Sharing recorded gameplay to a social graph
US10913003B2 (en) * 2012-03-13 2021-02-09 Sony Interactive Entertainment LLC Mini-games accessed through a sharing interface
US9823935B2 (en) * 2012-07-26 2017-11-21 Nvidia Corporation Techniques for latching input events to display flips
US8990446B2 (en) * 2012-10-04 2015-03-24 Sony Computer Entertainment America, LLC Method and apparatus for decreasing presentation latency
US10062142B2 (en) * 2012-12-31 2018-08-28 Nvidia Corporation Stutter buffer transfer techniques for display systems
US10141930B2 (en) 2013-06-04 2018-11-27 Nvidia Corporation Three state latch
JP5411386B1 (en) * 2013-08-12 2014-02-12 株式会社 ディー・エヌ・エー Server and method for providing game
US9823728B2 (en) 2013-09-04 2017-11-21 Nvidia Corporation Method and system for reduced rate touch scanning on an electronic device
US9881592B2 (en) 2013-10-08 2018-01-30 Nvidia Corporation Hardware overlay assignment
CN103593155B (en) * 2013-11-06 2016-09-07 华为终端有限公司 Display frame generating method and terminal device
US9507470B2 (en) 2013-12-16 2016-11-29 Nvidia Corporation Method and system for reduced power touch input detection on an electronic device using reduced scanning
KR102312681B1 (en) 2015-03-17 2021-10-13 한화테크윈 주식회사 System and Method of processing image
US9811388B2 (en) * 2015-05-14 2017-11-07 Qualcomm Innovation Center, Inc. VSync aligned CPU frequency governor sampling
KR102507114B1 (en) * 2016-06-08 2023-03-07 삼성전자주식회사 method and apparatus for providing composition screen by composing the execution window of a plurality of the operating system
KR102606693B1 (en) 2016-08-23 2023-11-28 삼성전자 주식회사 Electronic device and method for controlling operation thereof
CN107818069B (en) 2016-09-12 2021-10-01 阿里巴巴集团控股有限公司 Data processing method and system
CN106791212B (en) * 2017-03-10 2019-07-02 Oppo广东移动通信有限公司 A kind of control method, device and the mobile terminal of mobile terminal refresh rate
US10679314B2 (en) 2017-03-15 2020-06-09 Microsoft Technology Licensing, Llc Techniques for reducing perceptible delay in rendering graphics
CN109391847B (en) * 2017-08-08 2021-10-12 中国电信股份有限公司 Monitoring method and monitoring device for blocking of streaming media
CN109474768A (en) * 2017-09-08 2019-03-15 中兴通讯股份有限公司 A kind of method and device improving image fluency
KR102424794B1 (en) 2017-10-24 2022-07-25 삼성전자주식회사 Electronic device dispaying an image and operation method of thereof
EP3503569A1 (en) * 2017-12-19 2019-06-26 Thomson Licensing Method of transmitting video frames from a video stream to a display and corresponding apparatus
WO2020062069A1 (en) * 2018-09-28 2020-04-02 Qualcomm Incorporated Frame composition alignment to target frame rate for janks reduction
US10771580B1 (en) * 2019-03-14 2020-09-08 Dell Products L.P. Using machine learning to improve input/output performance of an application
CN110806909A (en) * 2019-11-01 2020-02-18 北京金山安全软件有限公司 Method and device for determining page frame dropping information of application program and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060146185A1 (en) * 2005-01-05 2006-07-06 Microsoft Corporation Software-based audio rendering
US20080012985A1 (en) * 2006-07-12 2008-01-17 Quanta Computer Inc. System and method for synchronizing video frames and audio frames
US20100020088A1 (en) * 2007-02-28 2010-01-28 Panasonic Corporation Graphics rendering device and graphics rendering method
US20110110420A1 (en) * 2009-11-06 2011-05-12 Qualcomm Incorporated Control of video encoding based on image capture parameter

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU657510B2 (en) * 1991-05-24 1995-03-16 Apple Inc. Improved image encoding/decoding method and apparatus
US7698575B2 (en) * 2004-03-30 2010-04-13 Intel Corporation Managing power consumption by requesting an adjustment to an operating point of a processor
US20060007199A1 (en) * 2004-06-29 2006-01-12 Gilbert John D Apparatus and method for light signal processing utilizing sub-frame switching
US7397478B2 (en) * 2005-09-29 2008-07-08 Intel Corporation Various apparatuses and methods for switching between buffers using a video frame buffer flip queue
JP4893154B2 (en) * 2006-08-21 2012-03-07 富士通セミコンダクター株式会社 Image processing apparatus and image processing method
US8063910B2 (en) * 2008-07-08 2011-11-22 Seiko Epson Corporation Double-buffering of video data
US8872812B2 (en) * 2009-11-12 2014-10-28 Marvell World Trade Ltd. Power saving in mobile devices by optimizing frame rate output
US8629913B2 (en) * 2010-09-30 2014-01-14 Apple Inc. Overflow control techniques for image signal processing



Also Published As

Publication number Publication date
US20140292785A1 (en) 2014-10-02
US8754904B2 (en) 2014-06-17
US20120262463A1 (en) 2012-10-18

Similar Documents

Publication Publication Date Title
US8754904B2 (en) Virtualization method of vertical-synchronization in graphics systems
US7450130B2 (en) Adaptive scheduling to maintain smooth frame rate
US20150177822A1 (en) Application-transparent resolution control by way of command stream interception
US7397478B2 (en) Various apparatuses and methods for switching between buffers using a video frame buffer flip queue
JP4383853B2 (en) Apparatus, method and system using graphic rendering engine with temporal allocator
US9030481B2 (en) Method and apparatus for reducing power usage during video presentation on a display
US11568588B2 (en) Controlling display performance using display statistics and feedback
US9323571B2 (en) Methods for reducing energy consumption of buffered applications using simultaneous multi-threading processor
US11645117B2 (en) System and method for multi-tenant implementation of graphics processing unit
KR20220027964A (en) Real-time GPU rendering with performance-guaranteed power management
KR20220143667A (en) Reduced display processing unit delivery time to compensate for delayed graphics processing unit render times
CN105719229B (en) Resolution control of application program transparentization based on instruction stream interception
TW202230325A (en) Methods and apparatus for display panel fps switching
US20120188261A1 (en) Contract based memory management for isochronous streams
US11847995B2 (en) Video data processing based on sampling rate
US20210358079A1 (en) Methods and apparatus for adaptive rendering
WO2021151228A1 (en) Methods and apparatus for adaptive frame headroom
WO2021248370A1 (en) Methods and apparatus for reducing frame drop via adaptive scheduling
WO2023230744A1 (en) Display driver thread run-time scheduling
US11776507B1 (en) Systems and methods for reducing display latency
US20230196498A1 (en) Scheduling techniques in split rendering
US20220013087A1 (en) Methods and apparatus for display processor enhancement
CN117812332A (en) Playing processing method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: LUCIDLOGIX SOFTWARE SOLUTIONS, LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SEGAL, NATALYA;SHOSHAN, YOEL;SELA, GUY;SIGNING DATES FROM 20140612 TO 20140617;REEL/FRAME:033568/0913

AS Assignment

Owner name: LUCIDLOGIX TECHNOLOGIES LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LUCIDLOGIX SOFTWARE SOLUTIONS, LTD.;REEL/FRAME:034748/0277

Effective date: 20141231

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LUCIDLOGIX TECHNOLOGY LTD.;REEL/FRAME:046361/0169

Effective date: 20180131