WO2024094297A1 - System for loading an entity or an asset on a computer system, corresponding method and computer program - Google Patents

System for loading an entity or an asset on a computer system, corresponding method and computer program

Info

Publication number
WO2024094297A1
Authority
WO
WIPO (PCT)
Prior art keywords
coroutine
queue
loading
coroutines
future
Prior art date
Application number
PCT/EP2022/080583
Other languages
French (fr)
Inventor
Carl Johan LEJDFORS
Original Assignee
Proxima Beta Europe B.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Proxima Beta Europe B.V. filed Critical Proxima Beta Europe B.V.
Priority to PCT/EP2022/080583 priority Critical patent/WO2024094297A1/en
Publication of WO2024094297A1 publication Critical patent/WO2024094297A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Definitions

  • the present disclosure relates to a system for loading an entity or an asset on a computer system and a corresponding method.
  • the disclosure is suited, but not limited, to a high-performance, resource-scarce application that has high demands on CPU performance and utilization and on IO bandwidth. This is the case for, for instance, games, game engines, out-of-core visualization, and other types of data processing applications such as databases, machine learning systems, etc.
  • Objects in computer games are often expressed as entities consisting of multiple assets as well as, possibly, other entities.
  • An example would be a non-player character, or NPC, consisting of: a number of meshes, textures, and shaders for the visual representation; a logic component (which is another type of entity) controlling the behavior of the NPC; and an animation system (another entity) that maps movement from the logic component to visual movements.
  • game engines typically use multi-threaded loading when loading entities.
  • high-level entities such as an NPC, an object in the world, up to entire levels, are requested by one of the various game systems. These requests result in a single entity and all its referenced assets (meshes, textures, shaders, physics collision objects, ...) being loaded by a single loader thread (of which there can be multiple).
  • the reason this is hard to further multi-thread is that referenced assets and entities are not known until the root entity has been loaded and processed. This is known in other areas as a dependent read, and that terminology will be used here as well.
  • the problem is that loading an entity then becomes a combination of IO work to first access data from, for instance, the hard drive followed by CPU work to process the loaded data and to discover what more to load.
  • This type of recursive dependent read pattern can be many levels deep.
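
The recursive dependent-read pattern described above can be sketched as follows. This is a minimal illustration, not from the disclosure: a hypothetical in-memory asset store stands in for the hard drive, and a Python generator stands in for a loader coroutine, so the referenced assets only become known after the root record has been "loaded".

```python
# Illustrative asset store: an entity's record names the assets it
# references, which are only discovered once the record itself is loaded.
STORE = {
    "npc": {"refs": ["mesh", "texture"]},
    "mesh": {"refs": []},
    "texture": {"refs": []},
}

def load(identifier, loaded):
    record = STORE[identifier]
    yield f"io:{identifier}"  # suspension point: IO for the record itself
    # Only now are the referenced assets known - the dependent read.
    for ref in record["refs"]:
        yield from load(ref, loaded)  # recurse into nested loads
    loaded.append(identifier)

loaded = []
list(load("npc", loaded))  # drive the coroutine to completion
print(loaded)  # referenced assets complete before the root entity
```

In a real loader each `yield` would suspend until an asynchronous IO request completed; here the generator simply hands control back to the caller, which illustrates why such loads interleave IO and CPU work at every level of the recursion.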
  • the one or more loader threads interleave CPU work (for preparing assets, registering them in the engine, etc.) and IO work leading to sub-optimal performance in both. Since the loading of a single entity and all its referenced assets is effectively tied to a single loader thread for the duration of the load, the potential work that could have been performed by that thread is blocked while the thread waits for IO to complete.
  • loading prioritization cannot easily be controlled.
  • prioritization relies on complex interactions with the OS in the form of thread scheduling and IO prioritization.
  • any load priority must be known at the time of enqueueing the entity load request.
  • Operating system/OS - A system software that manages computer hardware, software resources, and provides common services for computer programs.
  • Thread - A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the OS. The scheduler is configured to interrupt threads and give control to other threads, which is known as preemptive multitasking.
  • Coroutine - A computer program component that generalizes subroutines for non-preemptive multitasking. It is a function that can suspend execution to be resumed later. Related concepts such as fibers, greenlets, green threads, etc., which allow the same functionality, i.e., non-preemptive multitasking by suspending execution and resuming later, are also referred to by the term coroutine. Thus, the term coroutine covers different concepts for non-preemptive multitasking, in differentiation from threads.
  • Entity - An object referencing multiple assets or other entities.
  • Asset - Data used by objects such as meshes, textures, shaders, or other data.
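
The suspend-and-resume behaviour that defines a coroutine can be illustrated with a minimal Python sketch; Python generators provide exactly this non-preemptive suspension. The function name and the yielded strings are illustrative, not from the disclosure:

```python
def load_steps():
    # A coroutine-like generator: each yield suspends execution,
    # handing control back to the caller until it is resumed.
    yield "issued IO request"       # suspend, e.g. while waiting for IO
    yield "processing loaded data"  # suspend again, e.g. to time-slice
    return

co = load_steps()
print(next(co))  # resumes the coroutine until its first suspension point
print(next(co))  # resumes again from exactly where it left off
```

No OS scheduler is involved: control transfers only at the explicit suspension points, which is the non-preemptive multitasking the definition above refers to.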
  • EP 1 788 486 A2 refers specifically to utilizing worker threads to process either coroutine or “regular” thread workloads on available worker threads. The decision on what each worker thread should process is determined by a control thread whose responsibility it is to swap the thread context with that of the coroutine, if a coroutine workload is to be processed.
  • a system for loading an entity or an asset on a computer system comprising: a cache component, the cache component configured to provide an associative container between an identifier of an object and a future object; a reactor component, the reactor component maintaining a queue of coroutines, and at least one worker thread.
  • the system further comprises program code configured to receive a loading request from a caller for the loading of the object, namely an entity or an asset, the loading request including the identifier of the object; create a loading coroutine from the loading request, the loading coroutine comprising computer code for loading the object; register the coroutine in the reactor component; create a future object referencing the created coroutine; insert the future object in the cache component; and return a sentinel of the future object to the caller.
  • the worker thread is configured to run the coroutine based on the queue maintained by the reactor component.
  • This disclosure utilizes coroutines to allow efficient reuse of thread resources while waiting for IO to complete. This is achieved by implementing the loading requests of portions of data as coroutines, which are maintained in an appropriate queue by the reactor component.
  • the worker threads use the queue maintained by the reactor component and run appropriate coroutines.
  • the coroutine may be suspended and the worker thread continues with another coroutine. Thereby, since the same worker thread remains active and consecutively runs the queued coroutines, switching between a high number of worker threads, as known in the art, may be avoided.
  • For instance, most games use multiple threads to load multiple resources concurrently. Such loads typically consist of a mixture of CPU work: to prepare and issue IO requests and to process the returned data. Commonly, such systems also exhibit dependent reads, where the system must first load some data in order to determine the subsequent resources needed. For example, a small asset description is first loaded and parsed, and then referenced assets are loaded recursively.
  • the advantages of coroutines over threads are that they may be used in a hard real-time context (switching between coroutines need not involve any system calls or any blocking calls whatsoever), there is no need for synchronization primitives such as mutexes, semaphores, etc. to guard critical sections, and there is no need for support from the operating system.
  • running coroutines in a specified number of one or more worker threads in the claimed way provides the benefit of reducing the overhead that results from blocked thread resources and from the processing needed for thread switching.
  • the present aspect specifically deals with the efficient scheduling of coroutines waiting either for IO (or other similar external events such as network messages or user input) or other coroutines by not blocking thread resources in those cases.
  • the present aspect makes it possible to deal with non-trivial, arbitrarily nested dependencies of operations in the form of dependent reads of other resources whose existence cannot be determined a priori.
  • the cache component may comprise at least one of a software cache, which is the most preferred option, a hardware cache or arbitrary combinations thereof.
  • the cache component provides a cached data storage resource in which the loaded entities or assets are stored and made available for access by the caller.
  • the cache component is configured to store data so that future requests for that data can be served faster.
  • the loading or reading request can be served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
  • the cache component provides an associative container or array which is an abstract data type that stores a collection of (key, value) pairs, in this embodiment between the identifier of the object and a future.
  • the future is provided with the loaded data such that the loaded data can be accessed in the cache using the associative container and the identifier.
  • the identifier may take any feasible form and is not limited to, for instance, numerical or alphanumerical identifiers. To the contrary, the identifier is characterized by its function of identifying the requested entity or asset.
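
A minimal sketch of such a cache component follows, assuming a Python dict as the associative container and `concurrent.futures.Future` as the future object; the class and method names are illustrative, not from the disclosure:

```python
from concurrent.futures import Future

class CacheComponent:
    """Associative container mapping object identifiers to future objects."""

    def __init__(self):
        self._entries = {}  # identifier -> Future

    def lookup(self, identifier):
        # Return the future for a previously requested object, if any.
        return self._entries.get(identifier)

    def insert(self, identifier, future):
        self._entries[identifier] = future

cache = CacheComponent()
fut = Future()
cache.insert("npc_01", fut)               # inserted before loading completes
fut.set_result(b"mesh-and-texture-data")  # loading coroutine completes later
print(cache.lookup("npc_01").result())    # loaded data, keyed by identifier
```

The future is inserted before the load completes, so repeated requests for the same identifier can be served from the cache immediately, while callers holding the future simply wait for its result.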
  • the reactor component may also consist of software, hardware, or a combination thereof. In a preferred embodiment, the reactor component will be implemented as software such that no dedicated hardware is necessary.
  • the reactor component can employ libraries of known programming languages for providing queues of coroutines.
  • the queue of coroutines allows the adding of coroutines, preferentially the change of priority of the coroutines or a sorting of the coroutines within the queue, and a removal of coroutines from the queue.
  • the system is implemented as a process on a computer system, wherein the computer system may be a personal desktop computer, a console, a portable laptop computer, a smartphone or any other portable computing device, or a distributed system including a cloud server system.
  • the system may be implemented as a management thread waiting for requests from a caller and implementing the cache component and the reactor component, wherein the cache component and the reactor component can also be implemented as separate threads, wherein the actual implementation is left to the programmer of the system and the application.
  • other applications can distribute the workload, for instance different worker threads, over a plurality of computer systems.
  • the worker threads may be exclusive to the claimed system and have a predefined number. In one embodiment, any positive number of worker threads can be supported, for instance, and without being limited thereto, between 1 and 100, preferentially at least 2.
  • the plurality of worker threads can also be referred to as a so called thread pool.
  • the thread pool does not need to be exclusively used by the claimed system and may be made available to other applications.
  • the caller may be any suitable computer unit or software, including threads, processes, etc., that communicates with the system to request the loading of the entity or asset. For instance, in case the system is implemented as part of a games engine, a subunit or component of the games engine requests the entity or asset from the system. However, the caller may also be a remote computer or device requesting the loading of the entity or asset via a wired or wireless network connection.
  • the loading coroutine preferentially encapsulates the load process and subsequent processing required to load the asset or entity. Thus, the loading coroutine preferentially includes tasks using CPU computing power and/or IO system resources.
  • the system is in preferred embodiments implemented as part of at least one of games, game engines, out-of-core visualization, databases, and machine learning systems.
  • the disclosure assumes coroutine support similar in capability to that of C++20 and is applicable to any language and runtime that supports coroutines of similar capabilities. It can also be implemented using fibers, greenlets, green threads, or similarly behaving constructs in languages supporting them, wherein these concepts, allowing suspension and resumption functionality for non-preemptive multitasking, are referred to as coroutines in the present context.
  • the reactor component comprises a waiting for input/output, IO, queue maintaining a queue of suspended coroutines waiting for IO to complete.
  • the waiting for IO queue thus includes all coroutines which have issued a request for IO resources, typically the loading of data, that has not completed yet.
  • the coroutines in the waiting for IO queue are suspended and may be resumed once the IO is completed.
  • the reactor component comprises a waiting for tasks queue maintaining a queue of suspended coroutines waiting for one or more other coroutines to complete.
  • the coroutines maintained in the waiting for tasks queue are suspended and waiting for at least one further coroutine to complete.
  • the reactor component comprises a ready queue maintaining a queue of coroutines that are ready to run. Ready-to-run coroutines may be selected by one of the worker threads for running. Coroutines are moved to the ready queue from the waiting for IO queue or the waiting for tasks queue when the reason for the respective suspension is resolved, i.e., when the IO completes or all the dependent coroutines have finished.
  • the reactor component comprises a running queue maintaining a queue of currently running coroutines.
  • the worker thread running one of the coroutines from the ready queue triggers the respective coroutine to be moved from the ready queue to the running queue, preferentially by the worker thread itself using the reactor component functions or interface.
  • the reactor component is configured, upon update of the system, preferentially upon every update of the system, to query an IO subsystem of an underlying operating system, OS, to determine whether one or more of the coroutines queued in the waiting for IO queue can be resumed, and, in the affirmative, moving the determined coroutines from the waiting for IO queue to the ready queue.
  • the reactor component is configured, upon update of the system, preferentially upon every update of the system, to query the running queue for completed coroutines, wherein for each of the identified completed coroutines, the reactor component is further configured to query the waiting for tasks queue for coroutines waiting for the identified completed coroutine, and, in case the identified completed coroutine is the only remaining coroutine to await for, move said coroutine, of which the identified completed coroutine is the only remaining coroutine to await for, from the waiting for tasks queue to the ready queue.
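
The per-update bookkeeping described in the two preceding embodiments can be sketched as follows. This is a minimal Python sketch under stated assumptions: coroutines are represented by opaque handles, the OS IO query is stubbed out as an `io_done` set, and all names are illustrative, not from the disclosure:

```python
class Reactor:
    """Sketch of the reactor's four queues and its per-update bookkeeping."""

    def __init__(self):
        self.waiting_io = []     # suspended, waiting for IO to complete
        self.waiting_tasks = []  # (coroutine, set of awaited coroutines)
        self.ready = []          # ready to be picked by a worker thread
        self.running = []        # currently running on a worker thread

    def update(self, io_done, completed):
        # Move coroutines whose IO has completed to the ready queue.
        for co in [c for c in self.waiting_io if c in io_done]:
            self.waiting_io.remove(co)
            self.ready.append(co)
        # For each completed coroutine A, wake any coroutine B for which
        # A was the only remaining coroutine to await.
        for a in completed:
            for b, deps in list(self.waiting_tasks):
                deps.discard(a)
                if not deps:
                    self.waiting_tasks.remove((b, deps))
                    self.ready.append(b)

reactor = Reactor()
reactor.waiting_io.append("load_texture")
reactor.waiting_tasks.append(("load_npc", {"load_texture"}))
reactor.update(io_done={"load_texture"}, completed=set())   # IO finished
reactor.update(io_done=set(), completed={"load_texture"})   # task finished
print(reactor.ready)  # both coroutines are now ready to run
```

In the real system the `io_done` set would come from querying the OS IO subsystem and `completed` from querying the running queue; the sketch only shows the queue transitions those queries trigger.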
  • the worker thread is configured to select one coroutine from the ready queue maintained by the reactor component; move the selected coroutine to the running queue maintained by the reactor component; and resume the selected coroutine on the current thread.
  • the worker thread is configured to select the coroutine from the ready queue having the highest priority assigned thereto.
  • the reactor component comprises priority queues rather than regular queues or stack data structures for maintaining the coroutine queues.
  • Each element of the priority queue additionally has a priority associated with it.
  • In a priority queue, an element with high priority is served before an element with low priority.
  • In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued; in other implementations, the ordering of elements with the same priority remains undefined.
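
A ready queue of this kind can be sketched with Python's `heapq` module; using a (priority, sequence) key gives the FIFO-among-equal-priorities behaviour mentioned above. The class name is illustrative, and the convention that lower numbers mean higher priority is an arbitrary assumption:

```python
import heapq
from itertools import count

class PriorityReadyQueue:
    """Priority queue: high-priority coroutines are served first,
    FIFO among coroutines with equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker preserving enqueue order

    def push(self, priority, coroutine):
        heapq.heappush(self._heap, (priority, next(self._seq), coroutine))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = PriorityReadyQueue()
q.push(2, "load_background_level")
q.push(0, "load_player_npc")   # highest priority, served first
q.push(2, "load_far_terrain")
print(q.pop())  # load_player_npc
print(q.pop())  # load_background_level: same priority, enqueued earlier
```

The sequence counter is what makes equal-priority ordering well defined; dropping it would yield the "undefined ordering" variant also mentioned above.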
  • the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine awaits one or more IO tokens that have not yet finished, further configured to: suspend the coroutine; and move the coroutine from the running queue to the waiting for IO queue.
  • the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine awaits one or more other coroutines that have not yet finished, configured to: suspend the coroutine; and move the coroutine from the running queue to the waiting for tasks queue.
  • the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine yields, configured to: suspend the coroutine; and move the coroutine from the running queue to the ready queue.
  • a further benefit of coroutines stems from the yield function, which allows a running coroutine to suspend its execution so that it can be resumed later. Therefore, unnecessary occupation of the CPU can be avoided.
  • the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine completes, configured to move the resulting object into the future object value of the cache component.
  • the worker thread is, in case the coroutine completes, configured to destroy the coroutine state to reclaim memory.
  • the worker thread is, in case the coroutine completes, configured to notify the future using at least one of a condition variable, an atomic operation, or similar thread synchronization primitives.
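
The four outcomes above (awaiting IO, awaiting other coroutines, yielding, completing) can be sketched with Python generators standing in for coroutines. The function `run_one`, the state strings, and the queue layout are illustrative assumptions, not the disclosure's API:

```python
from concurrent.futures import Future
from types import SimpleNamespace

def run_one(reactor, cache):
    """One worker-thread step: pick a ready coroutine, resume it,
    and requeue it according to what it yields."""
    ident, co = reactor.ready.pop(0)
    reactor.running.append((ident, co))
    try:
        state = next(co)  # resume the coroutine on the current thread
    except StopIteration as done:
        # Coroutine completed: publish the result into the cache future,
        # then drop the coroutine state so its memory can be reclaimed.
        reactor.running.remove((ident, co))
        cache[ident].set_result(done.value)  # notifies external waiters
        return "completed"
    reactor.running.remove((ident, co))
    if state == "await_io":
        reactor.waiting_io.append((ident, co))     # suspend until IO done
    elif state == "await_tasks":
        reactor.waiting_tasks.append((ident, co))  # suspend on other coroutines
    else:  # plain yield: time-slicing, coroutine stays runnable
        reactor.ready.append((ident, co))
    return state

def load_mesh():
    yield "await_io"      # issue the IO request, then suspend
    return b"mesh-bytes"  # process the loaded data, then complete

reactor = SimpleNamespace(ready=[], running=[], waiting_io=[], waiting_tasks=[])
cache = {"mesh_01": Future()}
reactor.ready.append(("mesh_01", load_mesh()))
print(run_one(reactor, cache))           # first resume suspends on IO
# ... the reactor later moves the coroutine back to ready once IO completes:
reactor.ready.append(reactor.waiting_io.pop())
print(run_one(reactor, cache))           # second resume completes the load
print(cache["mesh_01"].result())         # result published via the future
```

Note that the worker thread is never blocked between the two resumptions; in that window it would simply pick other coroutines from the ready queue.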
  • the reactor component is configured to control a priority of the registered coroutines.
  • the future provides the caller with an interface or function to query the state of the load and to wait for object load completion.
  • the system is configured to enable manipulation of a priority of the returned future from an external source including the caller.
  • the loading coroutine is configured to issue an asynchronous input/output, IO, request, in particular to an underlying operating system, OS.
  • the system is configured, prior to creating the loading coroutine, to consult the cache component as to whether the object was previously requested for loading and, in case the object was previously requested, to return the previously created future sentinel to the caller.
  • the result of the earlier loading can be obtained quickly and resource efficiently by accessing the cache component and avoiding the repeated loading of the entity or asset.
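
The request path with this cache consultation can be sketched as follows; repeated requests for the same identifier then return the same future instead of issuing a second load. The function name and the plain-list reactor stand-in are illustrative assumptions:

```python
from concurrent.futures import Future

def request_load(identifier, cache, reactor, make_coroutine):
    """Sketch of the request path: consult the cache first, and only
    create and register a loading coroutine on a cache miss."""
    if identifier in cache:
        return cache[identifier]      # previously requested: reuse the future
    co = make_coroutine(identifier)   # loading coroutine for the object
    reactor.append((identifier, co))  # register with the reactor
    future = Future()                 # references the created coroutine
    cache[identifier] = future
    return future                     # sentinel handed back to the caller

cache, reactor = {}, []
first = request_load("npc_01", cache, reactor, lambda i: iter(()))
second = request_load("npc_01", cache, reactor, lambda i: iter(()))
print(first is second)  # True: the second request reuses the first future
print(len(reactor))     # 1: only one loading coroutine was ever registered
```

Because the future is cached at request time rather than at completion time, even loads that are still in flight are deduplicated.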
  • the system is implemented on at least one of a personal computer, PC, a console, and mobile hardware supporting an asynchronous IO mechanism.
  • a method for loading an entity or an asset on a computer system comprising the following steps: receiving a loading request from a caller for the loading of an object, namely an entity or an asset, the loading request including an identifier of the object; creating a loading coroutine from the loading request; registering the coroutine in a reactor component, the reactor component maintaining a queue of coroutines; creating a future object referencing the created coroutine; inserting the future object in a cache component, the cache component providing an associative container between the identifier of the object and the future object; returning a sentinel of the future to the caller; and running the coroutine using at least one worker thread based on the queue maintained by the reactor component.
  • a computer program comprising program code means for causing a computer to carry out the steps of the method according to an aspect of the disclosure when said computer program is carried out on a computer.
  • Fig. 1 schematically and exemplarily illustrates a system for loading an entity or an asset.
  • Fig. 2 schematically and exemplarily illustrates a dataflow chart for loading an entity or an asset using a cache component.
  • FIG. 3 schematically and exemplarily illustrates a prior art loading thread occupation.
  • Fig. 4 schematically and exemplarily illustrates thread occupation using the loading scheme including coroutines according to the present disclosure.
  • Fig. 5 schematically and exemplarily illustrates a cache component.
  • Fig. 6 schematically and exemplarily illustrates a reactor component.
  • Fig. 7 schematically and exemplarily illustrates a reactor component dataflow.
  • Fig. 8 schematically and exemplarily illustrates a worker thread process.
  • Fig. 1 schematically and exemplarily illustrates a system 100 for loading an entity or an asset.
  • a caller 10 issues a loading request 12 to the system 100 for loading the entity or asset.
  • an entity is an object referencing multiple assets or other entities. No entity may recursively reference itself.
  • Asset is data used by objects such as meshes, textures, shaders, or other data.
  • requested entities include high-level entities such as an NPC, an object in the world, up to entire levels.
  • the system 100 loads the requested entity or asset from a data storage 20.
  • the loading is in this example implemented using asynchronous IO requests to an underlying OS. Therefore, the data storage 20 may be any suitable data storage and includes, without being limited, the RAM/ROM, a hard disk drive, a solid state disk drive, an optical storage means, a cloud storage means and the like.
  • the actual IO requests are executed in system 100 by coroutines, which are registered and maintained in a reactor component 40, which will be described in detail with reference to Fig. 6, and run by one or more worker threads 50.
  • the loaded data is managed by a cache component 30, which will be described in detail with reference to Fig. 5.
  • In response to the loading request 12, the caller 10 is provided with a future sentinel 14 to the requested entity or asset by the system 100.
  • Fig. 2 schematically and exemplarily illustrates a dataflow chart of a method 200 for loading an entity or an asset using the system 100 of Fig. 1 and in particular cache component 30 thereof.
  • In a first step 210, an entity or asset is requested by the caller 10 from the system 100 using a load request 12.
  • After receiving the request, cache component 30 is consulted. In a subsequent step 220, the system evaluates whether the requested entity or asset is cached, using cache component 30.
  • If so, the method proceeds to step 270 and returns a sentinel to the future 14 of the requested entity or asset already contained in the cache to the caller 10.
  • the sentinel allows the caller 10 to query completion state for the load, wait for the object load to complete, as well as manipulate priority.
  • Otherwise, the method proceeds to step 230 and creates a coroutine from a load function associated with the requested entity or asset.
  • the coroutine encapsulates the load process and subsequent processing required to load the asset or entity.
  • the load function may be simple or complex depending on the complexity of the requested entity or asset. If the requested entity refers to further entities or assets, corresponding additional loader coroutines for the referenced further entities or assets are also registered. The registering of additional coroutines can be done initially or recursively during running of the respective loader coroutine.
  • In a step 240, the coroutine is registered in the reactor component 40.
  • the reactor component 40 maintains in this example four queues of coroutines of different states.
  • the running of the registered respective coroutines is, as will be detailed below, performed by one particular of the one or more worker threads 50.
  • the system 100 creates a future sentinel 14, simply referred to as future 14, referencing the coroutine that was registered in step 240, using, for instance, a handle of the coroutine.
  • future 14 does not contain the requested data and acts as a placeholder for receiving the data of the requested entity or asset as soon as the IO responsible for the actual loading is finished.
  • future 14 is inserted in the cache maintained by cache component 30 before the method returns the future 14 to the caller 10 in step 270 described above.
  • Fig. 3 illustrates the deficiencies of a common example of prior art thread utilization, whereas Fig. 4 in contrast illustrates the benefits of the present disclosure.
  • In Fig. 3, the workload of three threads 52, 54, 56 is illustrated over time. As can be seen, portions of CPU work 310 alternate with long periods of waiting for IO 320, indicated with dashed lines. During the periods of waiting for IO 320, the respective thread does not use the CPU, such that thread utilization is not optimal, as significant time is spent waiting for IO requests to complete.
  • Fig. 4 illustrates the inventive concept of using coroutines together with threads to maximize thread utilization and IO throughput. It can be seen that different coroutines 420, 440, 460, 480 only occupy the respective threads 52, 54 until, for instance, the respective coroutines wait for an IO request to complete. In such case, the respective coroutines suspend and are not resumed until the respective IO request is complete.
  • By using coroutines 420, 440, 460, 480 in this example, it is possible to eliminate threads being occupied while waiting for IO requests to complete. The result is visually described in Fig. 4, and the effect on thread utilization is apparent. Moreover, the same coroutine, in this example coroutine 420, gets suspended at the end of a block and then resumed by another thread later.
  • Cache component 30 comprises an associative list between identifiers 32 and the respective loaded data of the entity or asset 34. Before the loading is complete, a future to the respective entity or asset 34 is provided.
  • the data construct of the element of the associative list may be implemented as a future which is communicated to the caller 10.
  • the future, of which a sentinel 14 may be communicated is a construct used for synchronizing program execution used in various known programming languages. It describes an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.
  • cache component 30 can implement all functions widely known to the skilled person in the context of caching such as freeing memory of loaded or cached elements when necessary.
  • Fig. 6 schematically and exemplarily illustrates the layout of the reactor component 40.
  • the reactor component 40 maintains in this example four queues of loader coroutines, a waiting for IO queue 42, a waiting for tasks queue 44, a ready queue 46, and a running queue 48.
  • the coroutines 420, 440, 460, 480 are illustrated as being queued in one of the queues 42- 48, respectively.
  • the reactor component 40 is configured to move the coroutines between the different queues according to their respective state and to modify a priority of the respective coroutines, if appropriate.
  • Waiting for IO queue 42 includes suspended coroutines waiting for IO to complete grouped by IO completion token.
  • Waiting for tasks queue 44 includes suspended coroutines waiting for one or more other coroutines to complete.
  • Ready queue 46 includes coroutines that are ready to run.
  • Running queue 48 includes coroutines currently running on one of the worker threads 50.
  • reactor component 40 includes a priority component 49 which allows influencing the priority of the coroutines registered with the reactor component 40 and maintained in one of the queues 42-48. Changed priorities apply after suspension and restart of the respective coroutines.
  • the operation and dataflow of the reactor component 40 is schematically and exemplarily illustrated using the flow chart of Fig. 7.
  • the reactor component 40 is, in a step 720, configured to query the IO subsystem, for instance of the underlying OS, to determine the set of coroutines waiting for IO in waiting for IO queue 42 that can be resumed. In case such coroutines exist in waiting for IO queue 42 ("Yes" branch), the coroutines are moved to ready queue 46 in step 730; otherwise, the routine directly proceeds with step 740.
  • In step 740, running queue 48 is queried for completed coroutines.
  • For each completed coroutine A, step 760 looks in waiting for tasks queue 44 for any coroutine B whose only remaining task to await is A.
  • B is moved to the ready queue 46 in step 770.
  • Fig. 8 schematically and exemplarily illustrates the periodical operational flow of the worker threads 50.
  • In a step 805, the worker thread or threads pick the coroutine, in particular the one with the highest priority, from ready queue 46.
  • the picked coroutine is moved from ready queue 46 to running queue 48.
  • In a step 815, the coroutine is resumed on the current thread.
  • the coroutine is preferentially resumed using the available programming language functionality to do so.
  • the coroutine keeps running while all of the checks in steps 820, 825, 830, and 835 are negative.
  • In step 820, it is checked or determined whether the coroutine awaits one or more IO tokens that have not yet finished. In the affirmative, the coroutine is suspended in step 840 and moved from the running queue 48 to the waiting for IO queue 42 in step 845.
  • In step 825, it is checked or determined whether the coroutine awaits one or more other coroutines that have not yet finished. In the affirmative, the coroutine is suspended in step 850 and moved from the running queue 48 to the waiting for tasks queue 44 in step 855.
  • In step 830, it is checked or determined whether the coroutine yields.
  • the yield of the coroutine can be either to manually time slice workloads or as response to some other application defined criteria.
  • In the affirmative, the coroutine is suspended in step 860 and moved from the running queue 48 to the ready queue 46.
  • In step 835, it is checked or determined whether the coroutine completes.
  • In the affirmative, the resulting object is moved into the cache future value 34 in step 870.
  • the coroutine state is destroyed to reclaim memory in step 875.
  • the future 14 is notified.
  • the notification is done using either a condition variable, an atomic operation, or similar thread synchronization primitives. This in turn allows possible external waiters to be notified.
  • prioritization of CPU workloads can be controlled at any time. Changes of priority will take effect before the next resumption, meaning that priority changes done while the coroutine is not suspended will not take effect until the coroutine has been suspended and resumed once again.
  • a computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
  • the invention's claim to novelty is the combination of using coroutines together with threads to minimize CPU wait times, maximize IO request issuing rates, and address the problem of post-issue prioritization. It achieves this without over-saturating the system with worker threads and enables a small set of common worker threads to deal effectively with both CPU and CPU+IO workloads.
  • the invention is applicable to all PC, console, and mobile hardware that supports some form of asynchronous IO mechanism.

Abstract

The present disclosure relates to a system (100), a corresponding method and a corresponding computer program for loading an entity or an asset on a computer system, wherein the system (100) comprises: a cache component (30), the cache component (30) configured to provide an associative container between an identifier (32) of an object and a future object (34); a reactor component (40), the reactor component (40) maintaining a queue of coroutines; and at least one worker thread (50, 52, 54, 56). The system (100) further comprises program code configured to receive (210) a loading request (12) from a caller (10) for the loading of the object, namely an entity or an asset; create (230) a loading coroutine (420, 440, 460, 480) from the loading request, wherein the worker thread (50, 52, 54, 56) is configured to run the coroutine based on a queue maintained by the reactor component (40).

Description

System for loading an entity or an asset on a computer system, corresponding method and computer program
The present disclosure relates to a system for loading an entity or an asset on a computer system and a corresponding method. The disclosure is suited, but not limited, to high-performance resource-scarce applications that have high demands for CPU performance and utilization and IO bandwidth. This is the case for, for instance, games, game engines, out-of-core visualization, and other types of data processing applications such as databases, machine learning systems, etc.
Objects in computer games are often expressed as entities consisting of multiple assets as well as, possibly, other entities. An example would be a non-player character, or NPC, consisting of: a number of meshes, textures, and shaders for the visual representation; a logic component (which is another type of entity) controlling the behavior of the NPC; and an animation system (another entity) that maps movement from the logic component to visual movements.
For efficiency, game engines typically use multi-threaded loading when loading entities. In these engines, high-level entities such as an NPC, an object in the world, up to entire levels, are requested by one of the various game systems. These requests result in a single entity and all its referenced assets (meshes, textures, shaders, physics collision objects, ...) being loaded by a single loader thread (of which there can be multiple). The reason this is hard to further multi-thread is that referenced assets and entities are not known until the root entity has been loaded and processed. This is known in other areas as a dependent read, and that terminology will be used here as well.
The problem is that loading an entity then becomes a combination of IO work to first access data from, for instance, the hard drive followed by CPU work to process the loaded data and to discover what more to load. This type of recursive dependent read pattern can be many levels deep.
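The recursive dependent-read pattern described above can be sketched in a few lines; the fake on-disk records, function names, and data below are illustrative assumptions only, not the claimed implementation:

```python
# Each load mixes IO (fetching raw data) with CPU work (parsing) that
# only then reveals which further objects must be loaded.

FAKE_DISK = {                      # stands in for the hard drive
    "npc": {"refs": ["mesh", "logic"]},
    "mesh": {"refs": []},
    "logic": {"refs": ["anim"]},
    "anim": {"refs": []},
}

def load_entity(identifier, loaded=None):
    """Recursively load an entity and everything it references."""
    loaded = loaded if loaded is not None else []
    record = FAKE_DISK[identifier]   # IO work: fetch the raw data
    loaded.append(identifier)        # CPU work: process the data...
    for ref in record["refs"]:       # ...and only now discover dependents
        load_entity(ref, loaded)
    return loaded
```

In a traditional loader, each of those `FAKE_DISK` accesses would be a blocking disk read, so the thread idles at every level of the recursion.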
This means that the one or more loader threads interleave CPU work (for preparing assets, registering them in the engine, etc.) and IO work leading to sub-optimal performance in both. Since the loading of a single entity and all its referenced assets is effectively tied to a single loader thread for the duration of the load, the potential work that could have been performed by that thread is blocked while the thread waits for IO to complete.
In order to reach IO saturation, it is often necessary to create many more threads than the logical core count of the target hardware. This in turn leads to increased system memory usage (from thread stacks and metadata) and decreased performance across the entire application (in particular on shared resources such as memory allocators) due to increased context switching and locking overhead.
Furthermore, a problem inherent in the traditional loading patterns described above is that loading prioritization cannot easily be controlled. In general, such prioritization relies on complex interactions with the OS in the form of thread scheduling and IO prioritization. In addition, once the entity has been picked up for loading by the loader thread, it is not straightforward to change the priority of the entity load request, meaning any load priority must be known at the time of enqueueing the entity load request.
In the context of the present disclosure, the following abbreviations and key terminologies apply.
Operating system/OS - A system software that manages computer hardware, software resources, and provides common services for computer programs.
IO - Input/output, a common acronym for any operation that reads/writes from/to disk, a socket, or similar.
Thread - A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the OS. The scheduler is configured to interrupt and give control to other threads, which is known as preemptive multitasking.
Coroutine - A computer program component that generalizes subroutines for non-preemptive multitasking. It is a function that can suspend execution to be resumed later. Related concepts such as fibers, greenlets, green threads, etc. that allow the same functionality, i.e., non-preemptive multitasking by suspending execution and resuming later, are also referred to by the term coroutine. Thus, the term coroutine includes different concepts for non-preemptive multitasking in differentiation from threads.
Future - An object that provides a mechanism to access the result of an asynchronous operation. The term describes an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete. In the context of the present disclosure, future is used interchangeably with promise, delay and deferred, and all of these concepts shall be understood under the term future unless explicitly stated otherwise.
Entity - An object referencing multiple assets or other entities.
Asset - Data used by objects such as meshes, textures, shaders, or other data.
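As a minimal illustration of the suspend/resume behaviour covered by the term coroutine above, a Python generator can serve as the suspension mechanism; the scheduler that decides when to resume is deliberately left out, and all names are illustrative:

```python
# A coroutine-like function: it performs a slice of CPU work, then
# suspends itself, and is only resumed when the caller asks for more.

def load_in_slices(chunks):
    """Processes one chunk per resumption, suspending in between."""
    processed = []
    for chunk in chunks:
        processed.append(chunk.upper())   # a slice of CPU work
        yield processed                   # suspend; resumed later

co = load_in_slices(["a", "b"])
first = next(co)    # run until the first suspension point
```

Between two `next()` calls the thread is free to run any other coroutine, which is exactly the property the disclosure exploits.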
EP 1 788 486 A2 refers specifically to utilizing worker threads to process either coroutine or “regular” thread workloads on available worker threads. The decision on what each worker thread should process is determined by a control thread whose responsibility it is to swap the thread context with that of the coroutine, if a coroutine workload is to be processed.
Against this background it was an object of the present invention to improve the efficiency of loading entities or assets on computer systems. In particular, it was an object to minimize CPU wait times and maximize IO request issuing rates when loading entities or assets. Further, some embodiments aim at improving the prioritization of dependent reads.
In a first aspect a system for loading an entity or an asset on a computer system is provided, wherein the system comprises: a cache component, the cache component configured to provide an associative container between an identifier of an object and a future object; a reactor component, the reactor component maintaining a queue of coroutines, and at least one worker thread. The system further comprises program code configured to receive a loading request from a caller for the loading of the object, namely an entity or an asset, the loading request including the identifier of the object; create a loading coroutine from the loading request, the loading coroutine comprising computer code for loading the object; register the coroutine in the reactor component; create a future object referencing the created coroutine; insert the future object in the cache component; and return a sentinel of the future object to the caller. The worker thread is configured to run the coroutine based on the queue maintained by the reactor component.
This disclosure utilizes coroutines to allow efficient reuse of thread resources while waiting for IO to complete. This is achieved by implementing the loading requests of portions of data as coroutines, which are maintained in an appropriate queue by the reactor component. The worker threads use the queue maintained by the reactor component and run appropriate coroutines. When a coroutine, for instance, waits for completion of an IO request, the coroutine may be suspended and the worker thread continues with another coroutine. Thereby, since the same worker thread keeps running and consecutively executes the queued coroutines, switching between a high number of worker threads, as known in the art, may be avoided.
This leads to the present disclosure being able to lower the effective number of loader threads and still reach IO and CPU saturation. Furthermore, the disclosure enables prioritization of CPU workloads after workload submission by allowing resumption of coroutines in priority order.
By utilizing the suspension and resumption functionality inherent in coroutines, it is possible to eliminate threads being in use while waiting for IO requests to complete.
For instance, most games use multiple threads to load multiple resources concurrently. Such loads typically consist of a mixture of CPU work to prepare and issue IO requests and to process the returned data. Commonly, such systems also exhibit dependent reads, where the system must first load some data in order to determine subsequent resources needed. For example, a small asset description is first loaded, parsed, and then referenced assets are loaded recursively. In a preferred embodiment the advantages of coroutines over threads are that they may be used in a hard real-time context (switching between coroutines need not involve any system calls or any blocking calls whatsoever), that there is no need for synchronization primitives such as mutexes, semaphores, etc. in order to guard critical sections, and that there is no need for support from the operating system. Thus, running coroutines in a specified number of one or more worker threads in the claimed way provides the benefit of reducing the overhead that follows from blocked thread resources and from the processing needed for thread switching.
The present aspect specifically deals with the efficient scheduling of coroutines waiting either for IO (or other similar external events such as network messages or user input) or other coroutines by not blocking thread resources in those cases.
Further, the present aspect allows to deal with non-trivial, arbitrarily nested dependencies of operations in the form of dependent reads of other resources whose existence cannot be determined a priori.
The cache component may comprise at least one of a software cache, which is the most preferred option, a hardware cache or arbitrary combinations thereof. The cache component provides a cached data storage resource in which the loaded entities or assets are stored and made available for access by the caller. The cache component is configured to store data so that future requests for that data can be served faster. When the requested data can be found in the cache, the loading or reading request can be served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
To this end, the cache component provides an associative container or array, which is an abstract data type that stores a collection of (key, value) pairs, in this embodiment between the identifier of the object and a future. Upon completion of the loading request, the future is provided with the loaded data such that the loaded data can be accessed in the cache using the associative container and the identifier.
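The associative container can be sketched as a plain mapping from identifier to future; the names below (`cache`, `future_for`) are illustrative assumptions, not the claimed implementation:

```python
from concurrent.futures import Future

# identifier -> future: the (key, value) pairs of the cache component.
cache = {}

def future_for(identifier):
    """Return the future for an object, creating it on first request."""
    if identifier not in cache:
        cache[identifier] = Future()   # placeholder until loading completes
    return cache[identifier]

sentinel = future_for("npc_01")               # first request creates the entry
future_for("npc_01").set_result("npc data")   # completion fills the future
```

Because repeated requests for the same identifier return the same future object, later callers are served from the cache without a second load.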
The identifier may take any feasible form and is not limited to, for instance, numerical or alphanumerical identifiers. To the contrary, the identifier is characterized by its function of identifying the requested entity or asset.
The reactor component may also consist of software, hardware, or a combination thereof. In a preferred embodiment, the reactor component will be implemented as software such that no dedicated hardware is necessary. The reactor component can employ libraries of known programming languages for providing queues of coroutines. The queue of coroutines allows the adding of coroutines, preferentially the change of priority of the coroutines or a sorting of the coroutines within the queue, and a removal of coroutines from the queue.
In one example, the system is implemented as a process on a computer system, wherein the computer system may be a personal desktop computer, a console, a portable laptop computer, a smartphone or any other portable computing device, or a distributed system including a cloud server system.
The system may be implemented as a management thread waiting for requests from a caller and implementing the cache component and the reactor component, wherein the cache component and the reactor component can also be implemented as separate threads, wherein the actual implementation is left to the programmer of the system and the application. In the exemplary field of game engines it is convenient to implement the entire system on a single local hardware system, such that no processing needs to be done externally. However, other applications can distribute the workload, for instance different worker threads, over a plurality of computer systems.
The worker threads may be exclusive for the claimed system and have a predefined number. In one embodiment, any positive number of worker threads can be supported, for instance and without being limited between 1 and 100, preferentially at least 2. The plurality of worker threads can also be referred to as a so called thread pool. The thread pool does not need to be exclusively used by the claimed system and may be made available to other applications.
The caller may be any suitable computer unit or software, including threads, processes, etc., that communicates with the system to request the loading of the entity or asset. For instance, in case the system is implemented as part of a game engine, a subunit or component of the game engine requests the entity or asset from the system. However, the caller may also be a remote computer or device requesting the loading of the entity or asset via a wired or wireless network connection.
The loading coroutine preferentially encapsulates the load process and subsequent processing required to load the asset or entity. Thus, the loading coroutine preferentially includes tasks using CPU computing power and/or IO system resources.
The system is in preferred embodiments implemented as part of at least one of games, game engines, out-of-core visualization, databases, and machine learning systems.
In terms of the programming language suitable for implementing the invention, it relies on coroutine support similar in capability to that of C++20 and is applicable to any language and runtime that supports coroutines of similar capabilities. It can also be implemented using fibers, greenlets, green threads, or similarly behaving constructs in languages supporting these constructs, wherein these concepts, including fibers, greenlets, green threads and the like allowing suspension and resumption functionality for non-preemptive multitasking, are referred to as coroutines in the present context.
In a preferred embodiment the reactor component comprises a waiting for input/output, IO, queue maintaining a queue of suspended coroutines waiting for IO to complete. The waiting for IO queue thus includes all coroutines which have issued a request for IO resources, typically the loading of data, that has not completed yet. The coroutines in the waiting for IO queue are suspended and may be resumed once the IO is completed.
In a preferred embodiment the reactor component comprises a waiting for tasks queue maintaining a queue of suspended coroutines waiting for one or more other coroutines to complete. The coroutines maintained in the waiting for tasks queue are suspended and waiting for at least one further coroutine to complete. This is a particular issue of the so- called dependent loading or reading task, in which the loading of the requested entity or asset creates a nested structure of a plurality of dependent data objects or reads.
In a preferred embodiment the reactor component comprises a ready queue maintaining a queue of coroutines that are ready to run. Ready to run coroutines may be selected by one of the worker threads for running. Coroutines are moved from the waiting for IO queue or the waiting for tasks queue to the ready queue when the reason for the respective suspension is resolved, i.e., when either the IO completes and/or all the dependent coroutines are finished.
In a preferred embodiment the reactor component comprises a running queue maintaining a queue of currently running coroutines. The worker thread running one of the coroutines from the ready queue triggers the respective coroutine to be moved from the ready queue to the running queue, preferentially by the worker thread itself using the reactor component's functions or interface.
In a preferred embodiment the reactor component is configured, upon update of the system, preferentially upon every update of the system, to query an IO subsystem of an underlying operating system, OS, to determine whether one or more of the coroutines queued in the waiting for IO queue can be resumed, and, in the affirmative, moving the determined coroutines from the waiting for IO queue to the ready queue.
In a preferred embodiment the reactor component is configured, upon update of the system, preferentially upon every update of the system, to query the running queue for completed coroutines, wherein for each of the identified completed coroutines, the reactor component is further configured to query the waiting for tasks queue for coroutines waiting for the identified completed coroutine, and, in case the identified completed coroutine is the only remaining coroutine to await for, move said coroutine, of which the identified completed coroutine is the only remaining coroutine to await for, from the waiting for tasks queue to the ready queue.
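The two update queries above can be sketched as follows, with the OS IO subsystem replaced by a set of completed IO tokens and coroutines modelled as plain dicts; all names and the data layout are assumptions for illustration only:

```python
def update(reactor, completed_io_tokens):
    """One reactor update: wake IO waiters, then wake task waiters."""
    # Coroutines whose IO finished are moved from waiting-for-IO to ready.
    for co in list(reactor["waiting_io"]):
        if co["token"] in completed_io_tokens:
            reactor["waiting_io"].remove(co)
            reactor["ready"].append(co)
    # Query the running queue for completed coroutines, then move every
    # waiter whose last awaited coroutine has now completed to ready.
    completed = {co["name"] for co in reactor["running"] if co["done"]}
    for co in list(reactor["waiting_tasks"]):
        co["awaits"] -= completed
        if not co["awaits"]:
            reactor["waiting_tasks"].remove(co)
            reactor["ready"].append(co)
```

A waiter is only released once its `awaits` set is empty, matching the condition that the identified completed coroutine is the only remaining coroutine to await.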
In a preferred embodiment the worker thread is configured to select one coroutine from the ready queue maintained by the reactor component; move the selected coroutine to the running queue maintained by the reactor component; and resume the selected coroutine on the current thread.
In a preferred embodiment the worker thread is configured to select the coroutine from the ready queue having the highest priority assigned thereto.
In this embodiment, the reactor component comprises priority queues rather than regular queues or stack data structures for maintaining the coroutine queues. Each element of the priority queue additionally has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued; in other implementations ordering of elements with the same priority remains undefined.
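A binary heap gives exactly this priority-queue behaviour; in the hypothetical sketch below, lower numbers mean higher priority and ties are broken by enqueue order via a running counter, matching the first implementation variant described above:

```python
import heapq

ready = []   # the ready queue as a heap of (priority, order, coroutine)
for order, (prio, name) in enumerate([(2, "background"),
                                      (0, "player"),
                                      (2, "scenery")]):
    heapq.heappush(ready, (prio, order, name))

first = heapq.heappop(ready)[2]   # the highest-priority element is served first
```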
In a preferred embodiment the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine awaits one or more IO tokens that have not yet finished, further configured to: suspend the coroutine; and move the coroutine from the running queue to the waiting for IO queue.
In an alternative or additional preferred embodiment the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine awaits one or more other coroutines that have not yet finished, configured to: suspend the coroutine; and move the coroutine from the running queue to the waiting for tasks queue.
In an alternative or additional preferred embodiment the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine yields, configured to: suspend the coroutine; and move the coroutine from the running queue to the ready queue.
A particular benefit of using coroutines stems from the yield function, which allows a running coroutine to suspend its execution so that it can be resumed later. Therefore, unnecessary occupation of CPU usage can be avoided.
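The resume-and-requeue logic of the preceding embodiments can be sketched with the coroutine modelled as a generator that yields the reason for its suspension; the signal names and queue layout are illustrative assumptions:

```python
def step(coroutine, queues):
    """Resume the coroutine once and requeue it by suspension reason."""
    try:
        reason = next(coroutine)        # resume until the next suspension
    except StopIteration:
        queues["completed"].append(coroutine)   # the coroutine finished
        return
    if reason[0] == "io":               # awaiting an unfinished IO token
        queues["waiting_io"].append(coroutine)
    elif reason[0] == "tasks":          # awaiting one or more coroutines
        queues["waiting_tasks"].append(coroutine)
    else:                               # a voluntary yield: ready again
        queues["ready"].append(coroutine)
```

After `step` returns, the worker thread is free to pick the next ready coroutine, so the thread never blocks on IO or on other coroutines.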
In a preferred embodiment the worker thread is, when resuming the selected coroutine on the current thread, in case the coroutine completes, configured to move the resulting object into the future object value of the cache component.
In an alternative or additional preferred embodiment the worker thread is, in case the coroutine completes, configured to destroy the coroutine state to reclaim memory.
In an alternative or additional preferred embodiment the worker thread is, in case the coroutine completes, configured to notify the future using at least one of a condition variable, an atomic operation, or a similar thread synchronization primitive on the future.
This in turn allows possible external waiters to be notified.
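The condition-variable variant of this notification can be sketched with a tiny stand-in future; the class and its methods are assumptions for illustration, not the claimed future object:

```python
import threading

class TinyFuture:
    """Minimal future: a value slot guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._done = False

    def set_result(self, value):
        """Called by the worker thread when the coroutine completes."""
        with self._cond:
            self._value = value
            self._done = True
            self._cond.notify_all()    # wake all external waiters

    def wait(self, timeout=None):
        """Block an external waiter until the result is available."""
        with self._cond:
            self._cond.wait_for(lambda: self._done, timeout)
            return self._value
```

An external waiter calling `wait` sleeps without consuming CPU until `set_result` signals completion.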
In a preferred embodiment the reactor component is configured to control a priority of the registered coroutines.
By controlling the priority of the registered coroutines, the efficiency of nested or dependent loading coroutines can be improved.
In a preferred embodiment the future provides a query and object load completion interface or function to the caller.
In a preferred embodiment the system is configured to enable manipulation of a priority of the returned future from an external source including the caller.
In a preferred embodiment the loading coroutine is configured to issue an asynchronous input/output, IO, request, in particular to an underlying operation system, OS.
In a preferred embodiment the system is configured to, prior to creating the loading coroutine, consult the cache component as to whether the object was previously requested for loading and, in case the object was previously requested, return the previously created future sentinel to the caller.
Thereby, the result of the earlier loading can be obtained quickly and resource efficiently by accessing the cache component and avoiding the repeated loading of the entity or asset.
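The cache-first request path can be sketched as follows, with the loading coroutine modelled as a generator and the reactor reduced to a ready list; all names are illustrative assumptions:

```python
from concurrent.futures import Future

cache, reactor_ready = {}, []

def load_coroutine(identifier):
    """Stand-in loading coroutine: would issue an asynchronous read."""
    yield ("io", identifier)

def request_load(identifier):
    """Return a future for the object, reusing any earlier request."""
    if identifier in cache:            # previously requested: reuse it
        return cache[identifier]
    coroutine = load_coroutine(identifier)
    reactor_ready.append(coroutine)    # register with the reactor
    future = Future()                  # placeholder for the result
    cache[identifier] = future
    return future                      # sentinel handed to the caller
```

A second request for the same identifier returns the existing future and registers no new coroutine, so no duplicate load is performed.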
In a preferred embodiment the system is implemented on at least one of a personal computer, PC, console, and mobile hardware supporting an asynchronous IO mechanism.
In a further aspect a method for loading an entity or an asset on a computer system is provided, the method comprising the following steps: receiving a loading request from a caller for the loading of an object, namely an entity or an asset, the loading request including an identifier of the object; creating a loading coroutine from the loading request; registering the coroutine in a reactor component, the reactor component maintaining a queue of coroutines; creating a future object referencing the created coroutine; inserting the future object in a cache component, the cache component providing an associative container between the identifier of the object and the future object; returning a sentinel of the future to the caller; and running the coroutine using at least one worker thread based on the queue maintained by the reactor component.
In a further aspect a computer program is provided comprising program code means for causing a computer to carry out the steps of the method according to an aspect of the disclosure when said computer program is carried out on a computer.
The method and computer program can be combined with the preferred embodiments as described in relation to the system according to the first aspect of the disclosure. Such combinations will similarly achieve the described benefits and advantages.
Further advantages and preferred embodiments will be described with reference to the accompanying drawings:
Fig. 1 schematically and exemplarily illustrates a system for loading an entity or an asset.
Fig. 2 schematically and exemplarily illustrates a dataflow chart for loading an entity or an asset using a cache component.
Fig. 3 schematically and exemplarily illustrates a prior art loading thread occupation.
Fig. 4 schematically and exemplarily illustrates thread occupation using the loading scheme including coroutines according to the present disclosure.
Fig. 5 schematically and exemplarily illustrates a cache component.
Fig. 6 schematically and exemplarily illustrates a reactor component.
Fig. 7 schematically and exemplarily illustrates a reactor component dataflow.
Fig. 8 schematically and exemplarily illustrates a worker thread process.
Fig. 1 schematically and exemplarily illustrates a system 100 for loading an entity or an asset. A caller 10 issues a loading request 12 to the system 100 for loading the entity or asset. As used before, an entity is an object referencing multiple assets or other entities. No entity may recursively reference itself. An asset is data used by objects, such as meshes, textures, shaders, or other data. Without being limited, in the example of game engines, requested entities include high-level entities such as an NPC, an object in the world, up to entire levels.
The system 100 loads the requested entity or asset from a data storage 20. The loading is in this example implemented using asynchronous IO requests to an underlying OS. Therefore, the data storage 20 may be any suitable data storage and includes, without being limited, the RAM/ROM, a hard disk drive, a solid state disk drive, an optical storage means, a cloud storage means and the like. The actual IO requests are executed in system 100 by coroutines, which are registered and maintained in a reactor component 40, which will be described in detail with reference to Fig. 6, and run by one or more worker threads 50. The loaded data is managed by a cache component 30, which will be described in detail with reference to Fig. 5.
In response to the loading request 12, the caller 10 is provided with a future sentinel 14 to the requested entity or asset from the system 100.
Fig. 2 schematically and exemplarily illustrates a dataflow chart of a method 200 for loading an entity or an asset using the system 100 of Fig. 1 and in particular cache component 30 thereof.
In a first step 210, an entity or asset is requested by the caller 10 from the system 100 using a load request 12.
After receiving the request, cache component 30 is consulted. The system in a subsequent step 220 evaluates whether the requested entity or asset is cached, using cache component 30.
In the affirmative (“Yes” branch), the method proceeds to step 270 and returns a sentinel to the future 14 of the requested entity or asset, already contained in the cache, to the caller 10. The sentinel allows the caller 10 to query the completion state of the load, to wait for the object load to complete, as well as to manipulate the priority.
In the negative (“No” branch), the method proceeds to step 230 and creates a coroutine from a load function associated with the requested entity or asset. The coroutine encapsulates the load process and subsequent processing required to load the asset or entity. The load function may be simple or complex depending on the complexity of the requested entity or asset. If the requested entity refers to further entities or assets, corresponding additional loader coroutines for the referenced further entities or assets are also registered. The registering of additional coroutines can be done initially or recursively during running of the respective loader coroutine.
In a subsequent step 240, the coroutine is registered in the reactor component 40. As will be detailed below, the reactor component 40 maintains in this example four queues of coroutines of different states. The running of the registered respective coroutines is, as will be detailed below, performed by one particular of the one or more worker threads 50.
In a subsequent step 250, the system 100 creates a future sentinel 14, also referred to simply as future 14, referencing the coroutine that was registered in step 240 using, for instance, a handle of the coroutine. Upon creation, future 14 does not contain the requested data and acts as a placeholder for receiving the data of the requested entity or asset as soon as the IO responsible for the actual loading is finished.
Next, in a subsequent step 260, future 14 is inserted in the cache maintained by cache component 30 before the method returns the future 14 to the caller 10 in step 270 described above.
Fig. 3 illustrates the deficiencies of a common example of prior art thread utilization, wherein Fig. 4 in contrast illustrates the benefits of the present disclosure.
In Fig. 3, the workload of three threads 52, 54, 56 is illustrated over time. As can be seen, portions of CPU work 310 alternate with long periods of waiting for IO 320 indicated with dashed lines. During the periods of waiting for IO 320, the respective thread does not use the CPU such that the thread utilization is not optimal as significant time is spent waiting for IO requests to complete.
Fig. 4 illustrates the inventive concept of using coroutines together with threads to maximize thread utilization and IO throughput. It can be seen that different coroutines 420, 440, 460, 480 only occupy the respective threads 52, 54 until, for instance, the respective coroutines wait for an IO request to complete. In such case, the respective coroutines suspend and are not resumed until the respective IO request is complete.
Accordingly, by utilizing the suspension and resumption functionality inherent in coroutines 420, 440, 460, 480 in this example, it is possible to eliminate threads being in use while waiting for IO requests to complete. The result of this is visually described in Fig. 4 and the effect on thread utilization is apparent. Moreover, the same coroutine, in this example coroutine 420, gets suspended at the end of a block and then resumed by another thread later.
Fig. 5 schematically and exemplarily illustrates cache component 30. Cache component 30 comprises an associative list between identifiers 32 and the respective loaded data of the entity or asset 34. Before the loading is complete, a future to the respective entity or asset 34 is provided. The data construct of the element of the associative list may be implemented as a future which is communicated to the caller 10. The future, of which a sentinel 14 may be communicated, is a construct used for synchronizing program execution used in various known programming languages. It describes an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.
Further, cache component 30 can implement all functions widely known to the skilled person in the context of caching such as freeing memory of loaded or cached elements when necessary.
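A minimal sketch of the cache component 30 of Fig. 5 may use Python's standard-library Future as the future object 34. The class and method names below are assumptions made for illustration, and eviction/memory management is omitted.

```python
from concurrent.futures import Future

class Cache:
    """Associative container between object identifiers 32 and future
    objects 34; the future is a proxy for the not-yet-loaded data."""
    def __init__(self):
        self._entries = {}  # identifier -> Future

    def get_or_create(self, identifier):
        """Return (future, created): the existing future if the object was
        already requested, otherwise a fresh, not-yet-completed future."""
        if identifier in self._entries:
            return self._entries[identifier], False
        future = Future()
        self._entries[identifier] = future
        return future, True

cache = Cache()
future, created = cache.get_or_create("hero_mesh")
assert created and not future.done()      # loading is not yet complete
future.set_result(b"mesh bytes")          # the loader coroutine completes later
same, created = cache.get_or_create("hero_mesh")
assert not created and same.result() == b"mesh bytes"
```

Repeated requests for the same identifier receive the same future, which is the behavior relied upon in claim 13.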
Fig. 6 schematically and exemplarily illustrates the layout of the reactor component 40. The reactor component 40 maintains in this example four queues of loader coroutines: a waiting for IO queue 42, a waiting for tasks queue 44, a ready queue 46, and a running queue 48. The coroutines 420, 440, 460, 480 are illustrated as being queued in one of the queues 42-48, respectively. The reactor component 40 is configured to move the coroutines between the different queues according to their respective state and to modify a priority of the respective coroutines, if appropriate.
Waiting for IO queue 42 includes suspended coroutines waiting for IO to complete grouped by IO completion token.
Waiting for tasks queue 44 includes suspended coroutines waiting for one or more other coroutines to complete.
Ready queue 46 includes coroutines that are ready to run.
Running queue 48 includes coroutines currently running on one of the worker threads 50.
Further, reactor component 40 includes a priority component 49 which allows influencing the priority of the coroutines registered with the reactor component 40 and maintained in one of the queues 42-48. Changed priorities apply after suspension and restart of the respective coroutines.
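The queue layout of Fig. 6 may be sketched as follows in Python. The attribute names and the representation of coroutines as plain strings are assumptions made for illustration.

```python
class Reactor:
    """Sketch of the reactor component 40 with its four queues 42-48
    and the priority component 49."""
    def __init__(self):
        self.waiting_io = {}     # queue 42: IO completion token -> suspended coroutines
        self.waiting_tasks = {}  # queue 44: waiter -> set of awaited coroutines
        self.ready = []          # queue 46: coroutines ready to run
        self.running = set()     # queue 48: coroutines on a worker thread
        self.priority = {}       # priority component 49: coroutine -> priority

    def register(self, coro, priority=0):
        """Register a loader coroutine and keep the ready queue ordered so
        that the highest-priority coroutine is picked first."""
        self.priority[coro] = priority
        self.ready.append(coro)
        self.ready.sort(key=lambda c: -self.priority[c])

reactor = Reactor()
reactor.register("load_terrain", priority=1)
reactor.register("load_ui", priority=5)
assert reactor.ready[0] == "load_ui"  # highest priority is picked first
```

Grouping the waiting for IO queue by completion token allows all coroutines waiting on the same IO request to be released together when the IO subsystem reports that token as complete.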
The operation and dataflow of the reactor component 40 are schematically and exemplarily illustrated using the flow chart of Fig. 7.
With every update of the system in step 710, the reactor component 40 is, in a step 720, configured to query the IO subsystem, for instance of the underlying OS, to determine the set of coroutines in waiting for IO queue 42 that can be resumed. In case such coroutines exist in waiting for IO queue 42 (“Yes” branch), the coroutines are moved to ready queue 46 in step 730; otherwise, the routine proceeds directly with step 740.
In step 740, running queue 48 is queried for completed coroutines.
For any completed coroutine A selected in step 750, step 760 looks in waiting for tasks queue 44 for any coroutine B whose only remaining task to await is A.
If such a coroutine B exists, B is moved to the ready queue 46 in step 770.
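The per-update flow of Fig. 7 (steps 710-770) may be sketched as a single function over a dictionary-based reactor state. The state layout and the way completed IO tokens are reported are assumptions made for the sketch.

```python
def update(reactor, io_completed_tokens, completed_coroutines):
    """One reactor update: steps 720/730 move IO-ready coroutines to the
    ready queue; steps 740-770 release waiters of completed coroutines."""
    # Steps 720/730: coroutines whose IO token completed become ready.
    for token in io_completed_tokens:
        for coro in reactor["waiting_io"].pop(token, []):
            reactor["ready"].append(coro)
    # Steps 740-770: for each completed coroutine A, any coroutine B whose
    # only remaining awaited coroutine is A moves to the ready queue.
    for a in completed_coroutines:
        reactor["running"].discard(a)
        for b, awaited in list(reactor["waiting_tasks"].items()):
            awaited.discard(a)
            if not awaited:  # A was B's last remaining dependency
                del reactor["waiting_tasks"][b]
                reactor["ready"].append(b)

reactor = {
    "waiting_io": {"tok1": ["coro_io"]},      # queue 42
    "waiting_tasks": {"coro_b": {"coro_a"}},  # queue 44: B awaits only A
    "ready": [],                              # queue 46
    "running": {"coro_a"},                    # queue 48
}
update(reactor, io_completed_tokens=["tok1"], completed_coroutines=["coro_a"])
assert reactor["ready"] == ["coro_io", "coro_b"]
assert not reactor["waiting_tasks"] and not reactor["running"]
```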
Fig. 8 schematically and exemplarily illustrates the periodic operational flow of the worker threads 50.
In a step 805, the worker thread or threads pick a coroutine, in particular the coroutine with the highest priority, from ready queue 46.
In a step 810, the picked coroutine is moved from ready queue 46 to running queue 48.
In a step 815, the coroutine is resumed on the current thread, preferentially using the available programming language functionality to do so. The coroutine keeps running as long as all of the checks in steps 820, 825, 830, and 835 are negative.
In step 820 it is checked or determined whether the coroutine awaits one or more IO tokens that have not yet finished. In the affirmative, the coroutine is suspended in step 840 and moved from the running queue 48 to the waiting for IO queue 42 in step 845.
In step 825 it is checked or determined whether the coroutine awaits one or more other coroutines that have not yet finished. In the affirmative, the coroutine is suspended in step 850 and moved from the running queue 48 to the waiting for tasks queue 44 in step 855.
In step 830 it is checked or determined whether the coroutine yields. The coroutine can yield either to manually time-slice workloads or in response to some other application-defined criteria. In the affirmative, the coroutine is suspended in step 860 and moved from the running queue 48 to the ready queue 46 in step 865.
In step 835 it is checked or determined whether the coroutine completes. In the affirmative, the resulting object is moved into the cache future value 34 in step 870. Further, the coroutine state is destroyed to reclaim memory in step 875. Finally, the future 14 is notified in step 880. Preferentially, the notification is done using either a condition variable, an atomic operation, or similar future thread synchronization primitives. This in turn allows possible external waiters to be notified.
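The worker-thread flow of Fig. 8 may be sketched as follows, again with Python generators standing in for loader coroutines. The signalling convention, yielding ("io", token) or ("tasks", deps) or returning the result, is an assumption chosen for the sketch.

```python
from concurrent.futures import Future

def worker_step(reactor):
    """One pass of steps 805-880: pick the highest-priority ready coroutine,
    resume it, and requeue it (or publish its result) according to how it
    suspended or completed."""
    # Steps 805/810: pick by priority and move from ready to running.
    coro = max(reactor["ready"], key=lambda c: reactor["priority"][c])
    reactor["ready"].remove(coro)
    reactor["running"].add(coro)
    try:
        signal = next(coro)  # step 815: resume on the current thread
    except StopIteration as done:
        # Steps 835/870-880: completed; publish the result to the future.
        reactor["running"].discard(coro)
        reactor["futures"][coro].set_result(done.value)
        return "completed"
    reactor["running"].discard(coro)  # suspended (steps 840-865)
    kind, payload = signal
    if kind == "io":        # steps 820/845: wait for an IO token
        reactor["waiting_io"].setdefault(payload, []).append(coro)
    elif kind == "tasks":   # steps 825/855: wait for other coroutines
        reactor["waiting_tasks"][coro] = set(payload)
    else:                   # steps 830/865: cooperative yield
        reactor["ready"].append(coro)
    return kind

def loader():
    yield ("io", "tok1")  # issue an IO request and suspend
    return "asset-bytes"

reactor = {"ready": [], "running": set(), "waiting_io": {},
           "waiting_tasks": {}, "priority": {}, "futures": {}}
coro = loader()
reactor["ready"].append(coro)
reactor["priority"][coro] = 1
reactor["futures"][coro] = Future()
assert worker_step(reactor) == "io"
# The IO completes: the reactor moves the coroutine back to the ready queue.
reactor["ready"].append(reactor["waiting_io"].pop("tok1")[0])
assert worker_step(reactor) == "completed"
assert reactor["futures"][coro].result() == "asset-bytes"
```

Setting the future's result here corresponds to step 870, and Future's own completion notification plays the role of step 880's synchronization primitive.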
By utilizing the priority component 49, prioritization of CPU workloads can be controlled at any time. Changes of priority will take effect before the next resumption, meaning that priority changes done while the coroutine is not suspended will not take effect until the coroutine has been suspended and resumed once again.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments.
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single component or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
Any reference signs in the claims should not be construed as limiting the scope.
The invention’s claim to novelty is the combination of using coroutines together with threads to minimize CPU wait times, maximize IO request issuing rates, and address the problem of post-issue prioritization. It achieves this without over-saturating the system with worker threads and enables a small set of common worker threads to deal effectively with both CPU and CPU+IO workloads.
The invention is applicable to all PC, console, and mobile hardware that supports some form of asynchronous IO mechanism.

Claims

1. A system (100) for loading an entity or an asset on a computer system, wherein the system (100) comprises:
a cache component (30), the cache component (30) configured to provide an associative container between an identifier (32) of an object and a future object (34),
a reactor component (40), the reactor component (40) maintaining a queue of coroutines, and
at least one worker thread (50, 52, 54, 56),
the system (100) further comprising program code configured to
receive (210) a loading request (12) from a caller (10) for the loading of the object, namely an entity or an asset, the loading request including the identifier (32) of the object,
create (230) a loading coroutine (420, 440, 460, 480) from the loading request, the loading coroutine comprising computer code for loading the object,
register (240) the coroutine in the reactor component (40),
create (250) a future object referencing the created coroutine,
insert (260) the future object in the cache component (30), and
return (270) a sentinel (14) of the future object (34) to the caller (10),
wherein the worker thread (50, 52, 54, 56) is configured to run the coroutine based on the queue maintained by the reactor component (40).
2. The system (100) according to claim 1, wherein the reactor component (40) comprises
a waiting for input/output, IO, queue (42) maintaining a queue of suspended coroutines waiting for IO to complete,
a waiting for tasks queue (44) maintaining a queue of suspended coroutines waiting for one or more other coroutines to complete,
a ready queue (46) maintaining a queue of coroutines that are ready to run, and
a running queue (48) maintaining a queue of currently running coroutines.
3. The system (100) according to claim 2, wherein the reactor component (40) is configured, upon update of the system (100), preferentially upon every update of the system, to query an IO subsystem of an underlying operating system, OS, to determine whether one or more of the coroutines (420, 440, 460, 480) queued in the waiting for IO queue (42) can be resumed, and, in the affirmative, to move the determined coroutines from the waiting for IO queue (42) to the ready queue (46).
4. The system (100) according to claim 2 or 3, wherein the reactor component (40) is configured, upon update of the system (100), preferentially upon every update of the system, to query the running queue (48) for completed coroutines, wherein for each of the identified completed coroutines, the reactor component (40) is further configured to query the waiting for tasks queue (44) for coroutines waiting for the identified completed coroutine, and, in case the identified completed coroutine is the only remaining coroutine to await for, move said coroutine, of which the identified completed coroutine is the only remaining coroutine to await for, from the waiting for tasks queue (44) to the ready queue (46).
5. The system (100) according to any of claims 2 to 4, wherein the worker thread (50, 52, 54, 56) is configured to: select (805) one coroutine from the ready queue (46) maintained by the reactor component (40), move (810) the selected coroutine to the running queue (48) maintained by the reactor component (40), and resume (815) the selected coroutine on the current thread.
6. The system (100) according to claim 5, wherein the worker thread (50, 52, 54, 56) is configured to select the coroutine from the ready queue (46) having the highest priority assigned thereto.
7. The system (100) according to claim 5 or 6, wherein the worker thread (50, 52, 54, 56) is, when resuming the selected coroutine on the current thread, further configured to at least one of:
in case the coroutine awaits (820) one or more IO tokens that have not yet finished:
o suspend (840) the coroutine, and
o move (845) the coroutine from the running queue (48) to the waiting for IO queue (42);
in case the coroutine awaits (825) one or more other coroutines that have not yet finished:
o suspend (850) the coroutine, and
o move (855) the coroutine from the running queue (48) to the waiting for tasks queue (44);
in case the coroutine yields (830):
o suspend (860) the coroutine, and
o move (865) the coroutine from the running queue (48) to the ready queue (46).
8. The system (100) according to any of claims 5 to 7, wherein the worker thread (50, 52, 54, 56) is, in case the coroutine completes (835), configured to at least one of:
o move (870) the resulting object into the future object value of the cache component (30),
o destroy (875) the coroutine state to reclaim memory, and
o notify (880) the future using at least one of a condition variable, an atomic operation, or similar future thread synchronization primitives.
9. The system (100) according to any of the preceding claims, wherein the reactor component (40) is configured to control a priority of the registered coroutines (420, 440, 460, 480).
10. The system (100) according to any of the preceding claims, wherein the future (14) provides a query and object load completion interface to the caller (10).
11. The system (100) according to any of the preceding claims, wherein the system (100) is configured to enable manipulation of a priority of the returned future (14) from an external source including the caller (10).
12. The system (100) according to any of the preceding claims, wherein the loading coroutine is configured to issue an asynchronous input/output, IO, request, in particular to an underlying operation system, OS.
12. The system (100) according to any of the preceding claims, wherein the loading coroutine is configured to issue an asynchronous input/output, IO, request, in particular to an underlying operating system, OS.
13. The system (100) according to any of the preceding claims, wherein the system is configured, prior to creating the loading coroutine, to consult the cache component (30) as to whether the object was previously requested for loading, and, in case the object was previously requested, to return the previously created future sentinel to the caller.
15. A method for loading an entity or an asset on a computer system, the method comprising the following steps:
receiving (210) a loading request from a caller for the loading of an object, namely an entity or an asset, the loading request including an identifier of the object,
creating (230) a loading coroutine from the loading request,
registering (240) the coroutine in a reactor component (40), the reactor component (40) maintaining a queue of coroutines,
creating (250) a future object referencing the created coroutine,
inserting (260) the future object in a cache component (30), the cache component (30) providing an associative container between the identifier of the object and the future object,
returning (270) a sentinel of the future to the caller, and
running (815) the coroutine using at least one worker thread (50, 52, 54, 56) based on the queue maintained by the reactor component (40).
16. A computer program comprising program code means for causing a computer to carry out the steps of the method as claimed in claim 15 when said computer program is carried out on a computer.
PCT/EP2022/080583 2022-11-02 2022-11-02 System for loading an entity or an asset on a computer system, corresponding method and computer program WO2024094297A1 (en)


Publications (1)

Publication Number Publication Date
WO2024094297A1 true WO2024094297A1 (en) 2024-05-10

Family

ID=84362334


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1788486A2 (en) 2005-11-18 2007-05-23 Sap Ag Cooperative scheduling using coroutines and threads
US20120047495A1 (en) * 2010-08-18 2012-02-23 Microsoft Corporation Execution environment support for reactive programming
WO2015078394A1 (en) * 2013-11-29 2015-06-04 Tencent Technology (Shenzhen) Company Limited Method and apparatus for scheduling blocking tasks
US20210311925A1 (en) * 2020-09-28 2021-10-07 Alipay (Hangzhou) Information Technology Co., Ltd. Blockchain transaction processing systems and methods



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22812607

Country of ref document: EP

Kind code of ref document: A1