WO2017095384A1 - Task-based conflict resolution - Google Patents

Info

Publication number
WO2017095384A1
Authority
WO
WIPO (PCT)
Prior art keywords
tasks
task
value
isolation context
isolation
Application number
PCT/US2015/063044
Other languages
French (fr)
Inventor
Evan R. Kirshenbaum
Susan D. Spence
Original Assignee
Hewlett-Packard Enterprise Development LP
Application filed by Hewlett-Packard Enterprise Development LP
Priority to PCT/US2015/063044
Publication of WO2017095384A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/52: Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526: Mutual exclusion algorithms
    • G06F9/528: Mutual exclusion algorithms by using speculative mechanisms

Definitions

  • A computer system may have a memory that is shared by multiple computing entities (multiple threads, for example).
  • The computing entities may concurrently perform computations that change the values that are stored in the shared memory.
  • One way to control the concurrent processing by the computing entities is to organize the changes by the entities into transactions and atomically commit the transactions to memory in a manner that maintains the memory in a consistent state.
  • FIG. 1A is a schematic diagram of a system according to an example implementation.
  • FIG. 1B is an illustration of a conflict resolver engine of FIG. 1A according to an example implementation.
  • FIG. 1C is an illustration of data structures to manage multiple simultaneous value (MSV) objects according to an example implementation.
  • FIG. 2 is an illustration of a hierarchical ordering of isolation contexts according to an example implementation.
  • FIG. 3A illustrates the relationship of a parent isolation context and a live child isolation context created from the parent isolation context according to an example implementation.
  • FIG. 3B illustrates the relationship of a parent isolation context and a snapshot child isolation context created from the parent isolation context according to an example implementation.
  • FIGs. 4A and 4B are flow diagrams illustrating techniques to manage publication of an isolation context according to example implementations.
  • FIG. 5 is a flow diagram illustrating a technique to perform task-based conflict resolution according to an example implementation.
  • FIG. 6 is a flow diagram illustrating a technique to identify tasks to re-run in connection with task-based conflict resolution according to an example implementation.
  • FIG. 7 is a flow diagram illustrating a technique to reset values in connection with task-based conflict resolution according to an example implementation.
  • FIG. 8 is a schematic diagram of a system to perform task-based conflict resolution according to an example implementation.
  • FIG. 9 illustrates a schematic diagram of a system of physical machines according to an example implementation.

Detailed Description

  • Multiple threads executing in one or more processes of a computer or multiple computers may perform operations that are directed to one or multiple shared data structures.
  • One approach to maintain a consistent state of the data structure is to block other threads from making changes to the data structure while one of the threads makes changes. This may, however, result in inefficient processing.
  • Another approach to maintain a consistent state of the data structure is for the threads to process their changes as transactions, which are atomically committed to memory or rolled back (e.g., their modifications discarded) upon discovery that changes made by another thread conflict with the changes attempting to be committed.
  • Such an approach may present challenges for relatively large transactions, e.g., transactions that read or modify a large number of locations associated with the data structure or transactions that run for a long time before attempting to commit.
  • For such transactions, the probability that no other thread, during the course of the transaction's execution, made a modification that results in a conflict that prevents the commit attempt from being successful may be relatively small.
  • A second attempt (and subsequent attempts) at redoing the changes following a rollback and retrying the commit of the transaction may once again fail.
  • An "isolation context" (also called a "computational context" herein) refers to an environment that contains the computations performed within it, so that the results of the computations are not, in general, visible to other isolation contexts. Due to the computational isolation, machine executable instructions, or program code, executing within the isolation contexts may concurrently make modifications to a data structure that is shared by the contexts.
  • A "data structure" refers to an organization of one or multiple units of data, which are stored in one or multiple storage locations.
  • An isolation context may be used to access (e.g., read or modify) multiple data structures.
  • Each isolation context may present to program code a corresponding "view" of the data structure, where the "view" refers to the value(s) that the isolation context reads for corresponding properties of the data structure.
  • A single isolation context may be associated with multiple views.
  • A "property" may be a location associated with the data structure (e.g., a field within a record or an index of an array), a structural property of the data structure (e.g., a number of elements in a list), or a relationship that is associated with the data structure (e.g., an association of a value with a key in a map).
  • Although isolation contexts create computational isolation, there are mechanisms by which an isolation context may transfer information to another isolation context.
  • One way a first isolation context may transfer information to a second isolation context is for the first isolation context to publish.
  • "Publishing" an isolation context refers to combining or merging the view of the publishing isolation context with the view of another isolation context so that the views are the same at the time of publication.
  • A given publication attempt may not succeed due to one or multiple publication conflicts.
  • A publication conflict (also called a "conflict" herein) refers to a reason for the publication not to occur.
  • An example of a conflict is the existence of a modification made to a data structure (e.g., a change to a field of a record) within the isolation context that is the target of the publication attempt when either (1) a similar modification (e.g., a change to the same field of the same record) was made within the isolation context being published, or (2) a similar value (e.g., the value of the same field of the same record) was read within the isolation context being published and a modification to the same data structure or another data structure was made that may have been based on the value that was read.
  • Task-based conflict resolution is used to resolve conflicts that prevent publication of the first isolation context.
  • Making modifications within the first isolation context may involve executing a set of tasks, where a "task" refers to a unit of one or more machine executable instructions, such as a function; a subroutine; a method; a block, expression, or sequence of statements written in a programming language; and so forth. These modifications, in turn, may give rise to conflicts that prevent publication of the first isolation context.
  • A subset of the tasks may be identified and re-run to resolve the conflicts. Because the subset of tasks that is re-run is smaller than the original set of tasks, the conflicts may be resolved in a timely manner, thereby reducing the number of conflicts that may arise (if any) on the next publication attempt.
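  • The selection of a subset of tasks to re-run can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the described system: it assumes that each task records the properties it read and wrote, and that a task must be re-run if it read a conflicted property or a property written by another task that is itself re-run. The names TaskRecord, RerunSelector, and tasksToRerun are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record of what one task read and wrote while it ran.
final class TaskRecord {
    final String name;
    final Set<String> reads;
    final Set<String> writes;

    TaskRecord(String name, Set<String> reads, Set<String> writes) {
        this.name = name;
        this.reads = reads;
        this.writes = writes;
    }
}

final class RerunSelector {
    // Returns the names of the tasks to re-run, in their original order.
    // Assumed rule: a task is selected if it read a conflicted property or a
    // property written by a task that is itself selected.
    static List<String> tasksToRerun(List<TaskRecord> tasks, Set<String> conflicted) {
        Set<String> tainted = new HashSet<>(conflicted);
        Set<String> selected = new HashSet<>();
        boolean changed = true;
        while (changed) {                       // iterate until nothing new is selected
            changed = false;
            for (TaskRecord t : tasks) {
                if (selected.contains(t.name)) {
                    continue;
                }
                boolean readsTainted = false;
                for (String property : t.reads) {
                    if (tainted.contains(property)) {
                        readsTainted = true;
                        break;
                    }
                }
                if (readsTainted) {
                    selected.add(t.name);
                    tainted.addAll(t.writes);   // its outputs may change on re-run
                    changed = true;
                }
            }
        }
        List<String> ordered = new ArrayList<>();
        for (TaskRecord t : tasks) {
            if (selected.contains(t.name)) {
                ordered.add(t.name);
            }
        }
        return ordered;
    }
}
```

  • In this sketch, a task that read only unaffected properties is never selected, which is what keeps the re-run set smaller than the original set of tasks.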
  • A processor-based system 100 includes context management resources 130, which the system 100 uses to create and manage isolation contexts 110 (N isolation contexts 110-1, 110-2 ... 110-N being depicted in FIG. 1A) and to manage publications by the isolation contexts 110.
  • An isolation context is a mechanism by which executing machine executable instructions (called "program code 112" herein) may isolate itself from other executing program code 112.
  • The program code 112 may be an entire application or program or may be part of such an application or program, such as a thread.
  • Program code 112 may be working in a prevailing (also referred to herein as the "current" or "working") isolation context 110.
  • Program code 112-1 may be working in a prevailing isolation context 110-1.
  • Each thread may have its own prevailing isolation context 110, and different threads executing program code 112 at the same time may be working in different prevailing isolation contexts 110.
  • The prevailing isolation context 110 in a child thread when it begins executing is that of the parent thread (e.g., at the time the child thread was created).
  • References herein to program code 112 "executing in" a given isolation context 110 refer to the program code 112 executing while the given isolation context 110 is the prevailing isolation context.
  • A sequence of machine executable instructions (called "thread A" for this example) of program code 112 may be executing in one of the isolation contexts 110, and another sequence of machine executable instructions (called "thread B" for this example) of program code 112 may be executing in another one of the isolation contexts 110.
  • Threads A and B may share a data structure 106.
  • Changes made by thread A to the data structure 106 may be invisible to thread B, and changes made to the data structure 106 by thread B may be invisible to thread A. That is, different computations, working at the same time and looking at the same fields of the same records, may correctly see different values.
  • Each isolation context 110 has an associated view 120 (or multiple views 120) to a given data structure 106; and as such, multiple isolation contexts 110 may have different associated views 120 of the same data structure 106.
  • Multiple operating system threads may work in the same isolation context 110, in accordance with example implementations.
  • The multiple operating system threads may be associated with multiple operating system processes, and the multiple operating system processes may be associated with multiple computers. This allows sharing of the same view 120 among multiple processes and computers. Moreover, this arrangement allows processes that share an isolation context 110 to be written in different programming languages.
  • The data structures 106 may be stored in a data store 104.
  • The data store 104 may include a namespace that associates names with the data structures 106.
  • An example of a namespace is a file system, in which data structures 106 are stored as files that are identified by corresponding file names.
  • Another example of a namespace is a key-value store, which stores associated key-value pairs. In this manner, a key may be used to find a respective value that is stored in the key-value store.
  • The data store 104 may be stored in a physical storage device (a volatile or non-volatile memory device, a hard disk device, and so forth) or may be stored across a distributed arrangement of physical storage devices.
  • A "data structure 106" refers to any unit of data that may be stored. Examples of data structures 106 include files, records, lists, sets, maps, tables, arrays, strings, queues, stacks, graphs, directories, primitives (a number, a Boolean value, a character, as examples), and so forth.
  • A data structure 106 may also include a data structure included by reference (e.g., a pointer).
  • The context management resources 130 may include one or multiple libraries 134, with each library 134 including one or multiple functions 136 that may be called by the program code 112 for such purposes as creating isolation contexts, establishing views for isolation contexts, binding functions to isolation contexts, identifying conflicts, resolving conflicts, performing task-based conflict resolution, publishing contexts, and so forth, as further described herein.
  • The context management resources 130 may contain one or multiple conflict resolver engines 140 to perform task-based conflict resolution.
  • A given conflict resolver engine 140 may include one or multiple of the following components, which are described further herein: a task creator 142, a task monitor 144, a task selector 146, a task scheduler 148 and a graph generator 149.
  • The isolation contexts 110 are fully hierarchical, as illustrated by an example hierarchical tree 200. In this manner, multiple isolation contexts 110 may form a tree that is rooted at a top-level, global isolation context 110, and a given isolation context 110 may have any number of children. For the example of FIG. 2, isolation context 110-3 is a global context that is a parent of isolation contexts 110-4 and 110-5 and grandparent of isolation contexts 110-6 and 110-7; isolation contexts 110-4 and 110-5 are siblings; isolation contexts 110-6 and 110-7 are children of parent isolation context 110-4; and isolation contexts 110-6 and 110-7 are siblings.
  • The isolation context hierarchy may extend to any depth.
  • One way for information to travel from one isolation context 110 to another isolation context 110 is for a child isolation context 110 to successfully publish, thereby making the changes visible in the parent isolation context 110.
  • When isolation context P is a parent of isolation context C, changes made within isolation context P are generally visible in the isolation context C, but changes made in the isolation context C are not visible in the isolation context P until the isolation context C is successfully published.
  • The child isolation context 110 may have two forms, which are specified when the child isolation context 110 is created: a transparent, or live, isolation context; and an opaque, or snapshot, isolation context.
  • For a live child isolation context (such as example live child isolation context 110-9 of FIG. 3A), changes made to a location L (such as location 304 in FIG. 3A) in the parent (such as isolation context 110-8 of FIG. 3A) are visible in the child isolation context, provided that the location has not been modified in the child isolation context.
  • The changes made in the parent isolation context may actually be changes that are made to an ancestor of the parent isolation context but are visible in the parent isolation context; and the changes may be due to a different child isolation context of the parent isolation context publishing its changes to the parent isolation context.
  • For a snapshot child isolation context (such as example snapshot child isolation context 110-11 of FIG. 3B), no changes that are made in the parent isolation context (such as parent isolation context 110-10 of FIG. 3B) after the snapshot child isolation context is created are visible within the child isolation context.
  • For a live child isolation context C, reads made on locations that have not been modified in the context C return, as stated above, the value that the location would have had, had the read been made in the parent isolation context P.
  • The read may be specified as being "frozen."
  • a frozen read has the property that, once it occurs, all subsequent reads of that location within isolation context C (frozen or not) return the same value until the location is modified within isolation context C or until isolation context C is successfully published.
  • The system may determine a value as of a current time. More generally, to determine a value as of a given time (e.g., a request time), the system may attempt to determine whether a value associated with the property had been established (i.e., previously determined) in the view 120 prior to the request time and subsequent to the later of the time that the isolation context 110 was created or the time the isolation context 110 was last successfully published prior to the request time.
  • Such a value may have been established by modifications to the property by program code 112 executing in the isolation context 110, by frozen reads to the property by program code 112 executing in the isolation context 110, or by successful publication of a value for the property from a child isolation context 110 of the isolation context 110. If such a value was established, then it is the determined value. Otherwise, the determined value may be found by determining an inherited value for the property, where the inherited value is the value associated with the property within a parent view 120 of the view 120 associated with the parent isolation context 110 of the isolation context 110 as of an effective time based on the view 120 and the request time.
  • As an example, when the isolation context 110 associated with the view 120 is a live (i.e., transparent) isolation context 110, the effective time may be the request time. When the isolation context 110 associated with the view 120 is a snapshot isolation context 110, the effective time may be the later of the creation time of the snapshot isolation context 110 and a time associated with the latest successful publication of the snapshot isolation context 110 prior to the request time.
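  • The read rules for live, snapshot, and frozen reads can be sketched with a toy model. The sketch assumes a logical clock in the parent and integer-valued locations, and it omits publication entirely (so the effective time of a snapshot child is simply its creation time). The classes ParentContext and ChildContext are hypothetical and are not part of the described libraries 134.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy parent context: keeps every value a location has held, keyed by logical time.
final class ParentContext {
    private final Map<String, TreeMap<Long, Integer>> history = new HashMap<>();
    private long clock = 0;

    long now() {
        return clock;
    }

    void write(String location, int value) {
        clock++;
        history.computeIfAbsent(location, k -> new TreeMap<>()).put(clock, value);
    }

    // Value of the location as of the given logical time, or null if none existed.
    Integer readAsOf(String location, long time) {
        TreeMap<Long, Integer> versions = history.get(location);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, Integer> entry = versions.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }
}

// Toy child context, either live or snapshot (publication is not modeled).
final class ChildContext {
    private final ParentContext parent;
    private final boolean snapshot;
    private final long createdAt;
    private final Map<String, Integer> established = new HashMap<>();

    ChildContext(ParentContext parent, boolean snapshot) {
        this.parent = parent;
        this.snapshot = snapshot;
        this.createdAt = parent.now();
    }

    void write(String location, int value) {
        established.put(location, value);
    }

    // A value established in this context wins; otherwise inherit from the parent
    // as of the effective time (request time if live, creation time if snapshot).
    Integer read(String location) {
        if (established.containsKey(location)) {
            return established.get(location);
        }
        long effectiveTime = snapshot ? createdAt : parent.now();
        return parent.readAsOf(location, effectiveTime);
    }

    // A frozen read establishes the value it returns, so later reads repeat it.
    Integer frozenRead(String location) {
        Integer value = read(location);
        established.put(location, value);
        return value;
    }
}
```

  • With this model, a live child follows later writes in the parent until it performs a frozen read or a modification, whereas a snapshot child keeps returning the value as of its creation.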
  • Publishing and the creation of live and snapshot contexts are some ways that information may travel from one isolation context 110 to another. Another way that information may travel from one isolation context 110 to another is by returning the result of a computation that is performed in another isolation context 110.
  • An isolation context 110 may invoke a call operation, which takes a function 136 as a parameter.
  • The count may have had a value of "10," but calling the function f sets the count to 20.
  • The two variables r and x refer to the same record, but the view presented through the variable r is that of the isolation context C, while the view presented through the variable x is that of the isolation context D. Because of these differing views, r.getCount() continues to return "10," but x.getCount() returns "20." And if something elsewhere caused a computation in the isolation context D to set the count on the same record r to 30, further evaluation in the isolation context C causes x.getCount() to return "30."
  • The view presented through the variable x is neither that of isolation context C nor isolation context D but a composite "C's view of D's view" associated with isolation context C.
  • If, within isolation context C, the count of the record accessed via x is either modified or accessed via a frozen read, subsequent reads of the count in the isolation context C via variable x result in the value established within isolation context C and ignore any intervening modifications made within isolation context D. This distinction remains until isolation context C is successfully published, at which point D's view and C's view of D's view are again identical. In this way, an isolation context may have multiple associated views 120.
  • Program code in an isolation context 110 may also bind a function to an isolation context (e.g., the prevailing isolation context 110 or a different isolation context 110), returning a new function 136 which, when invoked, runs the original function 136 in the bound isolation context 110 (e.g., by invoking the call operation on the bound context and passing in the original function 136 as a parameter).
  • This binding operation may be used to create a function 136 that is bound to the current isolation context 110 and that can be invoked in a child isolation context 110 (or another isolation context 110 via the call operation).
  • The binding operation may be used to create several new functions by binding the same function 136 to a number of different contexts 110.
  • These multiple isolation contexts 110 may be snapshot contexts 110 representing the state of the world at various times in the past (e.g., snapshots of a company taken at daily intervals). Alternatively, these multiple isolation contexts 110 may be child isolation contexts 110 used to explore and select among different alternative approaches to solving a problem. A function bound to the current isolation context 110 may be provided to a different context (e.g., by the call operation) to allow the different context to observe and store data in the bound isolation context 110.
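  • The call and bind operations can be sketched in Java as follows. This is an illustrative model only: the class Ctx and its methods are hypothetical, the prevailing context is assumed to be tracked per thread with a ThreadLocal, and views of data structures are not modeled at all.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an isolation context that only tracks which
// context is prevailing on the current thread.
final class Ctx {
    private static final ThreadLocal<Ctx> PREVAILING = new ThreadLocal<>();

    final String name;

    Ctx(String name) {
        this.name = name;
    }

    // The context prevailing on the current thread, or null if none was set.
    static Ctx current() {
        return PREVAILING.get();
    }

    // Call operation: runs fn with this context prevailing, returns its result,
    // and restores the previously prevailing context afterwards.
    <T> T call(Supplier<T> fn) {
        Ctx previous = PREVAILING.get();
        PREVAILING.set(this);
        try {
            return fn.get();
        } finally {
            PREVAILING.set(previous);
        }
    }

    // Bind operation: returns a new function that, wherever it is invoked,
    // runs the original function in this context via the call operation.
    <T> Supplier<T> bind(Supplier<T> fn) {
        return () -> call(fn);
    }
}
```

  • For example, binding one function to several Ctx instances yields several new functions, each of which reports results as seen from its own bound context even when invoked while a different context is prevailing.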
  • Child isolation contexts 110 can obtain references to their parent isolation contexts 110, so even if there has been a modification or frozen read in a given child isolation context 110, the program code 112 of a child isolation context 110 may, in accordance with example implementations, invoke the call operation on its parent isolation context 110 to determine what the value of a particular location is in the view 120 of the parent isolation context 110. For example, to obtain the count field of a record "r" as it would be seen in the parent isolation context 110 of the current isolation context 110, a program written in Java might call: IsolationContext.current().parent().call(() -> r.getCount())
  • A function may be called on the global isolation context 110, for example: IsolationContext.global().call(() -> revenue), to obtain the committed value of a variable. Similarly, such a call may be used to set a default value that is seen by code working in unpublished isolation contexts 110. Similar manipulations via the global isolation context 110 may be used to manipulate the namespace in a committed manner from within an unpublished isolation context.
  • Because isolation contexts 110 may pass out values without publishing modifications, such values may be accumulated.
  • Program code 112 executing in an isolation context 110 may take a snapshot of the state of a company database periodically (e.g., every day, hour, minute or second) and make each of these snapshots available by binding the snapshots to names within a namespace.
  • The executing program code 112 may append the snapshots to a single list, facilitating, for example, identifying the states of the database for the ten days that scored highest on some metric (e.g., the days during which the revenue of the company was highest).
  • Program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of potential modifications according to models with different parameters, and place the results (indicating predicted consequences) computed in each child isolation context 110 (along with parameters used with the respective models) in a single structure. Then, all of the results may be compared with one another, and the child isolation context 110 that resulted in the best (and only the best) outcome may be allowed to publish its results.
  • Program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of non-deterministic simulations and collect the results produced within each simulation into a structure, and analyze the structure to determine, based on the results, an action to take.
  • The child isolation contexts 110 may be allowed to terminate without ever publishing their modifications to their parent isolation contexts 110.
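  • The pattern of exploring alternatives in parallel and publishing only the best one can be sketched as follows. The sketch stands in for child isolation contexts with private maps of proposed changes, and the pricing model inside simulate is invented purely for illustration; BestAlternative, Outcome, simulate, and runAndPublishBest are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BestAlternative {
    // What one child passes out: the parameter tried, its private changes, a score.
    static final class Outcome {
        final int parameter;
        final Map<String, Integer> changes;
        final int score;

        Outcome(int parameter, Map<String, Integer> changes, int score) {
            this.parameter = parameter;
            this.changes = changes;
            this.score = score;
        }
    }

    // Invented model: the child proposes price = parameter and predicts revenue.
    static Outcome simulate(Map<String, Integer> parentView, int parameter) {
        Map<String, Integer> changes = new HashMap<>();
        changes.put("price", parameter);
        int demand = parentView.get("baseDemand") - 2 * parameter;
        return new Outcome(parameter, changes, parameter * demand);
    }

    // Runs one simulation per parameter in parallel, then applies only the
    // best-scoring child's changes to the parent map.
    static Outcome runAndPublishBest(Map<String, Integer> parent, List<Integer> parameters) {
        Map<String, Integer> view = new HashMap<>(parent);   // read-only copy for children
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Outcome>> futures = new ArrayList<>();
            for (int p : parameters) {
                Callable<Outcome> job = () -> simulate(view, p);
                futures.add(pool.submit(job));
            }
            Outcome best = null;
            for (Future<Outcome> f : futures) {
                Outcome o = f.get();
                if (best == null || o.score > best.score) {
                    best = o;
                }
            }
            if (best != null) {
                parent.putAll(best.changes);   // only the winner "publishes"
            }
            return best;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("simulation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

  • The losing children's changes are simply dropped, which mirrors letting child isolation contexts terminate without publishing.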
  • Program code 112 may execute code 112 in an arbitrary isolation context 110 by explicitly changing the prevailing isolation context 110 to be the arbitrary isolation context 110.
  • Such a change may be temporary and bounded to a particular region of code 112 (e.g., a particular function or program block) by ensuring that at the end of the region of code 112 the prevailing isolation context is reverted to what it had been before the change.
  • One way to facilitate such a temporary setting of the prevailing isolation context 110 is for program code 112 written in the Java programming language to use a "try-with-resources" block, creating a resource object that, upon construction, remembers the prevailing isolation context 110 and sets the prevailing isolation context 110 to the arbitrary isolation context, and that, upon invocation of the close() method on the object, sets the prevailing isolation context 110 to the remembered isolation context.
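  • A try-with-resources scope of this kind might look like the following sketch. It assumes the prevailing context is held in a ThreadLocal and, to stay self-contained, represents contexts by plain strings; the names Prevailing and Scope are hypothetical.

```java
// Hypothetical holder for the prevailing isolation context of the current thread.
// Contexts are represented by their names to keep the sketch self-contained.
final class Prevailing {
    private static final ThreadLocal<String> CURRENT =
            ThreadLocal.withInitial(() -> "global");

    static String get() {
        return CURRENT.get();
    }

    // Resource object: remembers the prevailing context on construction,
    // switches to the temporary one, and restores the remembered one on close().
    static final class Scope implements AutoCloseable {
        private final String remembered;

        Scope(String temporary) {
            this.remembered = CURRENT.get();
            CURRENT.set(temporary);
        }

        @Override
        public void close() {
            CURRENT.set(remembered);
        }
    }
}
```

  • Used as try (Prevailing.Scope scope = new Prevailing.Scope("child")) { ... }, the block body runs with "child" prevailing, and the previous context is restored when the block exits, including when it exits by exception.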
  • The "resource acquisition is initialization" (RAII) paradigm can be used similarly.
  • Isolation contexts may control some, but not all, variables or other memory locations available to the program. In such cases, program code 112 executing in the temporary prevailing isolation context 110 may obtain a value, such as a reference to a data structure, where the reference is associated with a view 120 associated with the temporary prevailing isolation context 110, and may store that value in a location that is not under control of the isolation contexts.
  • Program code executing in the former prevailing isolation context 110 may obtain the value associated with the view 120 associated with the temporary prevailing isolation context 110.
  • Another way that information may move between isolation contexts 110 is when a child isolation context 110 publishes its modifications to its parent isolation context 110.
  • The program may specify, when an isolation context 110 is created, that the isolation context 110 is "detached," which means that the isolation context 110 may not be published. An error may be signaled (e.g., an exception may be thrown) if an attempt is made to publish a detached isolation context 110.
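  • A minimal sketch of signaling an error on an attempt to publish a detached context follows. The class PublishableContext is hypothetical, and the choice of IllegalStateException is an assumption; the text only says that an exception may be thrown.

```java
// Hypothetical context that remembers whether it was created as detached.
final class PublishableContext {
    private final boolean detached;

    PublishableContext(boolean detached) {
        this.detached = detached;
    }

    // Publication is refused outright for a detached context.
    void publish() {
        if (detached) {
            throw new IllegalStateException(
                    "a detached isolation context may not be published");
        }
        // Conflict detection and merging into the parent would happen here.
    }
}
```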
  • The system 100 determines whether there are any conflicts that would inhibit (prevent, for example) the publication from proceeding.
  • A conflict may be associated with a property associated with a data structure, where a property may be a location (e.g., a particular field of a particular record or a particular index of a particular array), a relationship managed by the data structure (e.g., the value associated with a particular key in a particular map), or a structural property of the data structure (e.g., the order of elements of a particular list or the number of elements contained in a particular set).
  • a conflict may arise due to the existence of an
  • An unpublished value may be a value asserted to be associated with a view that has not yet been processed in response to a publication of an isolation context associated with the view. More specifically, a conflict may arise when the value associated with such a property is changed within the parent isolation context 110 of the isolation context 110 requesting publication.
  • A conflict may also arise when a new value otherwise becomes visible in the parent isolation context 110, provided that the value change occurs after a particular effective time and the child (e.g., publishing) isolation context either modified the same property (where such modification includes receiving a value due to the successful publication by a further child isolation context 110 of the publishing isolation context 110) or performed a frozen read operation to obtain a value associated with the property.
  • The effective time associated with a property relative to the publishing isolation context 110 may be established and updated upon creation of the publishing isolation context 110, each time the publishing isolation context 110 is successfully published, and upon an explicit indication (e.g., during an attempt to resolve conflicts due to an unsuccessful publication attempt) by program code 112 executing in the publishing isolation context 110 that any conflicts associated with the property have been resolved.
  • The effective time may be updated when the publishing isolation context 110 is established as the prevailing isolation context 110.
  • The effective time is also updated the first time a modification to or frozen read of the property is performed following such a creation, successful publication, or explicit indication.
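  • The effective-time test for conflicts can be sketched as a comparison of logical timestamps. The sketch assumes that the parent tracks when each property last received a new value and that the publishing context tracks an effective time for each property it modified or frozen-read; ConflictFinder and findConflicts are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ConflictFinder {
    // parentChangedAt: logical time at which each property last received a new
    //   value in the parent's view.
    // childEffectiveTime: effective time for each property that the publishing
    //   context modified or frozen-read.
    // A property conflicts when the parent's change is later than the child's
    // effective time for that property.
    static Set<String> findConflicts(Map<String, Long> parentChangedAt,
                                     Map<String, Long> childEffectiveTime) {
        Set<String> conflicts = new TreeSet<>();
        for (Map.Entry<String, Long> entry : childEffectiveTime.entrySet()) {
            Long changedAt = parentChangedAt.get(entry.getKey());
            if (changedAt != null && changedAt > entry.getValue()) {
                conflicts.add(entry.getKey());
            }
        }
        return conflicts;
    }
}
```

  • Properties that changed in the parent but were never modified or frozen-read by the publishing context do not appear in childEffectiveTime and therefore never conflict in this model.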
  • an "atomic" process means that the actions that form the process are treated as being indivisible, i.e., the actions are viewed or treated by the isolation contexts as occurring at the same time.
  • a thread cannot see some, but not all, of the published modifications in the parent isolation context, and a modification cannot be made in any isolation context that would change the determination that there are no conflicts, between the determination that there are no conflicts and the making available of the
  • The conflict(s) may be determined proactively so that the conflicts are known at the time the isolation context 110 requests publication.
  • The isolation context 110 and its parent isolation context 110 present the same values for the data structure 106, and the child isolation context 110 is no longer considered to have established any current values for the data structure 106.
  • Attempts to determine values for properties of the data structure 106 within the child isolation context 110 result in determining inherited values from the parent isolation context 110.
  • For a live isolation context 110, this has the effect that locations that had been "frozen" by modifications or by frozen reads are no longer considered frozen and the inherited values may vary as modifications are made to the parent context; and for a snapshot isolation context 110, the "snapshot time" (i.e.
  • A last-common-snapshot for a given isolation context 110 may be obtained by calling the appropriate function 136.
  • The last-common-snapshot is a read-only snapshot child of the context's parent as of the last time the isolation context 110 was published (or its creation time if it has not been published). This allows an isolation context 110 to compare its changes to a data structure 106 with the values of the data structure 106 when the isolation context 110 started or was last successfully published.
  • An as-created-snapshot for a given isolation context 110 (a snapshot going back to the beginning, or creation, of the snapshot) may be obtained by calling the appropriate function 136.
  • A conflict resolution phase is entered for purposes of taking one or multiple actions to resolve the conflicts.
  • The conflict resolution phase may be handled in a number of different ways, depending on the particular implementation.
  • The program code 112 associated with the context 110 attempting publication decides how to handle the conflicts and may decide to re-attempt the publication.
  • the program code 112 marks each conflict as being resolved so that the conflict does not show up again for a subsequent publication attempt.
  • a given conflict may be resolved by modifying a value of the property associated with the conflict and specifying that such a modification resolves any conflict for that property.
  • the modification may be made by computing and setting a specific value.
  • the modification may be one of the following.
  • the modification may be a resolve-to-current modification in which the value in the publishing context is the one that is used. This is often the correct answer when it can be determined that the value asserted by the child isolation context 110 is not dependent on any conflicted value (i.e., if the program code 112 of the publishing isolation context 110 was executed again, the same result would occur).
  • the modification may be a resolve-to-parent modification in which the current value in the parent isolation context 110 is the one that is used.
  • the modification may be a roll-back modification in which the value in the last common snapshot is the one that is used.
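  • as a minimal sketch of the three modifications listed above, the following Python fragment picks the value a conflicted property takes under each mode. The names `Resolution` and `resolve` are hypothetical and are not part of the functions 136; the three inputs stand in for the values seen in the publishing context, the parent context, and the last-common-snapshot.

```python
from enum import Enum


class Resolution(Enum):
    # Hypothetical labels for the three modes described in the text.
    CURRENT = "resolve-to-current"    # keep the publishing context's value
    PARENT = "resolve-to-parent"      # adopt the parent's current value
    ROLL_BACK = "roll-back"           # restore the last-common-snapshot value


def resolve(mode, current, parent, last_common):
    """Return the value a conflicted property should take under `mode`."""
    if mode is Resolution.CURRENT:
        return current
    if mode is Resolution.PARENT:
        return parent
    if mode is Resolution.ROLL_BACK:
        return last_common
    raise ValueError(f"unknown resolution mode: {mode!r}")
```

In each case the chosen value would also be marked as resolving the conflict for that property, which this sketch leaves out.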
  • determining a value for a property associated with a data structure includes determining a lack of an
  • the effective time may be the request time; and identifying the conflict includes
  • the effective time may be the later of a creation time of the first isolation context and a publication time of the first isolation context prior to the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the later of the creation time of the first isolation context and the latest publication time of the first isolation context.
  • a given location may be subsequently modified (after having been marked as resolved) before the publication is reattempted. Moreover, marking a conflict as being resolved may cause the system to disregard any known conflict associated with the location and the isolation context 110, but one or multiple subsequent modifications in the parent or child isolation contexts 110 may introduce one or multiple new conflicts, which arise when the isolation context 110 tries again to publish.
  • program code 112 performing conflict resolution may make use of the current (e.g., publishing) isolation context 110, the parent isolation context 110, and the last-common-snapshot isolation context 110 of the current isolation context 110.
  • the program code 112 may also make use of a current-at-publish isolation context 110 and a parent-at-publish isolation context 110, which are read-only snapshots of the current (publishing) context and its parent at the time the publication was attempted (and failed).
  • one or multiple mechanisms may be used to encapsulate the process of creating an isolation context 110 as a child of the prevailing isolation context 110, running the code in it, and (when the created isolation context 110 is publishable) attempting to publish the created isolation context 110 at the end.
  • the mechanism may involve keywords, annotations, or other syntactic additions to the source code of the program and the program code may be specified directly.
  • the mechanism may involve calling a function (e.g., one of the functions 136 of Fig. 1A) and passing as an argument an indication of a function to be called within the newly created isolation context 110.
  • the program code 112 may specify (e.g., by choice of keyword, annotation, or function or by passing in a parameter) the type of child isolation context 110 to create (e.g., live or snapshot, detached or not, read-only or not).
  • the program code 112 may further specify information used to control behavior upon failure of an attempt to publish. Such information may cause the system to attempt conflict resolution and may control the manner in which conflict resolution is performed. The information may alternatively or in addition direct the system to react to a failure to publish or a failure to resolve conflicts by creating a new child isolation context 110 and rerunning the code within it.
  • the information may include one or more termination conditions, which the system may use to determine that further attempts to perform conflict resolution or to rerun the code should not occur.
  • termination conditions may include a given number of attempts having been performed, a given time (e.g., a wall clock time or a time duration) having been passed, a given amount of a resource (e.g., disk space or memory) having been consumed, a value having been asserted by another program thread (e.g., an indication that a solution to a problem has been found by another thread or an indication that a program has gone on to a different phase), a given function returning a true value, a Boolean expression evaluating to a true value, or another termination condition.
  • the conflict resolution may be handled in a number of different ways.
  • the program code 112 may decide that resolving the conflict is not worth the effort, and as a result, the isolation context 110 does not attempt to republish.
  • the program code 112 may be relatively small (online transaction processing (OLTP) code, for example); and as such, the approach may be to simply throw away the associated isolation context 110, create a new isolation context 110 and execute the program code 112 again from the beginning.
  • the conflict resolution may be handled by a task-based conflict resolver engine 140, which may be associated (as examples) with an object or a function 136.
  • a particular conflict resolver engine 140 may be designated at the time the program code 112 modified the location that was conflicted, essentially specifying, for example, that if the value associated with the location gets a conflict, the particular conflict resolver engine 140 is invoked to resolve the conflict.
  • the program code 112 may specify a default resolver (e.g., an object, such as a conflict resolver engine 140, or function) that is associated with a particular location (e.g., a particular field in a particular record), with a particular field (regardless of what record a conflict occurs in), or with a type (e.g., applying to conflicts associated with any property of any data structure as long as the property is associated with values of the given type or applying to conflicts associated with any field of any record as long as the record is of the given type).
  • This type of conflict resolution may be appropriate when a field is, for example, known to be used as a counter.
  • a resolver may be attached to the field to perform this computation and resolve the conflict by specifying the resulting value.
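  • as a hedged illustration of a resolver attached to a counter field, the following Python sketch merges concurrent increments. The passage above does not spell out the computation, so the formula here is an assumption: the child's net change since the last-common-snapshot is re-applied on top of the parent's current value. The function name `resolve_counter` is hypothetical.

```python
def resolve_counter(current, parent, last_common):
    """Merge two sets of additive updates to a counter field.

    Assumption (not stated in the source): both the parent and the child
    only added to or subtracted from the counter since the snapshot, so the
    child's delta can be replayed on the parent's current value.
    """
    child_delta = current - last_common
    return parent + child_delta
```

For example, with a snapshot value of 10, a child value of 12, and a parent value of 15, the resolved value under this assumption is 17.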
  • the program code 112 may use task-based conflict resolution.
  • in task-based conflict resolution, the program code 112 specifies that some or all of its computation is made up of re-runnable tasks that are executed by the program code 112.
  • the system 100 may keep track of the set of locations read and written while working in each task and the dependencies between tasks (e.g., a dependency created when one task reads a location that was written by another task).
  • all tasks that read a conflicting location are selected to rerun, as are all tasks dependent on them (and so on, recursively).
  • conflicted locations may be resolved to the parent value, the current value, or a determined value prior to, during, or after the re-running of the selected tasks. The selected tasks may then be re-run in dependency order and the publication may then be retried.
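  • the recursive selection described above can be sketched as a simple graph traversal. In this Python fragment, the names `select_tasks_to_rerun`, `reads`, and `dependents` are hypothetical; the read sets and dependency map stand in for whatever the system 100 records while the tasks execute.

```python
from collections import deque


def select_tasks_to_rerun(reads, dependents, conflicted):
    """Select tasks that read a conflicted location, plus their dependents.

    reads:      task -> set of locations the task read
    dependents: task -> set of tasks that depend on that task
    conflicted: set of conflicted locations
    """
    # Seed with every task that read at least one conflicted location.
    selected = {task for task, locs in reads.items() if locs & conflicted}
    queue = deque(selected)
    # Follow dependency edges until no new task is reached.
    while queue:
        task = queue.popleft()
        for dependent in dependents.get(task, ()):
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected
```

Tasks outside the returned set keep their original results, which is what limits the rework to the affected part of the computation.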
  • the program code 112 may provide a function that takes a collection of conflict objects and runs arbitrary code to determine how to resolve the conflicts explicitly.
  • the program code 112 may use a combination of the above-described approaches, even for a single publication attempt.
  • the program code 112 may use field-based rules to resolve (and eliminate) some conflicts and then invoke one or multiple conflict resolver engines 140 to resolve the remaining conflicts.
  • a technique 400 includes preventing (block 404) a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context.
  • the technique 400 includes managing (block 406) the publication of the first isolation context, which includes identifying (block 408) a conflict that prevents the first publication attempt; making (block 412) a second modification within the first isolation context to resolve the conflict; and publishing (block 416) the first isolation context in connection with a subsequent publication attempt, including allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.
  • a technique 430 includes executing (block 434) machine executable instructions in a processor-based machine in a first isolation context and in a second isolation context.
  • the first isolation context presents an associated first view of a data structure
  • the second isolation context presents an associated second view of the data structure.
  • Executing the machine executable instructions includes inhibiting (preventing, for example) modifications that are made to the first object in the first isolation context from being reflected in the second view.
  • a modification made in the first view "being reflected" in the second view refers to the modification being reproduced or shown in the second view.
  • an attempt is made (block 438) to publish the first isolation context to reflect modifications made to the data structure in the first isolation context in the second view of the data structure. If a determination is made (decision block 442) that one or more conflicts exist, then a decision is made (decision block 444) whether a termination condition has been satisfied (i.e., a decision to abandon the publication has been reached). If not, one or more actions are taken to resolve the conflict(s) pursuant to block 446 and control returns to block 438. When no conflicts remain, the technique 430 includes publishing (block 450) the first isolation context, including causing the second view to reflect modifications made to the data structure within the first isolation context. These modifications include modifications made during action(s) taken to resolve conflict(s), as of a time associated with the last publication attempt. In accordance with example implementations, the conflicts may be resolved using task-based conflict resolution.
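  • the control flow of blocks 438 to 450 can be sketched as a retry loop. The Python fragment below is only an illustration: `publish_with_resolution` and its three callback parameters are hypothetical names, and the callbacks stand in for the publication attempt, the conflict resolution actions, and the termination condition.

```python
def publish_with_resolution(try_publish, resolve, should_stop):
    """Attempt publication, resolving conflicts between attempts.

    try_publish(): returns a list of conflicts; empty means published.
    resolve(conflicts): takes action(s) to resolve the given conflicts.
    should_stop(attempts): termination condition, True to abandon.
    Returns (published, attempts).
    """
    attempts = 0
    while True:
        conflicts = try_publish()          # block 438
        attempts += 1
        if not conflicts:                  # decision block 442
            return True, attempts          # block 450
        if should_stop(attempts):          # decision block 444
            return False, attempts
        resolve(conflicts)                 # block 446
```

The termination callback here only sees the attempt count; a time limit or a flag set by another thread could be checked in the same place.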
  • a technique 500 includes executing (block 504) machine executable instructions in a first isolation context to run a plurality of tasks to modify data according to a first view that is associated with a first isolation context. Pursuant to the technique 500, an attempt is made (block 508) to publish
  • task-based conflict resolution is applied (block 512) to resolve at least one conflict occurring due to the first and second views.
  • Applying the task-based conflict resolution includes executing machine executable instructions to selectively re-run a subset of the tasks of the plurality of tasks, wherein the number of tasks of the subset of tasks is less than the number of tasks of the plurality of tasks.
  • the portions, or units, of the program code 112 that are considered to be "tasks" may be specified in a number of different ways.
  • the program code 112 may expressly identify one or more portions as being re-runnable tasks.
  • a particular function 136, a task function fn may be passed to a task execution function (i.e., runTask(fn)).
  • When the task execution function is called, the task function passed in is executed as a task, thereby becoming a candidate to be re-run if needed during conflict resolution.
  • the same task function may be used to generate several different tasks in the same isolation context 110, and because they are executed at different times, even multiple invocations of the same function may have different observed behaviors.
  • the separate tasks associated with the same task functions become separate candidates to be re-run and may be selected or not selected independently.
  • the task execution function may also accept other arguments to be supplied to the task function during its execution. These arguments may be associated with the resulting task and provided when the task function is re-run.
  • the program code 112 may specify that during an iterative construct, each separate iteration (or each subset of some partitioning of the iterations) should be executed as a separate task.
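  • a task execution function of the kind described above might be sketched as follows. The Python names `Task`, `TaskRegistry`, and `run_task` are hypothetical analogues of runTask(fn); the sketch only shows that each call yields a separate task that remembers its function and arguments so that it can be re-run later.

```python
class Task:
    """One execution of a task function, with the arguments it was given."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args
        self.result = None

    def run(self):
        # Re-running supplies the same remembered arguments again.
        self.result = self.fn(*self.args)
        return self.result


class TaskRegistry:
    """Hypothetical stand-in for the per-context record of executed tasks."""

    def __init__(self):
        self.tasks = []

    def run_task(self, fn, *args):
        task = Task(fn, args)      # each call creates a separate task
        self.tasks.append(task)    # the task is now a re-run candidate
        task.run()
        return task
```

Calling `run_task` twice with the same function produces two independent entries, matching the statement that tasks sharing a task function may be selected or not selected independently.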
  • An iterative construct may include an iterative control structure provided by a
  • Iteration information e.g., the element of the data stream or data structure, an index that allows retrieval of the element, or other information used to control the loop
  • This allows a computation to examine and possibly modify each element of a large data structure, while limiting rework to those elements that are affected by conflicts.
  • functions 136 may be used to specify that a number of tasks may be run in parallel or constrain the tasks to be run in parallel. Moreover, in accordance with example implementations, a function 136 may specify that a given task is to be run as a parallel sibling of the current task.
  • program code 112 may specify that a particular function has the property that any execution of the function should be considered to be a separate task.
  • a Java language annotation e.g., @RunAsTask annotation
  • a programming language may provide a keyword that, when used, declares that a following block or expression is to be run as a task.
  • a new keyword may be added to a programming language, such as C++, via amendments to the programming language standard or by modifying a compiler to recognize the keyword.
  • the program code 112 may be analyzed, either by the compiler or through the use of a standalone tool, for purposes of determining that particular sections of code may beneficially be treated as tasks.
  • a task creator 142 (see Fig. 1B) of a conflict resolver engine 140 may determine that particular sections of program code 112 should be treated as tasks.
  • program code 112 may specify that one or more methods of a class are to be treated as separate tasks when invoked by declaring that class to derive or extend, directly or indirectly, from a particular ancestor class or to implement a particular interface.
  • the tasks may be hierarchically arranged. That is, a task P may have n child tasks C_1, ..., C_n, and it may also have code that runs outside of any child task. These child tasks may have further grandchild tasks and so on, to any depth.
  • a task executor may be used to allow tasks (tasks generated by a task factory, for example) to be executed by a fixed or dynamic collection of worker threads.
  • the current task may be obtained by being associated with the current isolation context as a thread-local member (i.e., a field of an object representing the current isolation context having independent values associated with each executing program thread).
  • the initial value of the current task for a new thread may be the current task from the thread that created the new thread, at the time of the new thread's creation.
  • a new task (e.g., an object representing a new task) may be created as a child of the current task and set to be the current task.
  • the current task may revert to the prior current task. This may be accomplished in program code 112 written in the Java language by means of a try-with-resources block specified to create an object that remembers the current task prior to executing the new task and resets the current task to be the remembered task in its close method.
  • reverting to the prior current task may be accomplished by means of an object on the stack whose constructor remembers the current task prior to executing the new task and whose destructor resets the current task to the remembered task.
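  • the remember-and-restore pattern described above for Java and for stack objects has a close analogue in a Python context manager, sketched here. The names `Task`, `current_task`, and `new_task` are hypothetical, and a plain thread-local variable stands in for the thread-local member of the current isolation context.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # per-thread storage for the current task


class Task:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def current_task():
    """Return the calling thread's current task, or None if there is none."""
    return getattr(_local, "task", None)


@contextmanager
def new_task(name):
    """Create a child of the current task and make it current for the block."""
    previous = current_task()              # remember the prior current task
    task = Task(name, parent=previous)
    _local.task = task
    try:
        yield task
    finally:
        _local.task = previous             # revert, even if the block raised
```

The `finally` clause plays the role of the close method or destructor: the prior task is restored whether the task body completes normally or raises.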
  • when a new isolation context 110 is created, the isolation context 110 is not associated with any task for any thread.
  • a new task may be created for each isolation context at isolation context creation and upon successful publication to be the default task for all threads.
  • whether a default task is created is a user-configurable part of the system, either globally or on a per-isolation-context basis. This allows programs which do not use task-based conflict resolution to not incur the overhead of monitoring reads and writes.
  • when an isolation context 110 is set to be the current isolation context 110, the current task becomes the one associated with that context and the current thread. If the current isolation context 110 reverts to the prior isolation context 110, the current task reverts to the one for the current thread in the prior isolation context 110.
  • when an isolation context 110 is successfully published, all of its tasks are discarded, and its mapping from threads to current tasks is cleared.
  • upon successful publication it is set to map all threads to a single newly-created task.
  • task execution is monitored for purposes of identifying tasks that may be involved with conflicts.
  • as the program code 112 executes, when the code 112 interacts with a location that may be involved in a conflict (e.g., a field of a record, a name in a namespace, a slot of an array, or a mapping of a key in an associative structure such as a map or set), the code 112 calls a predefined function 136 of the library 134. When such a call occurs, if there is a current task, the function 136 logs information about the interaction and the location in (or associated with) the task.
  • a task monitor 144 (see Fig. 1B) of a conflict resolver engine 140 may log the location and an indication of whether the operation was a read or a write. In the case of a modification operation, such as an increment, the operation is logged as being both a read and a write. In accordance with some implementations, the log distinguishes between a frozen read, which guarantees that subsequent reads (without an intervening modification) will return the same value and which may induce conflicts, and a free read which does not make that guarantee and which does not induce conflicts. In accordance with example implementations, the task monitor 144 maintains separate logs for reads and writes for each task.
  • the log may include a timestamp that allows the use of the relative order of writes and reads to the same location in different tasks to infer dependencies between the tasks.
  • the task monitor 144 may not log a timestamp; instead, when a read of a location is logged, the task monitor 144 may log a dependency between the current task and the task that last modified the location.
  • logged dependencies may be restricted to those between the tasks associated with the same isolation context 110.
  • the task monitor 144 may maintain a single log for all tasks relating to a given isolation context 110. In such implementations, timestamps may not be used, and relative ordering between reads and writes may be inferred from the ordering of entries in the log. Moreover, in these implementations, the task monitor 144 may record the task that is associated with each operation along with other information about the operation. It is noted that the order in the log is maintained to reflect the actual order of the operations with respect to the managed data.
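  • inferring dependencies from a single ordered log can be sketched as one pass that tracks the last writer of each location. In this Python fragment, the name `infer_dependencies` and the `(task, op, location)` entry layout are assumptions made for illustration; a modification such as an increment would appear as a read entry followed by a write entry.

```python
def infer_dependencies(log):
    """Infer task dependencies from an ordered operation log.

    log: iterable of (task, op, location) with op in {"read", "write"},
         in the actual order of the operations.
    Returns a dict: reading task -> set of tasks whose writes it read.
    """
    last_writer = {}   # location -> task that most recently wrote it
    depends_on = {}
    for task, op, location in log:
        if op == "read":
            writer = last_writer.get(location)
            # A read of a value written by a different task is a dependency.
            if writer is not None and writer != task:
                depends_on.setdefault(task, set()).add(writer)
        elif op == "write":
            last_writer[location] = task
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return depends_on
```

No timestamps are needed because the position of each entry in the log already encodes the relative order of reads and writes.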
  • the read logs and write logs are kept as a mapping from numbers identifying records and namespaces to sets of locations (fields in that record or names in that namespace).
  • the task monitor 144 may log conflicting operations other than those on specific locations including, for example, conflicts relating to the size of a map, conflicts relating to the presence or absence of a value in a set, or conflicts relating to the length or ordering of elements in a list.
  • performing the task-based conflict resolution includes a technique 600 of monitoring (block 604) the execution of instructions that run the plurality of tasks for purposes of determining a dependency graph among the tasks; and identifying (block 608) a subset of the tasks (i.e., a subset of the dependency graph) to be re-run based at least in part on the determined dependency graph.
  • Example implementations of ways to construct the dependency graph and identify the subset of tasks to be re-run are discussed below.
  • the monitoring of the tasks includes observing the instructions, or code, being executed in these tasks as reading and writing locations.
  • "a task writing to a location" refers to observing program code executing in the task writing to the location; and "a task reading from a location" refers to observing program code executing in the task reading from the location.
  • a graph generator 149 (see Fig. 1B) of a conflict resolver engine 140 may build up a dependency graph between tasks as the program code 112 executes.
  • the graph generator 149 maintains, for each task, a dynamically built collection of tasks that are dependent on the task, and uses these dynamically built collections of tasks in the building of the dependency graph when used for conflict resolution.
  • specific write operations within a task may establish values that are dependent on factors other than those obtained via monitored reads within the task where those factors might be expected to be saliently different if the task were re-run.
  • a written value may be dependent on a value read from a clock provided by the system 100, on a value read from a sensor, or on the value of a program variable outside the scope of the task monitoring (e.g., a value outside the managed space).
  • the program code 112 may indicate that the current task or a task to be re-run should be re-run unconditionally upon the decision to re-run any tasks.
  • This specification may take the form of a function call, a call to a method on the task about which the assertion applies, or a special form of a write to a field (e.g., a volatileWrite() call).
  • it may be specified as a property of the task object upon creation of the task object or specified as an option to the task invocation.
  • When an attempt to publish fails due to the presence of conflicts, a task selector 146 (see Fig. 1B) has a list of conflicts and associated locations at which the conflicts occurred. The task selector 146 resolves these conflicts by selecting tasks to be re-run to recalculate the final values of conflicted locations and/or asserting final values of conflicted locations. The tasks are scheduled to be re-run by a task scheduler 148 (see Fig. 1B).
  • the task selector 146 identifies as tasks to re-run those tasks that are identified as directly or indirectly dependent on conflicting writes in the parent isolation context and re-executes these tasks after resolving the responsible conflicting write locations to their respective values in the parent isolation context 110.
  • the conflict resolver engine 140 resolves the conflicting write location to the final value of that location in the child isolation context, ensuring that upon successful publication of the child isolation context 110, the value will be observed as ordered after the committed write in the parent isolation context.
  • the task selector 146 identifies a subset, or subgraph, of the task dependency graph to be re-run; and this subset identifies a subset of tasks of the plurality of tasks that have been executed by the isolation context 110.
  • the graph generator 149 may construct a sub-graph of the task dependency graph, identifying tasks and dependency relations from the dependency graph to use in scheduling tasks to rerun, prior to the re-running of any tasks. Such a sub-graph may be considered a static dependency graph.
  • a static dependency graph may differ from an optimal sub-graph to re-run in that it may incorrectly include or exclude tasks or dependency relations.
  • the graph generator 149 may construct the static dependency graph as follows, in accordance with example implementations.
  • the static dependency graph may be constrained to include tasks designated to be unconditionally re-run.
  • the static dependency graph may be constrained to include a first task when it includes a second task and the first task has been identified as explicitly dependent on the second task.
  • the static dependency graph may be constrained to include a task that was determined to have read a value from a conflicting write location, where the value first read by the task was not written by another task associated with the current isolation context.
  • qualifying read operations are restricted to frozen read operations.
  • the static dependency graph may be constrained to include a first task when it includes a second task and the first task was determined to have read a value from a location, where the value was written by the second task.
  • the static dependency graph may further be constrained to include a dependency between the first and second tasks such that the second task is constrained to be re-run after the first task is re-run.
  • the static dependency graph may be constrained to include a first task when it includes a second task and the first task was determined to have written a value to a location, where the value was read by the second task and either the first task was not the last writer of the location in the publishing context prior to the publication attempt or the location is a conflicted write location.
  • the static dependency graph may be constrained to not include both a task and a direct or indirect sub-task of the task. If a task that has sub-tasks is selected to be included in the static dependency graph, its reads and writes may be considered to include all reads and writes of all of its direct and indirect sub-tasks.
  • the static dependency graph may include a constructed final assignment task.
  • the static dependency graph may include dependency relationships between this task and other included tasks constraining the final assignment task to be run after all other identified tasks.
  • the final assignment task may include program code 112 to re-assert values for locations whose value at the time of the publish attempt was written by a task not included in the static dependency graph but which was earlier written by a task included in the static dependency graph.
  • the value to be reasserted by the final assignment task is the value at the time of the publish attempt.
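  • the set of values that a final assignment task re-asserts can be sketched from an ordered write log. In this Python fragment, `final_assignments` and its parameters are hypothetical names: `write_log` lists writes in order, `rerun` is the set of tasks included in the static dependency graph, and `values_at_publish` holds each location's value at the time of the publish attempt.

```python
def final_assignments(write_log, rerun, values_at_publish):
    """Compute location -> value pairs for a final assignment task.

    write_log: ordered list of (task, location) write entries
    rerun: set of tasks included in the static dependency graph
    values_at_publish: location -> value at the time of the publish attempt
    """
    last_writer = {}
    written_by_rerun = set()
    for task, location in write_log:
        last_writer[location] = task
        if task in rerun:
            written_by_rerun.add(location)
    # Keep locations that a re-run task wrote earlier but whose value at
    # publish time came from a task that will not be re-run.
    return {
        loc: values_at_publish[loc]
        for loc in written_by_rerun
        if last_writer[loc] not in rerun
    }
```

Re-asserting these values after the re-run keeps a re-run task's earlier write from overwriting the later result of a task that was not re-run.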
  • the observation of such reads or writes may constrain the conflict resolver engine 140 to determine that a static dependency graph cannot be constructed.
  • the conflict resolver engine 140 may be constrained to determine that task-based conflict resolution not be used to resolve conflicts when a conflicting write location was observed to have been read outside of any task, when a value written to a location in the child isolation context was observed to have been read outside of any task, or when program code 112 executing outside of any tasks takes an action which would cause the conflict resolver engine 140 to determine that the current task, had there been one, would have been marked to be unconditionally rerun.
  • the static dependency graph may include other tasks and dependencies implying more ordering constraints.
  • the static dependency graph may include all tasks that read or wrote any conflicted location.
  • the static dependency graph may include a first task if it includes a second task when it is determined that the first task read a location previously written by the second task, without respect to whether the value read by the first task was the value written by the second task or by another task, and the first task may be constrained to be re-run after the second task.
  • the static dependency graph may include dependency relationships between two tasks that imply the requirement that each be re-run before the other.
  • the task scheduler 148 may choose to schedule the two tasks to be re-run concurrently with one another.
  • the task monitor 144 may monitor the temporal order of launch and completion of tasks, and the task scheduler 148 may use this information in determining a correct ordering and allowed concurrency during re-run.
  • the program code 112 may be able to specify explicitly that certain tasks must, may, or may not be run concurrently with one another.
  • the program code 112 may specify explicitly that certain tasks are to be run before or after certain other tasks; or that task A depends on task B such that if task B is re-run, task A is to be rerun. It is noted that task dependency may be independent of order.
  • the system 100 may cause these other threads to pause execution (e.g., immediately prior to the next read or modification operation), and, upon determining that conflicts have been detected, may ensure that any unfinished tasks are selected to be re-run and that threads executing in these tasks are interrupted (e.g., by an exception thrown by the function in which they were paused).
  • the conflict resolver engine 140 resets the values at locations corresponding to the conflicts prior to the re-running of the tasks. In this manner, referring to Fig. 7, in accordance with some
  • task-based conflict resolution includes determining (block 704) a subset of locations to reset to store values shared in common among the first and second views; resetting (block 708) the identified locations to store the values; and re-running (block 712) the tasks of the subset.
  • the conflict resolver engine 140 may identify and reset (e.g., by modifying) some locations (whether such locations are associated with conflicts preventing publication of the isolation context 110 or not), specifying that such resetting is to resolve any conflict associated with the location. This removes such a conflict from the collection of conflicts preventing the isolation context 110 from publishing, but the system 100 may detect subsequent conflicts at a reset location between the time its associated conflict is noted as resolved and the time the system 100 attempts to re-publish the isolation context 110.
  • the conflict resolver engine 140 may identify locations (whether or not conflicted) noted as having been written by a task selected to be re-run (except for the final assignment task) and may reset such locations to the values they currently have in the parent isolation context. In accordance with an example implementation, the conflict resolver engine 140 may identify conflicted locations not noted as having been written by any task selected to be re-run and noted as having been read by some task selected to be re-run and may reset such locations to the values they currently have in the parent isolation context. In accordance with an example implementation, the conflict resolver engine 140 may identify conflicted locations not identified to be handled in another manner and may reset such locations to the value they currently have in the publishing isolation context 110. Such resetting to the current value may not involve actual modification of the value. In accordance with an example implementation, locations to be written by a final assignment task are not reset prior to task re-run.
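  • the reset rules above are given as separate example implementations; the following Python sketch combines them in one function purely for illustration, and that combination (including letting the final-assignment exclusion take precedence) is an assumption. The names `plan_resets`, `writes`, `reads`, and `final_assigned` are hypothetical.

```python
def plan_resets(conflicted, writes, reads, rerun, final_assigned=frozenset()):
    """Decide, per location, which value to reset to before the re-run.

    conflicted: set of conflicted locations
    writes, reads: task -> set of locations written / read
    rerun: set of tasks selected to be re-run
    final_assigned: locations a final assignment task will write
    Returns location -> "parent" or "current".
    """
    written = set().union(*(writes.get(t, set()) for t in rerun))
    read = set().union(*(reads.get(t, set()) for t in rerun))
    plan = {}
    # Locations written by a re-run task go back to the parent's value,
    # whether or not they are conflicted.
    for loc in written:
        plan[loc] = "parent"
    # Remaining conflicted locations: parent value if a re-run task read
    # them, otherwise the publishing context's current value.
    for loc in conflicted - written:
        plan[loc] = "parent" if loc in read else "current"
    # Assumed precedence: final-assignment locations are left untouched.
    for loc in final_assigned:
        plan.pop(loc, None)
    return plan
```

A "current" entry may involve no actual modification; it only records that the conflict at that location is treated as resolved to the value already present in the publishing context.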
  • the tasks may be re-run after the resetting of values.
  • the tasks may be scheduled to re-run in any order that is consistent with identified ordering
  • the task scheduler 148 may schedule tasks to re-run based in part on an observed relative temporal order of tasks during the initial run or a prior re-run. In accordance with example embodiments, the task scheduler 148 may schedule tasks to be re-run concurrently based on observed concurrency in the initial run, on inferences from the task dependency graph, on allowed concurrency information specified by the program code 112, or on other grounds.
  • the task scheduler 148 may schedule a final assignment task to be run, after all other tasks have been re-run. This final task modifies remembered locations to remembered associated values, specifying that such modifications are to resolve any conflict associated with the location.
  • the final assignment task may refrain from modifying a location if it is noted that it has not been previously written by any re-run task. In such example implementations, the final assignment task may instruct the system 100 to consider any conflict at such a location to be resolved.
  • a final assignment task may be run prior to the completion of the re-run of all other tasks selected to be re-run. In accordance with some example implementations, there may be more than one final assignment task.
  • the graph generator 149 may construct the dependency graph, and task scheduler 148 may traverse, or walk, the dependency graph as tasks are being run, the task scheduler 148 deciding on later tasks to run based on behavior observed by the task monitor 144 during the re-run of earlier tasks. More specifically, the graph generator 149 may construct, based on task observations, a dependency graph, in a manner similar to that described above, but including information about observed temporal ordering between tasks. As before, some tasks may have been indicated as unconditionally to be re-run.
  • the task scheduler 148 may iteratively process the graph, considering, processing, and removing tasks associated with root nodes from the graph. Such root nodes are known to not be dependent on any tasks remaining in the graph.
  • the task re-run may be considered complete when every node has been processed and re-run from the task. Root nodes may be considered singly or multiply.
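  • As an illustration only, the iterative processing of root nodes can be sketched as a wave-by-wave traversal of the dependency graph. The `TaskId` type, the `schedule_waves` function and the map-based graph representation are assumptions made for this sketch and are not taken from the description; the sketch also ignores the observation-driven graph updates that the task scheduler 148 may perform during re-run.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical task identifier; the description does not specify one.
using TaskId = int;

// deps[t] holds the tasks that task t depends on. Every task, including
// tasks without dependencies, must appear as a key.
// Returns "waves": each wave contains the root nodes of the graph that
// remains after the earlier waves were removed, so the tasks of one wave
// do not depend on each other and could be re-run concurrently.
std::vector<std::vector<TaskId>> schedule_waves(
    std::map<TaskId, std::set<TaskId>> deps) {
    std::vector<std::vector<TaskId>> waves;
    while (!deps.empty()) {
        // Root nodes: tasks with no dependency left in the graph.
        std::vector<TaskId> roots;
        for (const auto& entry : deps)
            if (entry.second.empty()) roots.push_back(entry.first);
        assert(!roots.empty() && "dependency graph must be acyclic");

        // Remove the processed roots from the graph and from the
        // dependency sets of the remaining tasks.
        for (TaskId r : roots) deps.erase(r);
        for (auto& entry : deps)
            for (TaskId r : roots) entry.second.erase(r);

        waves.push_back(roots);
    }
    return waves;
}
```

  • In this sketch the re-run is complete once the graph is empty; considering root nodes "singly" would correspond to taking one task from a wave at a time rather than the whole wave.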
  • the system 100 schedules the task to be re-run.
  • the system 100 schedules the task to be re-run.
  • the task scheduler 148 may examine the set of values written by the task to determine whether the conflict resolver engine 140 has sufficient information to reproduce the effects of re-running the task without actually re-running the task. If the task scheduler 148 determines that the conflict resolver engine 140 has sufficient information, then the conflict resolver engine 140 modifies locations written by the task to simulate re-run. In such an example, if a location is a conflicted location, the conflict resolver engine 140 may note the associated conflict as resolved.
  • the conflict resolver engine 140 schedules the task to be re-run.
  • writes to locations for which the task was noted to be the last writer may be simulated by allowing them to retain their current values.
  • writes to locations for which the task was not noted to be the last writer and for which the last value written by the task was logged may be simulated by writing the logged value to the location.
  • writes to locations noted to have been modified during re-run may be simulated by allowing them to retain their current values following a determination that there are no dependencies that would have precluded the task under consideration from having been scheduled before or concurrently with the task that last wrote to the location.
  • the conflict resolver engine 140 may note the conflict associated with the location as being resolved to the current value in the parent isolation context 110. In accordance with an example implementation, during task re-run, the first time a conflicted location is written, the conflict resolver engine 140 may note the conflict associated with the location as being resolved to the value being written. In accordance with an example implementation, the graph generator 149 may modify the dependency graph based on writes to locations observed during task re-run. In accordance with an example implementation, the graph generator 149 may remove tasks that have completed re-run from the graph.
  • the cycle of attempting publication, and resolving conflicts may be performed a bounded number of times or until some other termination criterion is reached (e.g., a particular amount of time has elapsed, a particular wall-clock time has been reached, a termination flag has been set from outside, or a termination function returns a particular value). If a termination criterion is reached before the publish is successful, the request to resolve conflicts is declared to have failed. In this case, a new isolation context may be established and the entire computation retried (perhaps with conflict resolution), repeating based on a (perhaps different) set of termination criteria, before declaring that the attempt to publish has failed.
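  • A minimal sketch of the bounded publish-and-resolve cycle follows. The names `publish_with_resolution`, `try_publish` and `resolve_conflicts` are hypothetical, and only two of the termination criteria mentioned above (an attempt bound and an elapsed-time budget) are modeled.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

enum class PublishOutcome { Published, ResolutionFailed };

// try_publish:       attempts publication, returns true on success.
// resolve_conflicts: resets locations and re-runs the selected tasks.
// Both callables are placeholders for behavior described in the text.
PublishOutcome publish_with_resolution(
    const std::function<bool()>& try_publish,
    const std::function<void()>& resolve_conflicts,
    int max_attempts,
    std::chrono::steady_clock::duration time_budget) {
    assert(max_attempts > 0);
    const auto deadline = std::chrono::steady_clock::now() + time_budget;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_publish()) return PublishOutcome::Published;
        // Termination criterion: the time budget has elapsed.
        if (std::chrono::steady_clock::now() >= deadline) break;
        resolve_conflicts();
    }
    // A termination criterion was reached before a successful publish.
    return PublishOutcome::ResolutionFailed;
}
```

  • A caller that receives `ResolutionFailed` could establish a new isolation context and retry the entire computation, possibly under a different set of termination criteria, as described above.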
  • a system 800 to perform task-based conflict resolution includes a processor-based conflict resolver engine 820 (such as the conflict resolver engine 140, for example) that monitors the execution of a set 808 of tasks 810 associated with an isolation context for purposes of constructing a dependency graph 822, which describes the interdependencies of the tasks 810.
  • the conflict resolver engine 820 resolves conflicts 821 (represented by data stored in a memory 823, for example) that prevent publication of the isolation context by selecting a subset 826 of the tasks 810.
  • the subset 826 contains a fewer number of tasks 810 than the set 808; and the tasks 810 of the subset 826 are re-run before publication is re-attempted.
  • properties of a data structure may be represented by multiple simultaneous value (MSV) objects.
  • an MSV object represents a property of a unit of data and may concurrently, or simultaneously, have multiple, alternative values.
  • a particular logical location such as a field of a record or a slot in an array, may be represented by an MSV object, such that the field or slot has different, alternative values, when seen through different views.
  • a given isolation context may be associated with one or multiple views that may be used when accessing one or multiple MSV objects. Moreover, multiple isolation contexts may be associated with multiple views of a given MSV object. The values associated with different views in a given MSV object may be isolated from each other. In this manner, as part of this isolation, a request to determine a value for the MSV object in a given view may result in the return of one of the MSV object's alternative values; and assigning the value of the MSV object in a given view may not affect the value of the MSV object in another view.
  • the system 100 may use various data structures 150 to manage the MSV objects.
  • a given data structure 150 may be an object represented by a C++ class.
  • the data structures 150 may include isolation context objects 151.
  • Each isolation context object 151 corresponds to one of the isolation contexts 110. It is noted that in the following discussion, "isolation context object 151" and "isolation context 110" may be used interchangeably.
  • Conflicts 152 that prevent publication of the isolation context 110 may be installed on a corresponding isolation context state 157.
  • the data structures 150 may include view objects 153.
  • Each view object 153 may correspond to one of the views 120. It is noted that in the following discussion, "view object 153" and "view 120" may be used interchangeably.
  • the data structures 150 may include one or multiple conflict generators 160.
  • a conflict generator 160 may construct conflicts referring to a particular location.
  • Each subtype of conflict may have its own generator subtype, in accordance with example implementations.
  • the data structures 150 may include context states 157.
  • a context state 157 represents the current set of conflicts and "contingent conflicts" for an associated isolation context 110, as well as a count or event time, representing the last time the isolation context associated with the conflict successfully published.
  • the data structures 150 may include an event counter 164, which may represent a monotonically-increasing set of values that are incremented by particular events.
  • the event counter 164 may represent a global, shared event count, which is atomically incremented.
  • the constant representing the greatest possible value of an event counter may be used to represent the most recent event count.
  • values of the event counter 164 may be associated with snapshot creation times, in accordance with example implementations.
  • Values read from the event counter 164 may be considered to be timestamps denoting points in time in the execution of the system 100, and the value stored in the event counter 164 may be considered to be the current time (or timestamp) of the system 100 with respect to the context and MSV object management resources 130. It is noted that, apart from the fact that their values monotonically increase over time (meaning that timestamps may be compared with one another), these timestamps have no necessary relationship with any other notion of time (e.g., wall-clock time or time since the start of an operating system or process in the system 100). It is also noted that different reads of the event counter 164 may correctly read the same timestamp value, but consecutive reads of the event counter 164 may never read an earlier timestamp after reading a later timestamp.
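  • A minimal sketch of such an event counter, assuming a 64-bit count held in a `std::atomic`, is shown below. The names `Timestamp`, `kMostRecent` and `EventCounter` are hypothetical; the description states only that the count is shared, atomically incremented and monotonically increasing, and that the greatest possible value may stand for the most recent event count.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

using Timestamp = std::uint64_t;

// Reserved sentinel: the greatest possible value stands for "most recent".
constexpr Timestamp kMostRecent = std::numeric_limits<Timestamp>::max();

class EventCounter {
public:
    // Reads the current logical time. Repeated reads may return the same
    // value, but a later read never returns a smaller one.
    Timestamp now() const { return count_.load(std::memory_order_acquire); }

    // Atomically advances the logical time for an event (e.g., a
    // publication) and returns the new timestamp.
    Timestamp advance() {
        Timestamp previous = count_.fetch_add(1, std::memory_order_acq_rel);
        assert(previous < kMostRecent - 1 && "event counter exhausted");
        return previous + 1;
    }

private:
    std::atomic<Timestamp> count_{0};
};
```

  • The timestamps produced this way are only comparable with one another; they carry no relationship to wall-clock time.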
  • the data structures 150 may represent value chains 156, value objects 158 and MSV objects 166.
  • Each value chain 156 may be associated with a particular view object 153 (representing a view 120) that may be used when accessing an MSV object 166; and the value chain 156 represents a history of value objects 158 (for a particular MSV object 166 noted in the given view 120), starting from the most recently asserted and extending in a time-ordered sequence back in time.
  • each value object 158 may contain a value (e.g., a primitive value or a reference to a data structure) for the MSV object 166 in the view 120 associated with the value chain 156 valid as of a particular timestamp, a timestamp representing that effective time and a link to a prior value object 158 (if any) for the same view 120.
  • the data structures 150 may also include view relative pointers 154.
  • a view relative pointer 154 represents a reference, or pointer, to a structured object (a record, array or a map, as examples) and also specifies that read and write accesses to the structured object are to be seen through the perspective of an associated view 120.
  • the view relative pointer 154 may contain a pointer to an object and a pointer to a view object 153.
  • the value stored in a value object 158 may be a view relative pointer 154.
  • the view object 153 contained within a view relative pointer 154 stored in a value object 158 on a value chain 156 may be constrained to represent a view 120 associated with the same isolation context 110 that is associated with the value chain's 156 view 120.
  • a given MSV object 166 represents alternative values for a property of a data structure 106, as seen through different views 120.
  • the data structures 150 may also include MSV states 162.
  • Each MSV state 162 represents the current set of one or multiple value chains 156, which may be associated with an associated MSV object 166.
  • a given MSV object 166 may constrain the values associated with it to all be of a given type (e.g., all integers, all strings, or all lists of employee records). Alternatively, a given MSV object 166 may permit values of different types to be associated with it.
  • the data structures 150 may also include serializing tasks 170.
  • the tasks 170 may include write or publication tasks, used to effect either a modification of an MSV object 166 (for a write task) or an attempt to publish an isolation context object 151 (for a publication task). These tasks 170 may be used to ensure that operations that are supposed to be atomic are, in fact, atomic, as simultaneous execution might cause incorrect behavior. Rather than blocking, the serialization of the tasks 170 may allow threads to cooperate with each other, as further described herein.
  • the system 100 may use C++ atomic classes and associated functions, such as a C++ compare_exchange operation (also called a "compare and swap" operation or "CAS" operation).
  • versioned pointers may be used, which encapsulate a value along with a version number, which is incremented every time the value is modified.
  • the version number may be represented by the use of sequences of bits (e.g., high-order bits) within the versioned pointer value.
  • Pointers, including versioned pointers, may encapsulate flags representing Boolean values (e.g., represented as bits within the pointer value).
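  • The following sketch illustrates one way a version number and a Boolean flag could share a single atomically updated word with a payload. The layout (16 high-order version bits, one flag bit, 47 payload bits) and the name `VersionedWord` are assumptions for illustration; the payload is treated as an opaque integer rather than a real address so that the sketch stays portable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class VersionedWord {
public:
    static std::uint64_t payload(std::uint64_t w) { return w & kPayloadMask; }
    static bool flag(std::uint64_t w) { return (w & kFlagBit) != 0; }
    static std::uint64_t version(std::uint64_t w) { return w >> kVersionShift; }

    std::uint64_t load() const { return word_.load(); }

    // Installs a new payload and flag only if the word still equals the
    // snapshot `expected`. The version is incremented on every successful
    // modification, so a stale snapshot is rejected even when the payload
    // has meanwhile returned to an earlier value.
    bool replace(std::uint64_t expected, std::uint64_t new_payload,
                 bool new_flag) {
        assert(new_payload <= kPayloadMask);
        std::uint64_t next_version = (version(expected) + 1) & 0xFFFF;
        std::uint64_t desired = (next_version << kVersionShift) |
                                (new_flag ? kFlagBit : 0) | new_payload;
        return word_.compare_exchange_strong(expected, desired);
    }

private:
    static constexpr int kVersionShift = 48;
    static constexpr std::uint64_t kFlagBit = 1ull << 47;
    static constexpr std::uint64_t kPayloadMask = kFlagBit - 1;
    std::atomic<std::uint64_t> word_{0};
};
```

  • With only 16 version bits the version wraps around after 65536 modifications; a real design would size the fields to the platform's pointer width and expected update rate.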
  • Operations performed on a data structure 150 may be considered to be logically atomic even though the performance of the operations takes measurable time, during which other operations involving the same data structure 150 may be initiated in another thread.
  • a logical atomic operation that is associated with a finite processing time may be deemed to be correct when the result returned by the operation is one that would have been correct had the operation been instantaneous and executed at some arbitrary time in between the time the operation started and the time the operation finished and, further, when any modifications made in the course of performing the operation may be logically ordered as a group with respect to those of other logically atomic operations. It is noted that this standard of correctness may be sufficient to guarantee that no sequence of operations performed in other threads is able to determine that the operation was not performed atomically.
  • each modifiable location may be associated with an MSV object 166; each MSV object 166 may have an associated mapping from view objects 153 to value chains 156 (e.g., in an
  • each value chain 156 may have an associated timestamped list of value objects 158.
  • the mapping may distinguish between open view objects 153 and closed view objects 153. In accordance with example implementations, the mapping may distinguish between open and closed value chains 156 or views 120 associated with such view objects 153.
  • An open view object 153 is a view object 153 whose associated value chain 156 was last modified (e.g., by the addition of a value object 158) subsequent to the last time that the isolation context 110 associated with the view object 153 (representing a view 120) was published prior to the creation of the MSV state 162 (although the isolation context 110 may have been subsequently published).
  • a closed view object 153 is a view object 153 whose associated isolation context 110 is known to have been published since the last time its associated value chain 156 was modified.
  • open view objects 153 associated with the MSV object 166 whose isolation contexts 110 (as represented by isolation context objects 151) have been published may be closed before the read or modification occurs. More specifically, in accordance with example implementations, an open view object 153 may be closed by a close_published_views process. In this process, a new value object 158 may be added to the value chain 156 associated with the view object 153 that is the parent of the view object being closed (e.g., as the head of the value chain 156).
  • the value in the new value object 158 may be the value from the most recent (e.g., first) value object 158 of the value chain 156 associated with the open view object 153, and the timestamp of the new value object 158 may be a timestamp associated with the successful publication of the view object's 153 isolation context 110.
  • any view objects 153 to be closed by the close published views operations e.g., if there are any view objects 153 whose associated isolation contexts 110 have been successfully published since the last time a value object 158 was added to the respective value chain 156
  • the close published views operation may be retried until a new MSV state 162 is successfully installed or a determination is made that there are no open view objects 153 in the MSV object's 166 current MSV state 162. In this manner, the value chains 156 are shared, and the threads cooperate with each other, as further described herein. All subsequent operations are done relative to the resulting MSV state 162, even if the MSV object's 166 MSV state 162 is changed again before the operation finishes.
  • read and modify operations provided by MSV objects 166 may take as a parameter a view object 153 representing the view 120 to use in performing the operation.
  • the view object 153 may be that associated with the view relative reference 154.
  • the operations may additionally take as a parameter an isolation context object 151 representing a prevailing isolation context 110.
  • the isolation context object 151 may be requested to identify a shadow view object 153 for the provided view object 153, as described further below, and that shadow view object 153 may be used in place of the provided view object 153 in performing the operation.
  • Read operations find the value chain 156, if any, associated with the view object 153 in the MSV state 162 and return the value from the value object 158 in the value chain 156 where the timestamp of the value object 158 is no later than a specified timestamp and is the latest such timestamp among the value objects 158.
  • a "most recent" timestamp may be specified for the read operation, which indicates that the value object 158 with the most recent timestamp is to be used. If the timestamp is the "most recent" timestamp, the search for associated value chains 156 may be restricted to the open value chains 156 in the MSV state 162. If there is no associated value chain 156 or if there is no value object 158 on the value chain 156 prior to the timestamp, then the process is repeated, with the view object 153 being replaced by that view object's 153 parent view object (if any).
  • If the isolation context 110 associated with the replaced view object 153 is a snapshot isolation context 110, the timestamp is replaced by the snapshot time for the isolation context 110 (e.g., the timestamp associated with the last successful publication of the isolation context 110 or the creation of the isolation context). This may continue until a value object 158 is found and its value is returned or until a view object 153 is determined to not have a parent. In this case, a default value of the appropriate type may be returned.
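  • The read procedure can be sketched as a walk over a timestamped value chain with a fallback to the parent view. The types `ValueObject`, `View` and `MsvState` and the functions `push_value` and `read_value` are simplified stand-ins invented for this sketch: values are plain integers, the open/closed distinction is ignored, and the snapshot attribute of the isolation context is folded into the view.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

using Timestamp = std::uint64_t;

// One entry of a value chain: a value, its effective time, and a link to
// the prior (older) entry for the same view.
struct ValueObject {
    int value;
    Timestamp effective;
    std::shared_ptr<const ValueObject> prior;
};

struct View {
    const View* parent = nullptr;
    bool snapshot = false;        // context of this view is a snapshot
    Timestamp snapshot_time = 0;  // used only when snapshot is true
};

// Maps each view to the head (most recent entry) of its value chain.
using MsvState = std::map<const View*, std::shared_ptr<const ValueObject>>;

// Adds a new head to the chain of `view`; timestamps must not decrease.
void push_value(MsvState& state, const View* view, int value, Timestamp t) {
    auto& head = state[view];
    assert(!head || head->effective <= t);
    head = std::make_shared<const ValueObject>(ValueObject{value, t, head});
}

// Returns the latest value no later than `at`, falling back to ancestors.
int read_value(const MsvState& state, const View* view, Timestamp at,
               int default_value = 0) {
    while (view != nullptr) {
        auto it = state.find(view);
        if (it != state.end()) {
            for (const ValueObject* v = it->second.get(); v != nullptr;
                 v = v->prior.get())
                if (v->effective <= at) return v->value;
        }
        // Nothing visible in this view: continue with the parent view,
        // reading as of the snapshot time if this view is a snapshot.
        if (view->snapshot) at = view->snapshot_time;
        view = view->parent;
    }
    return default_value;  // no ancestor holds a value
}
```

  • Passing the largest representable timestamp as `at` plays the role of the "most recent" timestamp in this simplified model.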
  • a lazy publication process may be used to effect propagating changes made to an MSV object 166 in a given view 120 due to the publishing of an isolation context 110 associated with the view.
  • operations that are directed to effecting the publication do not occur until a subsequent operation directed to the MSV object.
  • a technique 700 includes providing (block 704) an object that has a plurality of alternative values; associating (block 708) a plurality of views with the plurality of alternative values; and associating (block 708) a plurality of computational contexts with the views.
  • the views are isolated (block 712) such that a request to determine a value in a given view results in a value of the plurality of alternative values being returned, and a first value associated with a first view is independent from association of a second value with a second view.
  • the technique 700 includes publishing (block 716) a computational context of the plurality of computational contexts to allow a value of the plurality of alternative values associated with a view of the plurality of views associated with the published context to be read in at least one other of the views; and in response to an operation directed to the object after the publishing, processing an effect of the publishing on the at least one other views, pursuant to block 720.
  • operations that modify values associated with views 120 of an MSV object 166 may be serialized by means of write tasks associated with the MSV object 166. These write tasks are represented by corresponding write task objects 170 (also called "write tasks 170" herein). For example, such operations may include writing a value, incrementing or otherwise modifying a value, or resolving a conflict associated with a value. To effect this serialization, the MSV object 166 may have the capability to be associated with a single pending write task 170.
  • a thread of the system 100 may create a write task object 170 to store the data required to perform the requested modification and to discover any conflicts that result from the modification. This data may be stored in the write task object 170, such that values determined by one thread performing the write task for the MSV object 166 may be visible to other threads performing the write task for the same MSV object 166.
  • the write task object 170 may also contain information about steps in the process that have been completed by some thread. This may allow a given thread performing the write task to skip steps that have already been completed by another thread.
  • the thread performing the modification may attempt to change the pending write task associated with the MSV object 166 from a null value to the newly created write task object 170 (e.g., by means of a CAS operation). If this fails, it may indicate that another thread is processing a different write task directed to the MSV object 166 (i.e., may indicate that there is an ongoing modification of the MSV object 166 in association with another view object 153), and the present thread may perform that write task before repeating the attempt to install the newly created write task 170.
  • multiple threads may cooperate in processing a modification and a thread in one operating system process may complete a modification begun by a thread in a different operating system process that died before the modification was completed.
  • the thread may process the newly installed write task.
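  • The installation of a pending write task by CAS, with cooperative helping, can be sketched as follows. Everything here is a simplification made for illustration: the MSV state is reduced to one 64-bit word that packs a version and a value, the names `WriteTask`, `Msv`, `help` and `submit_write` are hypothetical, and memory reclamation of task objects is out of scope (tasks must outlive every thread that may still help them).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kUnset = ~0ull;  // marks "base state not captured"

inline std::uint64_t pack(std::uint32_t version, std::uint32_t value) {
    return (static_cast<std::uint64_t>(version) << 32) | value;
}
inline std::uint32_t version_of(std::uint64_t s) {
    return static_cast<std::uint32_t>(s >> 32);
}
inline std::uint32_t value_of(std::uint64_t s) {
    return static_cast<std::uint32_t>(s);
}

struct WriteTask {
    explicit WriteTask(std::uint32_t v) : new_value(v) {}
    const std::uint32_t new_value;
    // State captured by the first helper; shared by all later helpers.
    std::atomic<std::uint64_t> base{kUnset};
};

struct Msv {
    std::atomic<std::uint64_t> state{0};
    std::atomic<WriteTask*> pending{nullptr};  // at most one pending task
};

// Performs (or finishes) `task`. Safe to call from several threads and
// after the task has already completed: every step is a CAS that fails
// harmlessly when another thread got there first.
void help(Msv& msv, WriteTask* task) {
    std::uint64_t unset = kUnset;
    task->base.compare_exchange_strong(unset, msv.state.load());
    std::uint64_t base = task->base.load();
    assert(version_of(base) != 0xFFFFFFFFu && "version exhausted");
    msv.state.compare_exchange_strong(
        base, pack(version_of(base) + 1, task->new_value));
    WriteTask* expected = task;  // single attempt to clear the slot
    msv.pending.compare_exchange_strong(expected, nullptr);
}

void submit_write(Msv& msv, WriteTask* task) {
    for (;;) {
        WriteTask* current = nullptr;
        if (msv.pending.compare_exchange_strong(current, task)) break;
        help(msv, current);  // finish the other task, then retry
    }
    help(msv, task);
}
```

  • Because a failed CAS on the pending slot yields the task that is currently installed, a thread never blocks: it completes the other thread's modification and then retries its own installation.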
  • a thread of the system 100 may process a write task directed to an MSV object 166 as follows.
  • the "current write” refers equivalently to the write task object 170 being invoked, the performance of the steps below, which may take place concurrently in multiple threads, and the intended modification.
  • the write task may, in general, contain the following five steps:
  • the write task obtains the current MSV state 162 associated with the MSV object 166. It is noted that this MSV state 162 may be different from the one obtained as a result of calling a process called "close_published_values()," which is described further below. Due to the serialization of tasks, no further changes may be made by other tasks until the current write task finishes.
  • isolation contexts 110 that are associated with value chains 156 in the MSV state 162 are enumerated. For each isolation context 110 for which it is determined that it is possible for the current write to introduce a conflict or contingent conflict for the isolation context 110, the thread installs the current write task object 170 in the corresponding isolation context object 151 as a pre-publication task. This ensures that the isolation context 110 cannot be published until this write task 170 completes. Therefore publishing of the isolation context
  • a contingent conflict refers to an indication that should one isolation context 110 successfully publish its changes, doing so will, as part of the atomic publishing process, induce a conflict in another isolation context 110.
  • the thread processing the write task determines the value to be associated as the result of the modification, and the write task may create a value object 158 and add the created value object 158 to the correct value chain 156.
  • the thread processing the write task next identifies any conflicts 152 or contingent conflicts for each value chain 156 in the MSV state 162 and adds these conflicts 152 and contingent conflicts to the associated isolation contexts 110. For example, the system 100 may determine whether the current write implies a conflict for the value chain's associated isolation context 110. For example, when the value chain 156 is open, the associated isolation context 110 is a publishable isolation context, and the associated view 120 is a descendant of the view being modified, the thread may determine that a conflict should be added to the isolation context object 151 associated with the value chain 156.
  • the isolation context 110 associated with the view 120 being modified is a publishable snapshot
  • the view 120 associated with the value chain 156 under consideration is an ancestor of the view being modified
  • the timestamp of the most recent value object 158 on the value chain 156 under consideration is more recent than the snapshot time of the snapshot
  • the thread processing the write task may determine that a conflict should be added to the isolation context object 151 associated with view 120 being modified.
  • the thread processing the write task may determine that contingent conflicts should be added to one or both of the isolation context objects 151 associated with the views 120 such that when one of the corresponding isolation contexts 110 publishes, the publication adds a conflict at the MSV object 166 to the other isolation context object 151.
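  • The two conflict tests described for this step can be sketched as predicates over a view hierarchy. The sketch assumes that the three conditions listed for the snapshot case are meant conjunctively, which the surrounding text suggests but does not state outright; the types `ChainInfo` and `WriteInfo` and both predicate names are hypothetical. Contingent conflicts are not modeled.

```cpp
#include <cassert>
#include <cstdint>

using Timestamp = std::uint64_t;

struct View { const View* parent; };

// True when `v` is a strict descendant of `ancestor`.
bool is_descendant(const View* v, const View* ancestor) {
    assert(v != nullptr && ancestor != nullptr);
    for (const View* p = v->parent; p != nullptr; p = p->parent)
        if (p == ancestor) return true;
    return false;
}

// What the write task knows about one value chain in the MSV state.
struct ChainInfo {
    const View* view;
    bool open;
    bool context_publishable;
    Timestamp latest_value_time;  // timestamp of the most recent value
};

// What the write task knows about the modification being performed.
struct WriteInfo {
    const View* view;
    bool context_is_publishable_snapshot;
    Timestamp snapshot_time;
};

// Conflict for the chain's context: an open chain of a publishable
// context whose view descends from the view being modified.
bool conflicts_chain_context(const WriteInfo& w, const ChainInfo& c) {
    return c.open && c.context_publishable && is_descendant(c.view, w.view);
}

// Conflict for the writer's own context: a publishable snapshot writes
// below an ancestor view whose chain changed after the snapshot time.
bool conflicts_writer_context(const WriteInfo& w, const ChainInfo& c) {
    return w.context_is_publishable_snapshot &&
           is_descendant(w.view, c.view) &&
           c.latest_value_time > w.snapshot_time;
}
```

  • A write task would evaluate both predicates for every value chain in the MSV state and add the resulting conflicts to the corresponding isolation context objects 151.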
  • the thread processing the write task may remove the current write task as a pre-publication task from the isolation context objects 151 that it was added to, thereby allowing any publication(s) to proceed.
  • this step may be performed as part of the prior step following the determination of any conflicts on all of the isolation context's 110 associated value chains 156 or following the addition of a conflict to the isolation context.
  • the thread may attempt to remove the write task 170 as the pending write task 170 for the object, e.g., by using a CAS operation to replace the current write task 170 with a null value.
  • the thread may make a single attempt, as failure of the CAS operation may indicate that another thread was previously successful in this removal.
  • a thread may perform a check to determine whether there is already a value object 158 established on an open value chain 156 associated with the operation's view 120 since the last time the associated isolation context 110 was published (or at all if the isolation context 110 has yet to publish). If such a value object 158 exists, the value associated with the value object 158 may be returned. Otherwise, the thread may initiate a modify operation, where the value to be asserted by the modify operation may be the current value associated with the view, and this value may be returned by the frozen read operation.
  • the modify operation may indicate in the value chain 156 that the value added is as the result of a frozen read and therefore, is not propagated to a value chain 156 associated with the view's 120 parent view 120 following successful publishing of the isolation context 110.
  • the publication process may be serialized with respect to other publication tasks or other write tasks installed as pre-publication tasks (as described above) by concurrently executing threads of the system 100: a thread may install a publication task in the isolation context object 151 associated with the isolation context 110 to be published after cooperating in finishing any currently installed publication task or write tasks.
  • a conflict or contingent conflict may not be added to an isolation context object 151 by a write task 170 unless that write task 170 has been installed as a pre-publication task 170 on the isolation context object 151; and a write task 170 may not be installed as a pre-publication task 170 on an isolation context object 151 that has an installed publication task 170 before the publication task is finished with the cooperation of the thread attempting to install the write task 170.
  • no further conflicts may be added until after the publication attempt associated with the publication task finishes.
  • any attempted publication of the associated isolation context 110 fails, and the conflicts 152 are noted and may be addressed, as further described herein. Otherwise, if a given isolation context 110 attempts to publish and has no actual conflicts 152 installed, the isolation context 110 is allowed to publish; any contingent conflict(s) 152 are identified and added as actual conflict(s) 152 to the isolation context object(s) 151 associated with the contingent conflict(s) 152; and the isolation context object 151 being published is updated to be associated with a new isolation context state 157 having no (actual or contingent) conflicts, an updated last publication time, and its prior isolation context state 157 as a prior state.
  • the determination that there are no actual conflicts 152 and the updating of the isolation context state 157 may be constrained to constitute a single logically atomic operation.
  • the isolation context object 151 may include one or multiple of the following features.
  • the isolation context object 151 may include an immutable reference to a parent isolation context object 151.
  • the isolation context object 151 may include an atomically-updated reference to a current isolation context state 157. This reference may be atomically changed to refer to a new isolation context state 157 whenever the associated isolation context 110 publishes or when a conflict or contingent conflict is created for the context 110.
  • the isolation context state 157 may contain one or multiple conflicts 152. More specifically, in accordance with example implementations, the isolation context state 157 may include a list of conflicts representing actual conflicts; a list of conflicts representing contingent conflicts; a timestamp representing the last publication time; and a reference to the prior state of the associated isolation context 110 as of the last successful publication of the isolation context 110 (if any).
  • an associated isolation context state 157 may be created.
  • the timestamp associated with a created isolation context state 157 may be the result of atomically incrementing the event counter 164.
  • the lists of conflicts and contingent conflicts may be linked lists such that creating a new isolation context state 157 based on an existing isolation context state 157 differing only in the addition of or removal of conflicts 152 on one of the conflict lists may involve isolation context states 157 whose lists share a state in a common suffix.
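  • As a sketch under stated assumptions, the isolation context state can be modeled as an immutable object that is replaced atomically, with conflict lists built as immutable linked lists so that a new state shares its list suffix with the state it was derived from. The class and member names are hypothetical, conflicts are reduced to location names, and the promotion of contingent conflicts into other contexts on publication is omitted. The sketch uses the `std::atomic_load` and `std::atomic_compare_exchange_strong` overloads for `std::shared_ptr`, which are available from C++11 and deprecated in C++20 in favor of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

using Timestamp = std::uint64_t;

struct ConflictNode {
    std::string location;
    std::shared_ptr<const ConflictNode> next;  // shared, immutable suffix
};
using ConflictList = std::shared_ptr<const ConflictNode>;

struct ContextState {
    ConflictList conflicts;   // actual conflicts
    ConflictList contingent;  // contingent conflicts
    Timestamp last_publication;
    std::shared_ptr<const ContextState> prior;
};
using StatePtr = std::shared_ptr<const ContextState>;

class IsolationContext {
public:
    explicit IsolationContext(std::atomic<Timestamp>& clock)
        : state_(std::make_shared<const ContextState>(ContextState{
              nullptr, nullptr, clock.fetch_add(1) + 1, nullptr})),
          clock_(clock) {}

    StatePtr state() const { return std::atomic_load(&state_); }

    // Installs a new state whose conflict list is the old list plus one
    // new head node; the old nodes are shared, not copied.
    void add_conflict(const std::string& location) {
        assert(!location.empty());
        StatePtr cur = state();
        for (;;) {
            auto node = std::make_shared<const ConflictNode>(
                ConflictNode{location, cur->conflicts});
            auto next = std::make_shared<const ContextState>(ContextState{
                node, cur->contingent, cur->last_publication, cur->prior});
            if (std::atomic_compare_exchange_strong(&state_, &cur, next))
                return;  // on failure `cur` holds the newer state; retry
        }
    }

    // The conflict check and the state update form one logically atomic
    // step: the CAS succeeds only if the checked state is still current.
    bool try_publish() {
        StatePtr cur = state();
        for (;;) {
            if (cur->conflicts) return false;
            auto next = std::make_shared<const ContextState>(ContextState{
                nullptr, nullptr, clock_.fetch_add(1) + 1, cur});
            if (std::atomic_compare_exchange_strong(&state_, &cur, next))
                return true;
        }
    }

private:
    StatePtr state_;
    std::atomic<Timestamp>& clock_;
};
```

  • Following the `prior` links from the current state visits the states as of earlier successful publications, whose timestamps correspond to the stable times of the context in this model.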
  • the timestamps of the isolation context states 157 associated with an isolation context object 151 represent the creation time of the isolation context object 151 and its successful publications. These timestamps may be called "stable times" for the isolation context object 151 and its associated isolation context 110. The latest of these stable times (e.g., the timestamp of the isolation context state 157 directly associated with the isolation context object 151) may be called the "last stable time" for the isolation context object 151.
  • the isolation context object 151 may have an atomically-updated reference to a pre-publication task that may be a write task or a publication task that is to be completed before publication of the associated isolation context 110 may be attempted.
  • the pre-publication task may be a single publication task or a collection of write tasks, all of which are to be completed before publication of the associated isolation context 110 may be attempted.
  • the completion of the write tasks from the collection of write tasks may include basing whether to perform the write task on an indication of whether the task has been completed (e.g., by another thread).
  • the isolation context object 151 may have an associated map, also called a "shadow map," which maps view objects 153 to other view objects 153.
  • the shadow map may be a lock-free map.
  • the shadow map may be a lock-free cuckoo map.
  • the isolation context object 151 may have an associated immutable enumerated value that represents whether the associated isolation context 110 is a live or snapshot isolation context 110.
  • the isolation context object 151 may have an immutable, enumerated value that represents the associated isolation context's modification type, which may be a publishable isolation context 110, a detached isolation context 110 or a read-only isolation context 110.
  • a publishable isolation context 110 refers to an isolation context 110 that may be published.
  • a detached isolation context 110 refers to an isolation context 110 that is constrained to be one that is not published, but values in the detached isolation context 110 may be modified.
  • a read-only isolation context 110 refers to an isolation context 110 that is constrained to not be published, and values in the read-only context 110 are constrained to not be modified.
  • the system 100 may be constrained to not create a publishable isolation context 110 whose parent is a read-only isolation context 110, as publishing modifications from the child isolation context 110 would constitute an impermissible modification to the read-only parent isolation context 110.
  • the data structures 150 include a global isolation context object 151 representing the global isolation context 110 and having no parent isolation context object 151.
  • the isolation context object 151 may also be associated with one or multiple of the following operations.
  • the isolation context object 151 may provide a shadow(view object) operation to identify a view object 153 associated with the isolation context object 151 and related to the specified view object 153. If the shadow(view object) operation determines that the isolation context object 151 is associated with the view object 153, the view object 153 is returned as the value of the shadow() operation. Otherwise, in accordance with example implementations, the shadow(view object) operation involves checking the shadow map associated with the isolation context object 151. If there is no entry in the shadow map corresponding to the view object 153, the shadow(view object) operation may include invoking the shadow(view object) operation on the parent isolation context object 151.
  • If the shadow map contains a view object 153 corresponding to the resulting view object 153, the corresponding view object may be returned. Otherwise, the shadow(view object) operation may create a new view object 153, as a child of the parent's shadow view object 153.
  • the shadow(view object) operation may associate the new view object 153 with the specified view object 153 and with the parent's shadow view object 153. In this way, a hierarchy of shadow view objects 153 may be created and efficiently retrieved.
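The shadow lookup described above can be illustrated with a minimal, single-threaded sketch. The class names `View` and `Context` are hypothetical stand-ins for the view object 153 and the isolation context object 151, a plain `dict` stands in for the lock-free shadow map, and the error raised at the global context is an assumption, since the text does not describe that case.

```python
class View:
    """Hypothetical stand-in for a view object 153."""
    def __init__(self, context, parent=None):
        self.context = context  # isolation context that created this view
        self.parent = parent    # parent view; None for the top-level view


class Context:
    """Hypothetical stand-in for an isolation context object 151."""
    def __init__(self, parent=None):
        self.parent = parent
        self.shadow_map = {}    # plain dict standing in for the lock-free map

    def shadow(self, view):
        # A view that belongs to this context is its own shadow.
        if view.context is self:
            return view
        # Otherwise look for an existing entry in this context's shadow map.
        found = self.shadow_map.get(view)
        if found is not None:
            return found
        if self.parent is None:
            # Assumption: the text does not cover a lookup that reaches the
            # global context without a match, so this sketch rejects it.
            raise ValueError("view is not visible from this context")
        # Delegate to the parent context, then shadow the parent's result.
        parent_shadow = self.parent.shadow(view)
        found = self.shadow_map.get(parent_shadow)
        if found is None:
            found = View(self, parent=parent_shadow)
            self.shadow_map[parent_shadow] = found
        # Remember the association with the originally specified view too.
        self.shadow_map[view] = found
        return found
```

In this sketch, asking a grandchild context for the shadow of the top-level view creates one shadow view per intermediate context, and a repeated lookup returns the same object from the map.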
  • the isolation context object 151 may provide a new_child(view type, modification type, timestamp) operation to create a child isolation context object 151 of the isolation context object 151, where the timestamp associated with the child's isolation context state 157 is the specified timestamp or, if the timestamp is omitted or "most recent" is specified, the result of incrementing the event counter 164, and where the child has the specified view type (e.g., either "live" or "snapshot") and the specified modification type (e.g., "publishable", "detached", or "read-only").
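As a rough illustration of the new_child operation and of the constraint that a read-only context may not have a publishable child, consider the following sketch. The names `ViewType`, `ModType`, and `Context` are hypothetical, `itertools.count` is only a stand-in for the atomically incremented event counter 164, and the timestamp is stored directly on the context rather than on a separate isolation context state 157.

```python
import itertools
from enum import Enum


class ViewType(Enum):
    LIVE = "live"
    SNAPSHOT = "snapshot"


class ModType(Enum):
    PUBLISHABLE = "publishable"
    DETACHED = "detached"
    READ_ONLY = "read-only"


# Stand-in for the event counter 164 (assumed to be incremented atomically).
_event_counter = itertools.count(1)


class Context:
    """Hypothetical stand-in for an isolation context object 151."""
    def __init__(self, parent, view_type, mod_type, timestamp=None):
        self.parent = parent
        self.view_type = view_type   # immutable in the described design
        self.mod_type = mod_type     # immutable in the described design
        # An omitted timestamp means "most recent": take the next counter value.
        self.timestamp = next(_event_counter) if timestamp is None else timestamp

    def new_child(self, view_type, mod_type, timestamp=None):
        # Publishing from the child would modify a read-only parent.
        if mod_type is ModType.PUBLISHABLE and self.mod_type is ModType.READ_ONLY:
            raise ValueError("a read-only context cannot have a publishable child")
        return Context(self, view_type, mod_type, timestamp)
```

The modification type chosen for the root context in any usage of this sketch is arbitrary; the text does not state which type the global isolation context carries.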
  • the isolation context object 151 may provide a publish() operation to install and run a publish task, as described below, after first collaborating in finishing any publish or write tasks already installed in the context's pre-publication task.
  • the isolation context object 151 may provide an add_conflict(conflict) operation to first check whether the conflict 152 has been marked as being
  • the operation atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the conflict list.
  • the replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
  • the operation attempts to install the conflict 152 in a location (e.g., a value chain 156) associated with the conflict 152 to assert that there is known to be a conflict 152 at that location.
  • the attempt to install may be made by a single invocation of a CAS operation attempting to replace a null value with the present conflict 152. Failure in this attempt implies that another conflict 152 has been installed there (e.g., by another thread), and accordingly, the present conflict 152 is marked as being resolved.
  • the isolation context object 151 may provide an add_contingent_conflict(conflict) operation, which atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the contingent conflict list.
  • the replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
  • the isolation context object 151 may provide a conflict_resolved(conflict) operation to remove a conflict 152 from preventing publication of the isolation context 110 associated with the isolation context object 151. If the specified conflict 152 is the head of the current isolation context state's 157 conflict list, the operation replaces the associated isolation context state 157 with a new isolation context state 157 equivalent to the old one, save that all initial conflicts 152 in the conflict list that are marked as resolved are removed. In this way, in accordance with example implementations, resolved conflicts 152 may remain on the conflict list, but after all conflicts 152 have been marked as being resolved, the conflict list in the isolation context state 157 is empty.
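The add_conflict and conflict_resolved operations described above can be sketched as follows. This is a simplified illustration rather than the described implementation: `AtomicRef` is a lock-based stand-in for a hardware CAS cell, all class names are hypothetical, the initial check made by add_conflict is omitted, and marking the conflict as resolved inside `conflict_resolved` is an assumption made to keep the example short.

```python
import threading
from dataclasses import dataclass
from typing import Optional


class AtomicRef:
    """Lock-based stand-in for a compare-and-swap (CAS) cell."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Conflict:
    def __init__(self, location):
        self.location = location  # AtomicRef in which the conflict is installed
        self.resolved = False


@dataclass(frozen=True)
class Node:
    conflict: Conflict
    next: Optional["Node"]        # the shared suffix is never copied


@dataclass(frozen=True)
class State:
    conflicts: Optional[Node] = None


class Context:
    def __init__(self):
        self.state = AtomicRef(State())

    def add_conflict(self, conflict):
        # CAS loop: build a new state each time until one attempt succeeds.
        while True:
            old = self.state.get()
            if self.state.compare_and_set(old, State(Node(conflict, old.conflicts))):
                break
        # Single CAS attempt to install the conflict at its location; failure
        # means another conflict is already there, so mark this one resolved.
        if not conflict.location.compare_and_set(None, conflict):
            conflict.resolved = True

    def conflict_resolved(self, conflict):
        conflict.resolved = True  # assumption: marking is done here
        while True:
            old = self.state.get()
            head = old.conflicts
            if head is None or head.conflict is not conflict:
                return            # not the head: leave it on the list
            while head is not None and head.conflict.resolved:
                head = head.next  # drop the leading run of resolved conflicts
            if self.state.compare_and_set(old, State(head)):
                return
```

Because the list nodes are immutable, each new state reuses the tail of the previous list, and a resolved conflict that is not at the head simply stays in place until the conflicts ahead of it are resolved.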
  • the isolation context object 151 may provide a publication_after(timestamp) operation that traverses the list of the isolation context states 157, starting from the current isolation context state 157 and continuing by following the prior state pointer, and returns the earliest timestamp associated with a state 157 such that the timestamp of the state 157 is after the specified one. This operation may also indicate when there is no such timestamp.
  • the isolation context object 151 may provide a publication_before(timestamp) operation that traverses the list of the isolation context states 157 in a manner similar to the publication_after(timestamp) operation. However, the publication_before(timestamp) operation returns the latest publication timestamp before the given one, or a zero timestamp (or a similar timestamp that is considered to be before any other timestamp), if the given timestamp is before the associated isolation context's creation time.
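The two traversals can be sketched over a chain of states linked by prior pointers. The `State` class and the function signatures below are hypothetical; the sketch assumes integer timestamps that strictly decrease along the prior chain, treats "after" and "before" as strict comparisons, and uses `None` to signal that no later stable time exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for an isolation context state 157."""
    timestamp: int
    prior: Optional["State"] = None


def publication_after(current, timestamp):
    """Earliest stable time strictly after `timestamp`, or None if there is none."""
    best = None
    state = current
    # Newer states come first, so keep walking while they are still later.
    while state is not None and state.timestamp > timestamp:
        best = state.timestamp
        state = state.prior
    return best


def publication_before(current, timestamp):
    """Latest stable time strictly before `timestamp`, or 0 if there is none."""
    state = current
    while state is not None:
        if state.timestamp < timestamp:
            return state.timestamp
        state = state.prior
    return 0  # zero timestamp: considered to be before any other timestamp
```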
  • the contingent conflict list of the isolation context state 157 may have nodes, where each node indicates a contingent conflict and has an associated "handled" flag.
  • the contingent conflict associated with a node may only be added as a conflict to its associated isolation context object 151 following a determination that the flag is not set; and the flag may be atomically set following the successful addition of the conflict 152.
  • the view object 153 may have one or multiple of the following properties.
  • the view object 153 may contain an unchangeable, or immutable, reference to the isolation context 110 (as represented by object 151) that created it; and the view object 153 may contain an immutable reference to its parent view object 153 (which may or may not be associated with the same isolation context 110).
  • the view object 153 may have an ancestor cache, which may be a map from view objects 153 to Boolean values, where an association in the map records a determination that a given other view object 153 is or is not an ancestor of the current view object 153.
  • a true Boolean value, a false Boolean value, and the absence of an entry indicate that the other view object 153 has been determined to be an ancestor, has been determined to not be an ancestor, or has yet to be evaluated, respectively.
  • the ancestor cache is not constructed until a determination is required and is atomically installed.
  • the ancestor cache is a lock free map (e.g., a lock free cuckoo map).
  • the data structures 150 include a top-level view object 153, which is associated with the global isolation context object 151 and which has no parent view object 153.
  • the view object 153 may provide an operation that is directed to retrieving a reference to the associated isolation context object 151.
  • the view object 153 may provide a has_ancestor(view object) operation to determine whether a given view object 153 is an ancestor of the view object 153.
  • the operation includes first determining whether the specified view object 153 is the same as the view object 153, the same as the parent of the view object 153, or the top-level view object 153. In any of these cases, the operation may return a true Boolean value. Otherwise, if the view object 153 is the top-level view object 153, then the operation may return a false Boolean value.
  • If the view object 153 does not yet have an ancestor cache, the operation creates one and atomically associates it with the view object 153.
  • the ancestor cache may be examined to determine whether the answer is known. If a Boolean value is found to be associated in the ancestor cache with the specified view object 153, the Boolean value may be returned. Otherwise, the ancestry of the view object 153 may be walked, or traversed, starting with its parent view object 153 and continuing through successive parent view objects 153 until the specified view object 153 or the top-level view object 153 is encountered. The value of the operation depends on the view object 153 encountered during the traversal of the ancestry: a Boolean true value if the specified view object 153 is encountered and a Boolean false value if the top-level view object 153 is encountered. The value of the operation may be associated with the specified view object 153 in the ancestor cache and returned.
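A minimal, single-threaded sketch of has_ancestor with its lazily created ancestor cache follows. The `View` class and its `top` attribute are hypothetical conveniences, and a plain `dict` stands in for the lock-free map; atomic installation of the cache is not modeled.

```python
class View:
    """Hypothetical stand-in for a view object 153."""
    def __init__(self, parent=None):
        self.parent = parent
        # Each view keeps a reference to the top-level view for quick checks.
        self.top = self if parent is None else parent.top
        self.ancestor_cache = None  # not constructed until first needed

    def has_ancestor(self, other):
        # Cheap cases: the view itself, its parent, or the top-level view.
        if other is self or other is self.parent or other is self.top:
            return True
        # The top-level view has no other ancestors.
        if self is self.top:
            return False
        if self.ancestor_cache is None:
            self.ancestor_cache = {}
        cached = self.ancestor_cache.get(other)
        if cached is not None:
            return cached
        # Walk the parent links until the target or the top-level view is met.
        view = self.parent
        while view is not other and view is not self.top:
            view = view.parent
        result = view is other
        self.ancestor_cache[other] = result
        return result
```

Only the answers that required a walk are cached; the three quick checks never touch the map.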
  • the conflict 152 may have one or multiple of the following properties.
  • a conflict 152 may contain an atomically-updated flag to indicate, or represent, whether or not the conflict 152 has been resolved.
  • a conflict 152 may contain an immutable reference to an atomically-updated reference to a conflict 152, which is the location in which this conflict is installed.
  • the conflict 152 may have one or multiple subclasses holding information identifying different types of locations at which conflicts may occur.
  • such subclasses may include: 1.) a field conflict, which holds a reference to a record object and an indication of a field within that record object; 2.) a bound name conflict, which holds a reference to a
  • the MSV object 166 may have one or multiple of the following properties.
  • An MSV object 166 may contain an atomically-updated reference to its current state 162.
  • An MSV object 166 may contain an atomically-updated reference to a pending write task, which is a write task that is in progress and must be completed before another modification can be performed.
  • An MSV object 166 may contain an immutable reference to a conflict generator 160.
  • the type of the conflict generator 160 referred to may depend on the type of location the MSV object 166 represents (e.g., field of record or slot of array).
  • Conflict generators 160 associated with different types of location may generate different instances of different subtypes of conflict 152.
  • the MSV object 166 may be an instance of a template (or parameterized or generic) class, where the template parameter may indicate the type of data contained (e.g., the values contained in value objects 158 and provided to and returned by operations).
  • This template parameter may also be used by the MSV state 162, value objects 158, and value chains 156, which may be nested within the class implementing the MSV object 166.
  • Such an arrangement may allow different representations of value objects 158 holding different types of values, which may permit more efficient representation when values are of a primitive type (e.g., numbers, Boolean values, or characters).
  • the MSV object 166 may provide one or multiple of the following operations.
  • the MSV object 166 may provide a read(view, timestamp) operation to determine and return the value of the MSV object 166 for a specified view object 153 as of (i.e., no later than) the given timestamp, which, if omitted, defaults to the "most recent" timestamp, indicating that the most recent value should be returned.
  • a process of reading from an MSV object 166 includes invoking the current_value operation on the MSV state 162 associated with the MSV object 166.
  • the MSV object 166 may provide a read_frozen(view) operation to determine and return the current (e.g., most recent) value of the MSV object 166 for the given view object 153.
  • the read_frozen(view) operation ensures that
  • the MSV object 166 may provide a has_value(view, timestamp) operation to indicate (e.g., via returning a Boolean value) if a read operation with the same parameters would return a value discovered in a value object 158.
  • the MSV object 166 may provide a modify(view, op, resolve?, argument) operation to modify the value in the given view object 153, according to the given operation applied to the current value object 158 and (when applicable) the given argument.
  • the "?” suffix denotes a Boolean value.
  • the operations may include one or multiple of the following: an operation to set the value to the argument; an operation to add, subtract, multiply, or divide the current value by the argument; an operation to clear the value (e.g., set the value to a default value); and an operation to set the value to a value present in the MSV object 166, where the present value may be one of the current value in the given view 120; the value in the parent view 120 of the given view object 153; the last stable value (e.g., the value as of the last stable time for the associated isolation context 110); and the current value, with an indication that this should be noted as implementing a frozen read.
  • the "resolve?" argument controls whether this modification should be considered to resolve any conflict 152 for this view object 153 in this MSV object 166.
  • the MSV object 166 may provide a write(view, resolving, new value) operation, which is an alias for the above-described modify operation in which the operation is that of setting the value to the given argument.
  • operations that are provided by the MSV object 166 may be preceded by closing the published view objects 153 in a close_published_views() process, which is described below.
  • the value chain 156 may have one or multiple of the following properties.
  • the value chain 156 may represent the history of value objects 158, which are associated with a view object 153.
  • the value chain 156 may contain immutable references to the value chain's view object 153.
  • the reference may be a pointer that contains flags that record whether the view object's isolation context 110 is mutable and/or a snapshot.
  • the value chain 156 may contain a pointer to the latest, or most recent, value object 158, and in accordance with example implementations, the pointer may contain a flag that indicates or represents whether the most recent value object 158 was due to a frozen read or equivalent operation (e.g., an operation that asserts a value equivalent to the current value, such as an operation of adding or subtracting zero or an operation of multiplying or dividing by one).
  • view-relative pointers may be installed on the value chain 156.
  • the view-relative pointers may be associated as values in the value objects 158 that are installed on the value chain 156.
  • the view-relative pointers may be constrained to be relative to some view object 153 that shares an isolation context 110 with the view object 153 of the value chain 156.
  • the value chain 156 may contain an atomically-updated reference to a conflict 152 that is associated with the value chain 156. This may be a null reference to indicate a lack of a conflict.
  • the value chain 156 may provide one or multiple of the following operations.
  • the value chain 156 may provide a value(timestamp) operation to return a reference to the value object 158 representing the latest value object 158 in the value chain 156 before the given timestamp, or provide a null reference, if no such value object 158 exists.
  • the operation may involve traversing the value chain 156 by starting at the latest value object 158 associated with the value chain 156 and following each value object's link to the prior value object 158.
  • the value chain 156 may provide an add_value(value) operation to update the most recent value object 158 reference to a new value object 158 with the provided value.
  • the timestamp of the new value object 158 may be the current value of the event counter 164, and the prior pointer of the new value object 158 may refer to the old most recent value object 158.
  • the add_value(value) operation may return the newly added value object 158.
  • the prior pointer for the new value object 158 may be set to the prior pointer of the old most recent value object 158, rather than to the old most recent value object 158 itself, for the case in which the timestamp of the old most recent value object 158 is the current value of the event counter 164.
  • This allows the old most recent value object 158 to be garbage collected (assuming there are no other references to it). That is, in accordance with example implementations, when the event counter 164 has not changed in between successive additions to a value chain 156, the new value object 158 replaces the old most recent value object 158, rather than extending the chain of value objects 158.
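The value(timestamp) and add_value(value) operations, including the replacement of the most recent value object when the event counter has not advanced, can be sketched as below. This is a single-threaded illustration with hypothetical names; the current event counter value is passed in explicitly as `now`, the lookup treats the timestamp as an inclusive "as of" bound (an assumption), and the read-in-snapshot flag and CAS retries are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ValueObject:
    """Hypothetical stand-in for a value object 158."""
    value: Any
    timestamp: int
    prior: Optional["ValueObject"]


class ValueChain:
    """Hypothetical stand-in for a value chain 156."""
    def __init__(self):
        self.latest = None  # most recent value object, or None if empty

    def value(self, timestamp):
        # Follow prior links until a value no later than `timestamp` is found.
        obj = self.latest
        while obj is not None and obj.timestamp > timestamp:
            obj = obj.prior
        return obj

    def add_value(self, value, now):
        old = self.latest
        if old is not None and old.timestamp == now:
            # Same event-counter value: replace the old head instead of
            # extending the chain, so the old head can be garbage collected.
            prior = old.prior
        else:
            prior = old
        self.latest = ValueObject(value, now, prior)
        return self.latest
```

Two writes at the same counter value therefore leave a single value object for that timestamp on the chain.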
  • the most recent value object pointer contains a read-in-snapshot flag, indicating whether the value from the most recent value object 158 in the value chain 156 was read after following the parent link of a view object 153 associated with a snapshot isolation context 110 (e.g., when the timestamp of the read is a stable time for a snapshot rather than the "most recent" timestamp).
  • the reference to the most recent value object 158 (including its read-in-snapshot flag) is remembered. If the value object 158 found is the most recent value object 158, a single attempt is made to replace the remembered value object reference with an identical one that has the read-in-snapshot flag set to true.
  • If this attempt fails, it means that either some other thread succeeded (i.e., a failure occurred due to the assumption that the false value for the read-in-snapshot flag was wrong) or another value object 158 was added to the value chain 156 (i.e., a failure occurred because the most recent value object 158 reference had changed). Either cause of failure removes the problem.
  • the old most recent value object reference may be read before the current timestamp is read.
  • Because the current timestamp is incremented upon the creation of a snapshot, it may be inferred that either the read-in-snapshot flag of the read most recent value object reference is false or the current timestamp was incremented since the last value object 158 (e.g., the current most recent value object 158) was added to the value chain 156.
  • the new value for the most recent value object reference has the read-in-snapshot flag set to false. If it is the case that the read-in-snapshot flag was set in between the time the read of the most recent value object reference was made and the time of the attempted update, then the CAS operation to change the most recent value object reference from the read value to the new value fails. In this case, another attempt is made to add the value, which involves reading a new current timestamp and updating the timestamp and next pointer on the value object 158 being added.
  • the MSV state 162 indicates states for value chains 156 of an associated MSV.
  • the MSV state 162 contains an array of value chains 156 (by reference).
  • Each value chain 156 may be considered “publishable” or “unpublishable” based on its associated isolation context.
  • Each value chain 156 may further be considered to be "open” or "closed”.
  • An open value chain 156 is one on which a value object 158 was added subsequent to the last stable time of the associated isolation context 110 as of the time the MSV state 162 was created.
  • a closed value chain 156 is one that is not an open value chain 156.
  • a closed value chain 156 is publishable, and an unpublishable value chain 156 is open.
  • the value chains 156 may be arranged in the array such that all open value chains 156 precede all closed value chains 156 and all unpublishable value chains 156 precede all publishable value chains 156.
  • the first entry in the array may be reserved for a value chain 156 that is associated with the top-level view object 153.
  • This value chain 156 is unpublishable (since the top-level view object 153, which lacks a parent, is unpublishable) and, as such, may be considered open even if it has no value objects 158.
  • the MSV state 162 may contain an indication of the number of contained unpublishable value chains 156, the number of open publishable value chains 156, and the number of closed value chains 156. These numbers allow the determination of the state of a value chain 156 based on its position within the array.
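How the three counts determine a value chain's status from its array position can be shown with a short sketch. The `MsvState` class and its method names are hypothetical, and strings stand in for the value chains; the ordering assumed is unpublishable chains first, then open publishable chains, then closed chains, which follows from the arrangement described above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MsvState:
    """Hypothetical stand-in for an MSV state 162."""
    chains: Tuple[object, ...]   # value chains, held by reference
    n_unpublishable: int         # always open; index 0 is the top-level chain
    n_open_publishable: int
    n_closed: int

    def classify(self, index):
        # The position alone tells which group a chain belongs to.
        if not 0 <= index < len(self.chains):
            raise IndexError(index)
        if index < self.n_unpublishable:
            return "unpublishable"
        if index < self.n_unpublishable + self.n_open_publishable:
            return "open publishable"
        return "closed"

    def open_chains(self):
        # All open chains form a prefix of the array.
        return self.chains[: self.n_unpublishable + self.n_open_publishable]
```

A lookup restricted to open chains, such as one made with the "open only?" parameter set, would only need to scan the prefix returned by `open_chains()`.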
  • the MSV state 162 may provide one or multiple of the following operations.
  • the MSV state 162 may provide the close_published_views() operation, which is further described below.
  • the MSV state 162 may provide a values(view, open only?) operation to return the value chain 156 that is associated with the specified view object 153, if it exists. If the "open only?" parameter is a Boolean true value, then a value chain 156 is not returned unless the value chain 156 is found among the open (e.g., unpublishable and open publishable) value chains 156.
  • the MSV state 162 may provide a current_value(view, timestamp) operation to return the value associated with the view object 153 as of the specified timestamp.
  • this operation may traverse parent view objects 153 and the timestamp may be modified when traversing parent view objects 153 of view objects 153 associated with snapshot isolation contexts 110.
  • the operation may begin at the specified view object 153 and may call the values(view, open only?) operation to get the value chain 156 associated with the view object 153.
  • the "open only?” parameter to the operation may be the Boolean true value just in case the specified timestamp is the "most recent" timestamp.
  • the current_value operation may call value(timestamp) on this result to get a value object 158. If none is found, the operation may replace the current view object 153 by its parent view object 153. If the current view object 153 is associated with a snapshot isolation context 110, the operation may replace the timestamp with the last stable time of the isolation context 110 prior to the timestamp by calling the publication_before(timestamp) operation on the isolation context object 151.
  • the search may be performed again, and this process may continue until a value object 158 is found or the parentage is exhausted.
  • the current_value(view, timestamp) operation returns (as a secondary return value) an indication that the value returned should not be trusted if the MSV's state has changed.
  • the caller attempts to confirm that the MSV state 162 is still associated with the MSV object 166. If this is not the case, the caller may call current_value() on the new state. This process may repeat until the secondary return indication is not seen or the MSV object's 166 current state has not changed.
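The parent traversal performed by current_value can be sketched as follows. All names are hypothetical; each value chain is reduced to a list of (timestamp, value) pairs, the timestamp is treated as an inclusive bound, and the sketch interprets the text as adjusting the timestamp when leaving a view whose context is a snapshot. The "open only?" parameter and the secondary return value with its retry are omitted.

```python
MOST_RECENT = float("inf")  # stand-in for the "most recent" timestamp


class Context:
    """Hypothetical stand-in for an isolation context object 151."""
    def __init__(self, snapshot=False, stable_times=()):
        self.snapshot = snapshot
        self.stable_times = sorted(stable_times)

    def publication_before(self, timestamp):
        earlier = [t for t in self.stable_times if t < timestamp]
        return earlier[-1] if earlier else 0


class View:
    """Hypothetical stand-in for a view object 153."""
    def __init__(self, context, parent=None):
        self.context = context
        self.parent = parent


def current_value(chains, view, timestamp=MOST_RECENT):
    """`chains` maps a view to a list of (timestamp, value) pairs, oldest first."""
    while view is not None:
        # Search this view's chain, newest first, for a value as of `timestamp`.
        for ts, value in reversed(chains.get(view, [])):
            if ts <= timestamp:
                return value
        # Nothing found: a snapshot only sees its parent as of a stable time.
        if view.context.snapshot:
            timestamp = view.context.publication_before(timestamp)
        view = view.parent
    return None  # parentage exhausted without finding a value
```

In this sketch, a snapshot view with a stable time of 5 reads the parent's value written at time 3 even though the parent has a newer value written at time 8.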
  • an MSV object 166 may provide a close_published_views() operation to update the MSV object 166 to be associated with an MSV state 162 that reflects all publications of isolation contexts 110 that have happened since the last time close_published_views() was called.
  • the close_published_views() operation may accept as a parameter a view object 153 (called the "write view” below) indicative of a view 120, if any, that the caller intends to assert a value in and which, therefore is associated with an open value chain 156 in the resulting MSV state 162.
  • the operation may return the following values: 1. the resulting MSV state 162, which if not null, has been installed in the MSV object 166; 2. the value chain 156 associated with the write view (if specified) in the resulting MSV state 162; and 3. an indication of whether the returned value chain 156 (if any) was an open value chain 156 prior to the operation.
  • a thread executing in the system 100 may perform the following actions when executing the
  • the thread reads the current state of the MSV state 162 (called the
  • If there was no remembered state (e.g., if the current state of the MSV object 166 was a null reference), then if there is no write view object 153, the thread returns a null MSV state 162 reference. Otherwise, the thread creates a new value chain 156 associated with the write view. If the write view is not the top-level view object 153, the thread may also create a value chain 156 associated with the top-level view object 153. The thread then creates a new MSV state 162 associated with an array containing the created value chains 156 and recording the number of unpublishable and open publishable value chains 156 this represents. The thread returns the created MSV state 162, the value chain 156 associated with the write view, and an indication that the returned value chain 156 was not open.
  • the thread calls a find_need_close() operation on the MSV state 162, which returns a priority queue indicating open publishable value chains 156 that are associated with isolation context objects 151 that have been published more recently than the timestamp of the most recent value object 158 on the value chain 156.
  • the priority queue may be ordered from least-recently published to most-recently published, and the elements may include the publication time following the most recent value object's 158 timestamp and an index in the value chain 156 array of the MSV state 162. It is noted that the timestamp stored in the priority queue may not reflect the most recent publication of the isolation context object 151. In searching for value chains 156 to add to the priority queue, if the thread discovers a value chain 156 associated with the write view 153, it may remember it for later use.
  • the thread may return from the operation: the current MSV state 162, the write value chain 156 (if any), and, if a write value chain 156 was found, an indication that the write value chain 156 was previously open.
  • If a write view object 153 was specified and no associated value chain 156 was discovered, it may be inferred that no write value chain exists among the open publishable value chains 156. If the write view object 153 is associated with an unpublishable isolation context 110, a search may be made through the unpublishable value chains 156 associated with the MSV state 162. As an optimization, if the write view object 153 is the top-level view object 153, an associated value chain 156 may be found as the first value chain 156 among the unpublishable value chains 156. If an associated value chain 156 is found, it may be returned, along with the current MSV state and an indication that the value chain 156 was previously open.
  • a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new unpublishable value chain 156 associated with the write view 153. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and its values are returned.
  • a search may be made through the closed value chains 156 associated with the MSV state 162. If an associated value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state, save that the found value chain 156 is moved to become an open publishable value chain 156. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the found value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and its values are returned.
  • a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new open publishable value chain 156 associated with the write view 153. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and its values are returned.
  • process_close() may be called on the current MSV state 162, passing in the priority queue and the write view object 153. The operation returns the same three values, which are remembered.
  • the process_close() operation has access to the priority queue and the write view object 153 (if any) of its caller, and in accordance with example implementations, maintains the following structures: a vector of open publishable value chains 156; a vector of closed value chains 156; and a vector of new unpublishable value chains 156. It is noted that the designation of these value chains 156 as, e.g., open publishable, represent the point of view of an MSV state 162 being designed and the designation for a given value chain 156 may change during the operation.
  • the vector of open publishable value chains 156 may initially contain the open publishable value chains 156 in the MSV state 162. Whenever a new open publishable value chain 156 is created or a closed value chain 156 is reopened, the value chain 156 is added to the end of this vector. Whenever a value chain 156 is closed, the slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
  • the vector of closed value chains 156 may initially contain the closed value chains 156 in the MSV state 162. In accordance with an implementation, the creation of this vector may be deferred until there is a need to alter its contents.
  • the vector of new unpublishable value chains 156 may initially be empty. When a new unpublishable value chain 156 is created, the value chain 156 is added to the end of the vector.
  • the unpublishable value chains 156 of the MSV state 162 being designed comprise the unpublishable value chains 156 of the current MSV state 162 and the contents of this vector.
  • the process_close() operation may repeatedly remove items from the priority queue and identify value chains 156 and publication timestamps, where the values removed reflect the earliest such timestamps in the priority queue, stopping when the priority queue is empty.
  • Each such value chain 156 (which may be called a child value chain) is in the open publishable value chain 156 vector.
  • the operation next identifies the parent value chain 156 associated with the parent view object 153 of the view object 153 associated with the child value chain 156. If the parent value chain 156 is identified in the open publishable value chain 156 vector, the unpublishable value chain 156 vector, or the unpublishable value chains 156 of the current MSV state 162, it is noted.
  • a new parent value chain 156 is created and added to the unpublishable value chain 156 vector. If the parent view object 153 is associated with a publishable isolation context 110, a search is made through the closed value chain 156 vector (or, if this has not yet been created, the closed value chains 156 associated with the current MSV state 162). If a parent value chain 156 is found, the parent value chain 156 is reopened by removing it from the closed value chain 156 vector and adding it to the open publishable value chain 156 vector. If a parent value chain 156 is not found, a new parent value chain 156 is created and added to the open publishable value chain 156 vector.
  • the child value chain 156 may then be closed, and modifications (if any) of the child value chain 156 are transferred to its parent.
  • the child value chain 156 may be removed from the open publishable value chain 156 vector. If there is an indication in the child value chain 156 that the most recent value object 158 was added as the result of a frozen read operation, there is no value that needs to be transferred to the parent value chain.
  • the most recent value object 158 associated with the child value chain 156 may be removed from the child value chain 156 along with the indication. If this results in a child value chain 156 with no associated most recent value object 158, the child value chain 156 may be discarded.
  • the most recent parent value object 158 (if any) is discovered on the parent value chain 156. If the most recent parent value object 158 exists and the associated timestamp is greater than or equal to one less than the timestamp retrieved from the priority queue, this may indicate that another thread has already processed the publication of the child view object 153, so this thread need not.
  • a new parent value object 158 is created based on the most recent child value object 158, having a timestamp equal to the retrieved timestamp minus one and the same value unless the value is a view-relative pointer 154, in which case the value is a copy where the view object 153 associated with the copy is the parent of the view object 153 associated with the original.
  • the prior value object 158 of the new parent value object 158 is the old most recent parent value object 158, if any.
  • the operation then makes a single attempt to atomically replace the old most recent parent value object 158 with the new parent value object 158 in the parent value chain 156. A failure in this attempt may indicate that another thread has already processed the publication of the child view object 153.
  • if the parent view object 153 is associated with a publishable isolation context 110, a determination is made as to whether the parent isolation context 110 was published following the retrieved timestamp. If it was, a new entry is added to the priority queue reflecting the parent value chain 156 and the earliest publication time of the parent isolation context 110 following the retrieved timestamp. It is noted that this step is performed even if an earlier determination was made that another thread installed a new parent value object 158 reflecting the current child value object 158.
  • the thread may proceed to identify the write value chain 156 associated with the write view object 153, if any. This may be performed by the same procedure as was used to identify each parent value chain 156.
  • the thread may create a new MSV state 162 associated with the unpublishable value chains 156 from the current MSV state 162, the unpublishable value chains 156 from the unpublishable value chain 156 vector, the open publishable value chains 156 from the open publishable value chain 156 vector, and the closed value chains 156 from the closed value chain 156 vector (or if this last has not been created, from the current MSV state 162).
  • the thread may then return this new MSV state 162, the write value chain 156 (if any) and an indication of whether the write value chain 156 was found among the open value chains 156.
  • Cooperative tasks are described herein for purposes of serializing operations, such as publication and adding conflicts to an isolation context or modifying a value. Each point of serialization may be associated with a "task holder."
  • the task holder refers to a task and may be atomically updated.
  • the task is first installed in the holder. This installation may involve attempting to replace a null pointer in the task holder with a reference to the task by means of the CAS operation. If the task holder was not empty, then this attempt fails, and the current value is obtained. If the blocking task is not the one being installed (i.e., if it was not the case that another thread was successful in installing it), then the task is run in the current thread, and an attempt is made to replace the task with null. If this fails, it means that another thread removed it first, so another iteration occurs to try the install again. When the task is successfully installed, the task is then run and then an attempt is made to remove it (replace it with null).
  • the work of the task may occur in the task's run() operation, which may be defined in a subclass.
  • Publish tasks may be installed in the isolation context objects 151 to (attempt to) perform a publication of the corresponding isolation context 110.
  • the publish tasks are in competition for this task holder with other publish tasks and with write tasks that may wish to add conflicts to this context.
  • the publish task may have one or multiple of the following properties.
  • the publish task may have an immutable reference to the isolation context object 151 being published.
  • the publish task may have an atomically-updated timestamp representing the publish time (initially zero).
  • the publish task may have an atomically-updated reference to a list of conflicts seen by the publish task. This atomically-updated reference may contain a "has value" flag, initially false, to be able to distinguish between the scenario in which it is unknown whether there are any conflicts 152 and the scenario in which it has been determined that there are no conflicts 152.
  • the publish task has finished, and the conflicts reference either contains a list of blocking conflicts or is null, indicating that the publish succeeded as of the publish time.
  • the publish task is complete.
  • isolation context object's 151 associated isolation context state 157 is requested to attempt to publish as of the determined publish time.
  • This request may return a list of conflicts 152, which may be an empty list, indicating successful publication. This list may be installed in the task via the conflicts reference with the has-value flag set. The publish task is complete.
  • an isolation context state 157 associated with an isolation context object 151 may attempt to publish as of a specified publication timestamp as follows:
  • contingent conflicts 152 associated with the current isolation context state 157
  • the corresponding contingent conflict list is traversed, adding each contingent conflict 152 to its associated isolation context object 151.
  • the nodes in the list may contain a "handled?" flag, which is checked before adding and set afterwards, to minimize duplicate work as threads run through the list at the same time.
  • a count of the contingent conflicts 152 handled is maintained, and a handling flag is used in addition to the "handled?" flag.
  • contingent conflicts 152 having states that may be changed from unhandled to handling are added, the state then being changed unconditionally to handled, and a counter may then be incremented.
  • a new isolation context state 157 is constructed with the given publication timestamp and whose prior state is the current isolation context state 157.
  • the failed attempt identifies the isolation context state 157 associated with the isolation context object 151, and the operation may return the result of asking this isolation context state 157 to attempt to publish as of the specified timestamp.
  • a publish result object is created based on the publish time and conflict list stored in the task.
  • the creation of the publish result object may be performed outside of the publish task.
  • the publish result object contains an immutable reference to the isolation context object 151 that was published; an immutable timestamp representing the publish attempt time; and an immutable list of blocking conflicts 152, which is empty if the publish attempt succeeded.
  • a write task is installed to make a modification to an MSV object 166. It is noted that the modification may introduce a conflict in a publish task that might otherwise be occurring at the same time in a different thread.
  • a write task may also be installed in an isolation context object 151 (in competition with one or multiple publish tasks) when it is determined that there is a possibility that the modification may induce a conflict.
  • a write task forces a serialization of all modifications of a given field, regardless of isolation contexts.
  • some modifications may be made directly on the MSV state 162 associated with the MSV object 166 without installing corresponding write tasks.
  • modifications may include those which cannot be affected by other modifications to the same MSV object 166 (e.g., operations to set a value in any view 120 and any modification operation in a view 120 associated with a snapshot isolation context 110) and cannot introduce conflicts 152 into isolation context objects 151 (e.g., operations in views 120 associated with non-snapshot isolation contexts 110 that have no child isolation contexts 110 and are either non-publishable or have no publishable sibling isolation contexts 110).
  • the performance of a write task may entail the performance of one or more phases: a cache state phase, an install context events phase, an add value to value chain phase, and an add conflicts phase.
  • a write task may contain an atomically-updated indication of the current phase being performed by some thread or an indication that all phases have been completed.
  • a write task may contain an immutable reference to the MSV object 166 and may contain an immutable reference to the write value chain 156 to be modified.
  • a write task may contain a Boolean value (initially false), which represents whether the write value chain 156 was modified.
  • a write task may contain an immutable modification operation (e.g., an operation to set or add).
  • a write task may contain an immutable argument to the operation.
  • a write task may contain an immutable indication of whether the modification resolves conflicts.
  • a write task may contain an atomically-updatable reference (initially null) to an MSV state 162.
  • a write task may contain an atomically-updatable timestamp (initially null) that represents the start time of the operation.
  • a write task may contain the value to be returned by the modification operation.
  • When a thread performs the write task, the thread reads the indication of the current phase, performs the associated action and attempts to update the current phase indication by replacing the indication of the phase performed with an indication of the next phase (or that the write task is complete if there is no next phase). If this fails, it may be inferred that another thread already recorded the completion of the phase and may have performed later phases. The thread continues processing based on the resulting phase indication (e.g., the phase indication set by the thread or by another thread) until the current phase indication indicates that the write task is complete.
  • the thread performs the following in order. First, the thread attempts to change the start time from zero to a value read from the event counter 164. If this attempt fails, then the start time has been set by another thread. Next, in the cache state phase, the thread makes an attempt to change the cached state from null to the MSV state 162 currently associated with the MSV object 166. If the attempt fails, it may be inferred that another thread made the change. This sequence ensures that publication of an isolation context object 151 after the recorded start time cannot invalidate the recorded MSV state 162.
  • the thread adds the write task to every isolation context object 151 for which the present modification could possibly create a conflict.
  • the installed write task in a given isolation context 151 therefore causes a check for conflicts to be performed due to the installed write task before the given isolation context 151 may publish. If there is a publish task installed on such an isolation context object (indicating an ongoing publication), the thread assists in completing the publish task (i.e., the thread assists the ongoing
  • the thread adds the write task to every isolation context object 151 associated with an open publishable value chain 156 in the MSV state 162, including the write value chain 156, with the exception of those value chains 156 that are already associated with a conflict 152.
  • An atomically-updatable integer indicating the next value chain 156 associated with the MSV state 162 to process may be used to ratchet the next slot to consider, allowing threads to avoid trying to add the write task to an isolation context object 151 that has already received it from another thread.
  • After installing the task on each isolation context object 151, the thread checks to see whether the isolation context object 151 was published after the start time. If the thread determines that the isolation context object 151 was published after the start time, then the publish occurred after the creator of the task called the close_published_views() process. Therefore, in accordance with example
  • the thread calls the close_published_views() process again on the cached MSV object 166 and updates the cached MSV state 162 to the result.
  • an indication that the thread is already processing a write task may be passed, to prevent the close_published_views() process from clearing the pending write task on the MSV object 166.
  • the task may also update the cached start time to the current timestamp (unless the cached start time was already greater due to the time being set by another thread after the current time was read).
  • the value is computed based on the operation and argument and stored in the task. This computation may not be performed in an atomic manner, as an assumption may be made that repeated evaluations will yield the same value.
  • modification may be deemed to be unnecessary. If this is the case, and if the operation is other than to retrieve the current value and either the write value chain 156 is empty (e.g., has no most recent value object 158) or the associated isolation context 110 was published after the last value was added to the write value chain 156, the otherwise unnecessary modification may be deemed to be necessary.
  • the call to the add_value() operation may be specified to indicate in the write value chain 156 that the value object 158 added was the result of a frozen read.
  • the thread removes the write task (if present) from the isolation context objects 151 associated with all open publishable value chains 156 in the MSV state 162, and the phase is complete.
  • the write value chain 156 does not already have an associated conflict 152, the write task is not noted to be resolving a conflict, the current view object
  • the write task may be removed from the current isolation context object 151 unless that is the write isolation context object 151 .
  • the system 100 may be a system 900 of one or multiple physical machines 910, as depicted in Fig. 9.
  • the physical machine 910 is a processor-based machine that is constructed from actual machine executable instructions, or "software" 960 and actual hardware 920.
  • the hardware 920 of the physical machine 910 may include, for example, one or multiple central processing cores 922 (e.g., central processing unit (CPU) cores and/or graphics processing unit (GPU) cores), a memory 924, one or multiple network interfaces 926, one or multiple mass storage devices 927, a display, input/output (I/O) devices and so forth.
  • the memory 924 may be a non-transitory memory, which includes non-transitory memory storage devices, such as semiconductor memory devices, phase change memory devices, random access memory (RAM) devices, dynamic RAM (DRAM) devices, resistive memory devices, flash memory devices, a combination of one or more of these devices, and so forth.
  • the machine executable instructions 960 may be stored in a non-transitory computer-readable storage medium, such as the memory 924, for example.
  • the instructions 960, when executed by one or multiple of the processing cores 922, may cause the processing core(s) 922 to execute one or multiple applications 966, i.e., execute the program code 112 as part of one or multiple operating system processes 964.
  • One or multiple processes 964 may share a given isolation context, as discussed herein.
  • the machine executable instructions 960 may include an operating system 965, a virtual machine monitor (VMM) 969, or hypervisor, as well as program instructions 968 that, when executed by the processing core(s) 922 cause the core(s) 922 to provide the conflict resolver engine 140 (see Figs. 1A and 8).
  • the physical machine 910 may store, in accordance with example implementations, data 970, such as data 972 for the data structures 106 (see Fig. 1A), data 974 for the management resources 130 (see Fig. 1A), and so forth.
  • the system 900 may include a high speed interconnect 980 (a server rack backplane, a server cabinet backplane, a bus, a serial link, and so forth) that interconnects multiple physical machines 910.
  • the system 900 may include three or more physical machines.
  • the system 900 may include a single physical machine 910.
  • the machines 910 may be disposed at a single physical location (or facility) or may be geographically distributed at multiple locations.
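The cooperative task-holder protocol described earlier in this list (install a task by compare-and-swap, help finish any blocking task, then remove the task) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the described system: `AtomicRef`, `Task`, `Recorder`, and `install_and_run` are hypothetical names, a lock emulates the atomic CAS operation, and every task's `run()` is assumed to be safe to execute more than once.

```python
import threading


class AtomicRef:
    """Hypothetical task holder; a lock emulates an atomic CAS cell."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Replace the value only if it is still the expected object.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Task:
    """Hypothetical base class; subclasses define the work in run()."""

    def run(self):
        raise NotImplementedError


class Recorder(Task):
    """Toy task that records its name when it runs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def run(self):
        self.log.append(self.name)


def install_and_run(holder, task):
    while True:
        if holder.compare_and_set(None, task):
            break  # this thread installed the task
        blocker = holder.get()
        if blocker is task:
            break  # another thread installed this task on our behalf
        if blocker is not None:
            # Help the blocking task finish, then try to clear it.
            blocker.run()
            holder.compare_and_set(blocker, None)
        # If clearing failed, another thread removed it first; retry.
    task.run()
    holder.compare_and_set(task, None)
```

For example, if a publish task already occupies the holder when a write task arrives, the arriving thread first runs the publish task, clears it, and only then installs and runs its own task, so the two operations are serialized without blocking.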

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A technique includes executing a plurality of tasks to modify data according to a first view associated with a first isolation context; and identifying a conflict preventing the first isolation context from publishing. The publication merges the first view with a second view associated with a second isolation context. The technique includes applying task-based conflict resolution to resolve the conflict, including selectively executing a subset of at least one task of the plurality of tasks. The number of task(s) of the subset is less than the number of tasks of the plurality of tasks.

Description

TASK-BASED CONFLICT RESOLUTION
Background
[0001] A computer system may have a memory that is shared by multiple computing entities (multiple threads, for example). The computing entities may concurrently perform computations that change the values that are stored in the shared memory. One way to control the concurrent processing by the computing entities is to organize the changes by the entities into transactions and atomically commit the transactions to memory in a manner that maintains the memory in a consistent state.
Brief Description of the Drawings
[0002] Fig. 1A is a schematic diagram of a system according to an example implementation.
[0003] Fig. 1B is an illustration of a conflict resolver engine of Fig. 1A according to an example implementation.
[0004] Fig. 1C is an illustration of data structures to manage multiple simultaneous value (MSV) objects according to an example implementation.
[0005] Fig. 2 is an illustration of a hierarchical ordering of isolation contexts according to an example implementation.
[0006] Fig. 3A illustrates the relationship of a parent isolation context and a live child isolation context created from the parent isolation context according to an example implementation.
[0007] Fig. 3B illustrates the relationship of a parent isolation context and a snapshot child isolation context created from the parent isolation context according to an example implementation.
[0008] Figs. 4A and 4B are flow diagrams illustrating techniques to manage publication of an isolation context according to example implementations.
[0009] Fig. 5 is a flow diagram illustrating a technique to perform task-based conflict resolution according to an example implementation.
[0010] Fig. 6 is a flow diagram illustrating a technique to identify tasks to re-run in connection with task-based conflict resolution according to an example implementation.
[0011] Fig. 7 is a flow diagram illustrating a technique to reset values in connection with task-based conflict resolution according to an example implementation.
[0012] Fig. 8 is a schematic diagram of a system to perform task-based conflict resolution according to an example implementation.
[0013] Fig. 9 illustrates a schematic diagram of a system of physical machines according to an example implementation.
Detailed Description
[0014] Multiple threads executing in one or more processes of a computer or multiple computers may perform operations that are directed to one or multiple shared data structures. One approach to maintain a consistent state of the data structure is to block other threads from making changes to the data structure while one of the threads makes changes. This may, however, result in inefficient processing. Another approach to maintain a consistent state of the data structure is for the threads to process their changes as transactions, which are atomically committed to memory or rolled back (e.g., their modifications discarded) upon discovery that changes made by another thread conflict with the changes attempting to be committed. Such an approach may present challenges for relatively large transactions (e.g., transactions that read or modify a large number of locations associated with the data structure or transactions that run for a long time before attempting to commit), however. For such transactions, the probability that no other thread, during the course of the transaction's execution, made a modification that results in a conflict that prevents the commit attempt from being successful may be relatively small. Moreover, due to the complexity of the transaction, it is likely that a second attempt (and subsequent attempts) at redoing the changes following a rollback and retrying the commit of the transaction may once again fail.
[0015] In accordance with example implementations that are discussed herein, computations on a shared data structure may be isolated within corresponding isolation contexts. In general, an "isolation context" (also called a "computational context" herein) refers to an environment in which computations that are performed within the environment are contained within the environment so that the results of the computations are not, in general, visible to other isolation contexts. Due to the computational isolation, machine executable instructions, or program code, that is executing within the isolation contexts may concurrently make modifications to a data structure that is shared by the contexts. Here, a "data structure" refers to an organization of one or multiple units of data, which are stored in one or multiple storage locations. In accordance with some example implementations, an isolation context may be used to access (e.g., read or modify) multiple data structures. Each isolation context may present to program code a corresponding "view" of the data structure, where the "view" refers to the value(s) that the isolation context reads for corresponding properties of the data structure. As described herein, in accordance with example implementations, a single isolation context may be associated with multiple views. As examples, a "property" may be a location associated with the data structure (e.g., a field within a record or an index of an array), a structural property of the data structure (e.g., a number of elements in a list), or a relationship that is associated with the data structure (e.g., an association of a value with a key in a map).
[0016] Although the isolation contexts create computational isolation, there are mechanisms by which an isolation context may transfer information to another isolation context. One way a first isolation context may transfer information to a second isolation context is for the first isolation context to publish. "Publishing" an isolation context refers to combining or merging the view of the publishing isolation context with the view of another isolation context so that the views are the same at the time of publication. A given publication attempt, however, may not succeed due to one or multiple publication conflicts. In general, a publication conflict (also called a "conflict" herein) refers to a reason for the publication not to occur. An example of a conflict is the existence of a modification made to a data structure (e.g., a change to a field of a record) within the isolation context that is the target of the publication attempt when a similar modification (e.g., a change to the same field of the same record) was made within the isolation context being published or when a similar value (e.g., the value of the same field of the same record) was read within the isolation context being published and a modification to the same data structure or another data structure was made that may have been based on the value that was read.
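The example conflict in the preceding paragraph can be illustrated with a small sketch. It is a simplification under stated assumptions: `ContextLog` and `find_conflicts` are hypothetical names, properties are modeled as (record, field) pairs, and a read in the publishing context is treated as conflicting only if that context also made at least one modification that might have depended on the read.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLog:
    """Hypothetical record of the properties a context read and modified."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def find_conflicts(publishing, target):
    """Return properties modified in the target that block publication."""
    # Write-write: both contexts modified the same property.
    blocked = publishing.writes & target.writes
    # Read-write: the publishing context read a property that the target
    # modified, and then modified something possibly based on that read.
    if publishing.writes:
        blocked |= publishing.reads & target.writes
    return sorted(blocked)


child = ContextLog(reads={("acct1", "balance")},
                   writes={("acct2", "balance")})
parent = ContextLog(writes={("acct1", "balance")})
conflicts = find_conflicts(child, parent)
```

Here the child read the balance of `acct1` and then modified `acct2`, while the parent changed the balance of `acct1`, so the single returned conflict is the property that the child read.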
[0017] Systems and techniques are described herein for purposes of managing the publication of an isolation context to resolve any conflicts that prevent the publication from occurring. In this manner, conflicts may occur in connection with the initial publication attempt or in one or multiple subsequent intermediate attempts before a successful, final publication attempt. "Resolving" a conflict refers to obviating a reason preventing publication from occurring. More specifically, as described herein, any conflict(s) that cause a publication attempt to fail are identified; one or multiple actions are taken to resolve the conflict(s), including action(s) that involve making one or multiple modifications within the first isolation context; and then, publication subsequently occurs by allowing the second isolation context to view the modifications made by the first isolation context prior to the first publication attempt as well as the modification(s) made to resolve the conflict(s), as of the time that is associated with the subsequent publication attempt.
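The attempt, resolve, and retry cycle described in this paragraph can be sketched as a loop. This is only an illustration: `publish_with_resolution`, its callable parameters, and the `max_attempts` bound are assumptions introduced here and are not named in the description.

```python
def publish_with_resolution(try_publish, resolve, max_attempts=10):
    """Repeat publication attempts until one reports no conflicts.

    try_publish() is assumed to return a list of conflicts, empty on
    success. resolve(conflicts) is assumed to make modifications in the
    publishing context that obviate those conflicts.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        conflicts = try_publish()
        if not conflicts:
            return attempt
        resolve(conflicts)
    raise RuntimeError("publication did not succeed")


# Simulated run: the first attempt reports one conflict, the second none.
pending = [["field x"], []]
resolved = []
attempts = publish_with_resolution(lambda: pending.pop(0), resolved.extend)
```

The bound on attempts is a design choice of this sketch; the description itself does not state a limit on intermediate attempts.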
[0018] In accordance with example implementations, task-based conflict resolution is used to resolve conflicts that prevent publication of the first isolation context. In this manner, making modifications within the first isolation context may involve executing a set of tasks, where a "task" refers to a unit of one or more machine executable instructions, such as a function; a subroutine; a method; a block, expression, or sequence of statements written in a programming language; and so forth. These modifications, in turn, may give rise to conflicts that prevent publication of the first isolation context. As described further herein, with task-based conflict resolution, a subset of the tasks may be identified and re-run to resolve the conflicts. Because the subset of tasks that may be re-run are fewer in number than the original set of tasks, the conflicts may be timely resolved, thereby reducing the number of conflicts that may arise (if any) on the next publication attempt.
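One way to picture the selection of a subset of tasks to re-run is sketched below. The selection rule is an assumption made for illustration, not the technique of Fig. 6: a task is re-run if it read a conflicting property, or if it read a property written by a task already selected. `TaskRecord` and `tasks_to_rerun` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical record of what a task read and wrote when it ran."""
    name: str
    reads: frozenset
    writes: frozenset


def tasks_to_rerun(tasks, conflicting):
    """Select tasks, in original execution order, that used stale values."""
    stale = set(conflicting)
    selected = []
    for task in tasks:
        if task.reads & stale:
            selected.append(task.name)
            # Anything this task wrote is now suspect as well.
            stale |= task.writes
    return selected


tasks = [
    TaskRecord("load_prices", frozenset({"price"}), frozenset({"subtotal"})),
    TaskRecord("load_address", frozenset({"address"}), frozenset({"shipping"})),
    TaskRecord("total", frozenset({"subtotal", "shipping"}), frozenset({"total"})),
]
rerun = tasks_to_rerun(tasks, {"price"})
```

With a conflict on `price`, only `load_prices` and `total` are selected, while `load_address` keeps its earlier result, so fewer tasks run than in the original set.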
[0019] Referring to Fig. 1A, as a more specific example, in accordance with some implementations, a processor-based system 100 includes context management resources 130, which the system 100 uses to create and manage isolation contexts 110 (N isolation contexts 110-1, 110-2...110-N being depicted in Fig. 1A), and manage publications by the isolation contexts 110. In general, an isolation context is a mechanism by which executing machine executable instructions (called "program code 112," herein) may isolate itself from other executing program code 112. As a more specific example, the program code 112 may be an entire application or program or may be part of such an application or program, such as a thread. At any given point, program code 112 may be working in a prevailing (also referred to herein as the "current" or "working") isolation context 110. For example, program code 112-1 may be working in a prevailing isolation context 110-1. In a system that includes multiple threads, each thread may have its own prevailing isolation context 110, and different threads executing program code 112 at the same time may be working in different prevailing isolation contexts 110. In accordance with some implementations, when a parent thread creates a child thread, the prevailing isolation context 110 in the child thread when it begins executing is that of the parent thread (e.g., at the time the child thread was created). References herein to program code 112 "executing in" a given isolation context 110 refer to the program code 112 executing while the given isolation context 110 is the prevailing isolation context.
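A per-thread prevailing isolation context, inherited by a child thread at the time of its creation, might be modeled as in the following sketch. All names (`IsolationContext`, `ROOT`, `prevailing`, `set_prevailing`, `spawn`) are hypothetical, and thread-local storage is an assumption about how the prevailing context could be tracked.

```python
import threading

_state = threading.local()


class IsolationContext:
    """Hypothetical stand-in for an isolation context."""

    def __init__(self, name):
        self.name = name


ROOT = IsolationContext("root")  # assumed default context


def prevailing():
    """Return the calling thread's prevailing isolation context."""
    return getattr(_state, "context", ROOT)


def set_prevailing(context):
    _state.context = context


def spawn(target, *args):
    """Start a child thread that begins in the parent's prevailing context."""
    inherited = prevailing()  # captured in the parent, at creation time

    def runner():
        set_prevailing(inherited)
        target(*args)

    thread = threading.Thread(target=runner)
    thread.start()
    return thread


set_prevailing(IsolationContext("ctx-1"))
seen = []
child = spawn(lambda: seen.append(prevailing().name))
set_prevailing(IsolationContext("ctx-2"))  # later change in the parent
child.join()
```

Because the context is captured when `spawn` is called, the child observes `ctx-1` even though the parent switches to `ctx-2` afterwards.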
[0020] As a more specific example, a sequence of machine executable instructions (called "thread A" for this example) of program code 112 may be executing in one of the isolation contexts 110, and another sequence of machine executable instructions (called "thread B" for this example) of program code 112 may be executing in another one of the isolation contexts 110. Also, for this example, threads A and B may share a data structure 106. With a few exceptions that are described below, changes made by thread A to the data structure 106 may be invisible to thread B, and changes made to the data structure 106 by thread B may be invisible to thread A. That is, different computations, working at the same time and looking at the same fields of the same records, may correctly see different values. In general, as further described herein, each isolation context 110 has an associated view 120 (or multiple views 120) to a given data structure 106; and as such, multiple isolation contexts 110 may have different associated views 120 of the same data structure 106.
[0021] Multiple operating system threads may work in the same isolation context 110, in accordance with example implementations. In some examples, the multiple operating system threads may be associated with multiple operating system processes and the multiple operating system processes may be associated with multiple computers. This allows sharing of the same view 120 among multiple processes and computers. Moreover, this arrangement allows processes that share an isolation context 110 to be written in different programming languages.
[0022] As illustrated in Fig. 1A, the data structures 106 may be stored in a data store 104. In accordance with some implementations, the data store 104 may include a namespace that associates names with the data structures 106. An example of a namespace is a file system, in which data structures 106 are stored as files that are identified by corresponding file names. Another example of a namespace is a key-value store, which stores associated key-value pairs. In this manner, a key may be used to find a respective value that is stored in the key-value store.
[0023] The data store 104 may be stored in a physical storage device (a volatile or non-volatile memory device, a hard disk device, and so forth) or may be stored across a distributed arrangement of physical storage devices. In general, a "data structure 106" refers to any unit of data that may be stored. Examples of data structures 106 include files, records, lists, sets, maps, tables, arrays, strings, queues, stacks, graphs, directories, primitives (a number, a Boolean value, a character, as examples), and so forth. A data structure 106 may also include a data structure included by reference (e.g., a pointer).
[0024] As depicted in Fig. 1A, in accordance with example implementations, the context management resources 130 may include one or multiple libraries 134, with each library 134 including one or multiple functions 136 that may be called by the program code 112 for such purposes as creating isolation contexts, establishing views for isolation contexts, binding functions to isolation contexts, identifying conflicts, resolving conflicts, performing task-based conflict resolution, publishing contexts, and so forth, as further described herein. Moreover, as further described herein, the context management resources 130 may contain one or multiple conflict resolver engines 140 to perform task-based conflict resolution.
[0025] Referring to Fig. 1B, in accordance with example implementations, a given conflict resolver engine 140 may include one or multiple of the following components, which are described further herein: a task creator 142, a task monitor 144, a task selector 146, a task scheduler 148 and a graph generator 149.
[0026] Referring to Fig. 2 in conjunction with Fig. 1A, in accordance with example implementations, the isolation contexts 110 are fully hierarchical, as illustrated by an example hierarchical tree 200. In this manner, multiple isolation contexts 110 may form a tree that is rooted at a top-level, global isolation context 110, and a given isolation context 110 may have any number of children. For the example of Fig. 2, isolation context 110-3 is a global context that is a parent of isolation contexts 110-4 and 110-5 and grandparent of isolation contexts 110-6 and 110-7; isolation contexts 110-4 and 110-5 are siblings; isolation contexts 110-6 and 110-7 are children of parent isolation context 110-4; and isolation contexts 110-6 and 110-7 are siblings. The isolation context hierarchy may extend to any depth.
[0027] One way for information to travel from one isolation context 110 to another isolation context 110 is for a child isolation context 110 to successfully publish, thereby making the changes visible in the parent isolation context 110. In general, if isolation context P is a parent of isolation context C, then changes made within isolation context P are generally visible in the isolation context C, but changes made in the isolation context C are not visible in the isolation context P until the isolation context C is successfully published.
[0028] In accordance with example implementations that are discussed herein, the child isolation context 110 may have two forms, which are specified when the child isolation context 110 is created: a transparent, or live, isolation context; and an opaque, or snapshot, isolation context.
[0029] Referring to Fig. 3A in conjunction with Fig. 1A, for a live child isolation context (such as example live child isolation context 110-9 of Fig. 3A), changes made to a location L (such as location 304 in Fig. 3A) in the isolation context's parent (such as isolation context 110-8 of Fig. 3A) are visible in the live child isolation context until a change is made to the location L within the child isolation context. It is noted that the changes made in the parent isolation context may actually be changes that are made to an ancestor of the parent isolation context but are visible in the parent isolation context; and the changes may be due to a different child isolation context of the parent isolation context publishing its changes to the parent isolation context.
[0030] Referring to Fig. 3B in conjunction with Fig. 1A, for a snapshot child isolation context (such as example snapshot child isolation context 110-11 of Fig. 3B), no changes that are made in the parent isolation context (such as parent isolation context 110-10 of Fig. 3B) after the snapshot child isolation context is created are visible within the child isolation context.
[0031] Within a live child isolation context C, reads made on locations that have not been modified in the context C return, as stated above, the value that the location would have had, had the read been made in the parent isolation context P. Either at the time of the read or by setting a default (at or after the time isolation context C was created or as a general policy in the process), the read may be specified as being "frozen." A frozen read has the property that, once it occurs, all subsequent reads of that location within isolation context C (frozen or not) return the same value until the location is modified within isolation context C or until isolation context C is successfully published.
[0032] When an attempt is made to read or otherwise determine a value associated with a property within a view 120 associated with an isolation context 110, the system may determine a value as of a current time. More generally, to determine a value as of a given time (e.g., a request time), the system may attempt to determine whether a value associated with the property had been established (i.e., previously determined) in the view 120 prior to the request time and subsequent to the later of the time that the isolation context 110 was created or the time the isolation context 110 was last successfully published prior to the request time. Such a value may have been established by modifications to the property by program code 112 executing in the isolation context 110, by frozen reads to the property by program code 112 executing in the isolation context 110, or by successful publication of a value for the property from a child isolation context 110 of the isolation context 110. If such a value was established, then it is the determined value. Otherwise, the determined value may be found by determining an inherited value for the property, where the inherited value is the value associated with the property within a parent view 120 of the view 120 associated with the parent isolation context 110 of the isolation context 110 as of an effective time based on the view 120 and the request time. As an example, when the isolation context 110 associated with the view 120 is a live (i.e., non-snapshot) isolation context 110, the effective time may be the request time. As another example, when the isolation context 110 associated with the view 120 is a snapshot isolation context 110, the effective time may be the later of the creation time of the snapshot isolation context 110 and a time associated with the latest successful publication of the snapshot isolation context 110 prior to the request time.
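As an illustration only, the value determination described above may be sketched in Java as follows. The sketch assumes a greatly simplified model that is not taken from this description: properties are identified by strings, times are logical integer timestamps, each isolation context has a single view, and the names Ctx, establish and valueAt are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified isolation context with one view and logical timestamps. */
final class Ctx {
    /** A value established in this view at a given logical time. */
    private static final class Entry {
        final long time;
        final Object value;
        Entry(long time, Object value) { this.time = time; this.value = value; }
    }

    private final Ctx parent;       // null for the global (root) context
    private final boolean snapshot; // true = snapshot child, false = live child
    private final long baseTime;    // creation time; a fuller model would advance it on publication
    private final Map<String, List<Entry>> established = new HashMap<>();

    Ctx(Ctx parent, boolean snapshot, long createdAt) {
        this.parent = parent;
        this.snapshot = snapshot;
        this.baseTime = createdAt;
    }

    /** Records a value established by a modification, frozen read, or child publication. */
    void establish(String property, Object value, long time) {
        established.computeIfAbsent(property, k -> new ArrayList<>()).add(new Entry(time, value));
    }

    /** Determines the value of a property as of requestTime, or null if none exists. */
    Object valueAt(String property, long requestTime) {
        Entry latest = null;
        for (Entry e : established.getOrDefault(property, new ArrayList<>())) {
            // Only values established after baseTime and no later than requestTime count.
            if (e.time > baseTime && e.time <= requestTime
                    && (latest == null || e.time >= latest.time)) {
                latest = e;
            }
        }
        if (latest != null) {
            return latest.value;
        }
        if (parent == null) {
            return null;
        }
        // Inherited value: a live view tracks the parent, a snapshot view stays at baseTime.
        long effectiveTime = snapshot ? baseTime : requestTime;
        return parent.valueAt(property, effectiveTime);
    }
}
```

In this sketch, a live child created at time 2 sees a parent modification made at time 3, whereas a snapshot child created at time 2 continues to see the value the parent had at time 2.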
[0033] Publishing and the creation of live and snapshot contexts are some ways that information may travel from one isolation context 110 to another. Another way that information may travel from one isolation context 110 to another is by returning the result of a computation that is performed in another isolation context 110. For example, in accordance with some implementations, an isolation context 110 may invoke a call operation, which takes a function 136 (see Fig. 1A) as an argument (and perhaps takes other arguments to pass to the function), temporarily sets the prevailing isolation context to another context, runs the function 136, and returns the result. The result of the call operation remains the value as it appears in the invoked isolation context 110. For example, an isolation context C may call the following function: x = D.call(f, r) where "D" represents another isolation context 110, and "r" represents a parameter to function f, which is a record with a "count" field and where function f is to modify the count field of the provided record and return the record as its value. Continuing the example, within isolation context C, the count may have had a value of "10," but calling the function f sets the count to 20. After the call, the two variables r and x refer to the same record, but the view presented through the variable r is that of the isolation context C, while the view presented through the variable x is that of the isolation context D. Because of these differing views, r.getCount() continues to return "10," but x.getCount() returns "20." And if something elsewhere caused a computation in the isolation context D to set the count on the same record r to 30, further evaluation in the isolation context C causes x.getCount() to return "30."
In accordance with example implementations, the view presented through the variable x is neither that of isolation context C nor isolation context D but a composite "C's view of D's view" associated with isolation context C. If, within isolation context C, the count of the record accessed via x is either modified or accessed via a frozen read, subsequent reads of the count in the isolation context C via variable x result in the value established within isolation context C and ignore any intervening modifications made within isolation context D. This distinction remains until isolation context C is successfully published, at which point D's view and C's view of D's view are again identical. In this way, an isolation context may have multiple associated views 120.
[0034] In accordance with example implementations, program code in an isolation context 110 may also bind a function to an isolation context (e.g., the prevailing isolation context 110 or a different isolation context 110), returning a new function 136 which, when invoked, runs the original function 136 in the bound isolation context 110 (e.g., by invoking the call operation on the bound context and passing in the original function 136 as a parameter). This binding operation may be used to create a function 136 that is bound to the current isolation context 110 and that can be invoked in a child isolation context 110 (or another isolation context 110 via the call operation). The binding operation may be used to create several new functions by binding the same function 136 to a number of different contexts 110. These multiple isolation contexts 110 may be snapshot contexts 110 representing the state of the world at various times in the past (e.g., snapshots of a company taken at daily intervals). Alternatively, these multiple isolation contexts 110 may be child isolation contexts 110 used to explore and select among different alternative approaches to solving a problem. A function bound to the current isolation context 110 may be provided to a different context (e.g., by the call operation) to allow the different context to observe and store data in the bound isolation context 110.
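As an illustration only, the call and binding operations may be sketched in Java as follows. The sketch assumes that the prevailing isolation context is tracked per thread and models nothing else (no views or data structures); the class IsoContext and its members are hypothetical names, not an interface defined by this description.

```java
import java.util.function.Supplier;

/** Hypothetical context type that only tracks which context prevails on the current thread. */
final class IsoContext {
    private static final IsoContext GLOBAL = new IsoContext("global");
    private static final ThreadLocal<IsoContext> PREVAILING =
            ThreadLocal.withInitial(() -> GLOBAL);

    final String name;

    IsoContext(String name) { this.name = name; }

    /** Returns the prevailing context of the calling thread. */
    static IsoContext current() { return PREVAILING.get(); }

    /** Call operation: runs fn with this context prevailing, then restores the previous one. */
    <T> T call(Supplier<T> fn) {
        IsoContext previous = PREVAILING.get();
        PREVAILING.set(this);
        try {
            return fn.get();
        } finally {
            PREVAILING.set(previous); // restored even if fn throws
        }
    }

    /** Binding operation: returns a new function that always runs fn in this context. */
    <T> Supplier<T> bind(Supplier<T> fn) {
        return () -> call(fn);
    }
}
```

With this sketch, a function bound to a context D still runs in D when it is invoked from code executing in another context C, and the prevailing context reverts to C when the function returns.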
[0035] Child isolation contexts 110 can obtain references to their parent isolation contexts 110, so even if there has been a modification or frozen read in a given child isolation context 110, the program code 112 of a child isolation context 110 may, in accordance with example implementations, invoke the call operation on its parent isolation context 110 to determine what the value of a particular location is in the view 120 of the parent isolation context 110. For example, to obtain the count field of a record "r" as it would be seen in the parent isolation context 110 of the current isolation context 110, a program written in Java might call: IsolationContext.current().parent().call(() -> r.getCount())
[0036] In a system in which values as seen in a global isolation context 110 (i.e., a root of an isolation context hierarchy 200) are considered to be "committed" and those in any nested isolation context 110 are considered to be "uncommitted", a function may be called on the global isolation context 110, for example: IsolationContext.global().call(() -> revenue) to obtain the committed value of a variable. Similarly, such a call may be used to set a default value that is seen by code working in unpublished isolation contexts 110. Similar manipulations via the global isolation context 110 may be used to manipulate the namespace in a committed manner from within an unpublished isolation context.
[0037] Because the isolation contexts 110 may pass out values without publishing modifications, such values may be accumulated. For example, program code 112 executing in an isolation context 110 may take a snapshot of the state of a company database periodically (e.g., every day, hour, minute or second) and make each of these snapshots available by binding the snapshots to names within a namespace. Alternatively, the executing program code 112 may append the snapshots to a single list, facilitating, for example, identifying the states of the database for the ten days that scored highest on some metric (e.g., the days during which the revenue of the company was highest).
[0038] As another example, program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of potential modifications according to models with different parameters, and place the results (indicating predicted consequences) computed in each child isolation context 110 (along with parameters used with the respective models) in a single structure. Then, all of the results may be compared with one another and the child isolation context 110 that resulted in the best (and only the best) outcome allowed to publish its results.
[0039] As another example, program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of non-deterministic simulations and collect the results produced within each simulation into a structure, and analyze the structure to determine, based on the results, an action to take. In such an example, the child isolation contexts 110 may be allowed to terminate without ever publishing their modifications to their parent isolation contexts 110.
[0040] As an alternative to invoking a call operation on an arbitrary isolation context 110 or calling a function bound to an isolation context 110, program code 112 may execute code 112 in an arbitrary isolation context 110 by explicitly changing the prevailing isolation context 110 to be the arbitrary isolation context 110. Such a change may be temporary and bounded to a particular region of code 112 (e.g., a particular function or program block) by ensuring that at the end of the region of code 112 the prevailing isolation context is reverted to what it had been before the change. One way to facilitate such a temporary setting of the prevailing isolation context 110 is for program code 112 written in the Java programming language to use a "try-with-resources" block, creating a resource object that, upon construction, remembers the prevailing isolation context 110 and sets the prevailing isolation context 110 to the arbitrary isolation context, and that, upon invocation of the close() method on the object, sets the prevailing isolation context 110 to the remembered isolation context. For program code 112 written in the C++ programming language, the "resource acquisition is initialization" (RAII) paradigm can be used similarly. In accordance with example implementations, isolation contexts control some, but not all, variables or other memory locations available to the program. For such implementations, program code 112 executing in the temporary prevailing isolation context 110 may obtain a value, such as a reference to a data structure, where the reference is associated with a view 120 associated with the temporary prevailing isolation context 110, and may store that value in a location that is not under control of the isolation contexts. After the temporary prevailing isolation context 110 is reverted to the former prevailing isolation context 110, program code executing in the former prevailing isolation context 110 may obtain the value associated with the view 120 associated with the temporary prevailing isolation context 110.
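As an illustration only, the try-with-resources approach may be sketched in Java as follows. The sketch assumes that an isolation context can be represented by its name and that the prevailing context is held per thread; the classes Prevailing, ContextScope and ScopeDemo are hypothetical.

```java
/** Hypothetical holder for the name of the prevailing context of the current thread. */
final class Prevailing {
    private static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "global");

    static String get() { return CURRENT.get(); }

    static void set(String context) { CURRENT.set(context); }
}

/** Resource that switches the prevailing context on construction and restores it on close(). */
final class ContextScope implements AutoCloseable {
    private final String remembered;

    ContextScope(String target) {
        remembered = Prevailing.get(); // remember the context that prevailed before the switch
        Prevailing.set(target);
    }

    @Override
    public void close() {
        Prevailing.set(remembered);    // revert at the end of the region of code
    }
}

final class ScopeDemo {
    /** Returns the prevailing context seen inside and after the try block, comma separated. */
    static String run() {
        StringBuilder seen = new StringBuilder();
        try (ContextScope scope = new ContextScope("D")) {
            seen.append(Prevailing.get());
        }
        seen.append(',').append(Prevailing.get());
        return seen.toString();
    }
}
```

Because close() is invoked automatically when the try block exits, including when an exception propagates out of it, the prevailing context cannot remain switched beyond the intended region of code.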
[0041] As mentioned above, another way that information may move between isolation contexts 110 is when a child isolation context 110 publishes its modifications. It is noted that in accordance with some implementations, the program may specify, when an isolation context 110 is created, that the isolation context 110 is "detached," which means that the isolation context 110 may not be published. An error may be signaled (e.g., an exception may be thrown) if an attempt is made to publish a detached isolation context 110.
[0042] In response to a given isolation context 110 requesting publication, the system 100 determines whether there are any conflicts that would inhibit (prevent, for example) the publication from proceeding. A conflict may be associated with a property associated with a data structure, where a property may be a location (e.g., a particular field of a particular record or a particular index of a particular array), a relationship managed by the data structure (e.g., the value associated with a particular key in a particular map), or a structural property of the data structure (e.g., the order of elements of a particular list or the number of elements contained in a particular set). In accordance with further example implementations, other types of conflict may be detected. A conflict may arise due to the existence of an unpublished value associated with the view associated with the publishing isolation context, and a conflict may arise due to the existence of an unpublished value associated with an ancestor view of the view associated with the publishing isolation context. An unpublished value may be a value asserted to be associated with a view that has not yet been processed in response to a publication of an isolation context associated with the view. More specifically, a conflict may arise when the value associated with such a property is changed within the parent isolation context 110 of the isolation context 110 requesting publication. A conflict may arise when a new value otherwise becomes visible in the parent isolation context 110, when such a value change occurs after a particular effective time and when the child (e.g., publishing) isolation context either modified the same property (where such modification includes receiving a value due to the successful publication by a further child isolation context 110 of the publishing isolation context 110) or performed a frozen read operation to obtain a value associated with the property. The effective time associated with a property relative to the publishing isolation context 110 may be established and updated upon creation of the publishing isolation context 110, each time the publishing isolation context 110 is successfully published, and upon an explicit indication (e.g., during an attempt to resolve conflicts due to an unsuccessful publication attempt) by program code 112 executing in the publishing isolation context 110 that any conflicts associated with the property have been resolved. In accordance with an example implementation, the effective time may be updated when the publishing isolation context 110 is established as the prevailing isolation context 110.
When the publishing isolation context 110 is a live (i.e., non-snapshot) isolation context 110, the effective time is also updated the first time a modification to or frozen read of the property is performed following such a creation, successful publication, or explicit indication. The result of these rules is that for a live (i.e., non-snapshot) isolation context 110, a conflict can only arise due to a modification in the parent isolation context 110 following a modification or frozen read to the same property in the child isolation context 110, while for a snapshot isolation context 110 the order of the operations in the parent and child isolation context is immaterial.
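As an illustration only, the effect of these effective-time rules on a single property may be sketched in Java as follows. The sketch assumes logical integer timestamps and uses a negative time to indicate that an event did not occur; the class ConflictRule and its parameters are hypothetical.

```java
/** Hypothetical check of whether one property conflicts at publication time. */
final class ConflictRule {
    /**
     * @param snapshot         true for a snapshot child, false for a live child
     * @param baseTime         later of creation, last successful publication, or explicit resolution
     * @param childTouchTime   first modification or frozen read in the child after baseTime, or -1
     * @param parentChangeTime time a new value became visible in the parent, or -1
     */
    static boolean conflicts(boolean snapshot, long baseTime,
                             long childTouchTime, long parentChangeTime) {
        if (childTouchTime < 0 || parentChangeTime < 0) {
            return false; // a conflict needs activity on the property in both contexts
        }
        // Live: effective time moves to the child's first touch. Snapshot: it stays at baseTime.
        long effectiveTime = snapshot ? baseTime : childTouchTime;
        return parentChangeTime > effectiveTime;
    }
}
```

For example, with a base time of 1, a parent change at time 3 and a child modification at time 5, the sketch reports no conflict for a live child but reports a conflict for a snapshot child, because for the snapshot child the order of the two operations is immaterial.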
[0043] In accordance with example implementations, if there are no conflicts, the publication succeeds, and all of the changes made in the publishing context atomically become visible in the parent context. In accordance with example implementations, the entire publication process is atomic. In general, an "atomic" process means that the actions that form the process are treated as being indivisible, i.e., the actions are viewed or treated by the isolation contexts as occurring at the same time. In accordance with example implementations, for an atomic publication process, a thread cannot see some, but not all, of the published modifications in the parent isolation context, and a modification cannot be made in any isolation context that would change the determination that there are no conflicts, between the determination that there are no conflicts and the making available of the modifications in the parent isolation context. As described further herein, in accordance with example implementations, the conflict(s) may be determined proactively so that the conflicts are known at the time the isolation context 110 requests publication.
[0044] When an isolation context 110 successfully publishes, the isolation context 110 and its parent isolation context 110 present the same values for the data structure 106, and the child isolation context 110 is no longer considered to have established any current values for the data structure 106. As described above, attempts to determine values for properties of the data structure 106 within the child isolation context 110 result in determining inherited values from the parent isolation context 110. In a live isolation context 110, this has the effect that locations that had been "frozen" by modifications or by frozen reads are no longer considered frozen and the inherited values may vary as modifications are made to the parent context; and for a snapshot isolation context 110, the "snapshot time" (i.e., the effective time used to find inherited values from the parent isolation context 110) is updated to the publication time. In some implementations, a last-common-snapshot for a given isolation context 110 may be obtained by calling the appropriate function 136. The last-common-snapshot is a read-only snapshot child of the context's parent as of the last time the isolation context 110 was published (or its creation time if it has not been published). This allows an isolation context 110 to compare its changes to a data structure 106 with the values of the data structure 106 when the isolation context 110 started or was last successfully published. In accordance with example implementations, an as-created-snapshot for a given isolation context 110 (a snapshot going back to the beginning, or creation, of the isolation context 110) may be obtained by calling the appropriate function 136.
[0045] If there are one or multiple conflicts, then publication of the child isolation context 110 does not occur, and all of the changes, or modifications, that are made by the child isolation context 110 remain invisible to the parent isolation context 110.
[0046] In accordance with example implementations, to resolve conflicts, a conflict resolution phase is entered for purposes of taking one or multiple actions to resolve the conflicts. The conflict resolution phase may be handled in a number of different ways, depending on the particular implementation.
[0047] In accordance with some implementations, the program code 112 associated with the context 110 attempting publication decides how to handle the conflicts and may decide to re-attempt the publication. In accordance with example implementations, before such an attempt may be successful, the program code 112 marks each conflict as being resolved so that the conflict does not show up again for a subsequent publication attempt.
[0048] In accordance with example implementations, a given conflict may be resolved by modifying a value of the property associated with the conflict and specifying that such a modification resolves any conflict for that property. In accordance with example implementations, the modification may be made by computing and setting a specific value. In accordance with further example implementations, the modification may be one of the following. The modification may be a resolve-to-current modification in which the value in the publishing context is the one that is used. This is often the correct answer when it can be determined that the value asserted by the child isolation context 110 is not dependent on any conflicted value (i.e., if the program code 112 of the publishing isolation context 110 was executed again, the same result would occur). The modification may be a resolve-to-parent modification in which the current value in the parent isolation context 110 is the one that is used. The modification may be a roll-back modification in which the value in the last common snapshot is the one that is used.
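As an illustration only, the three named modifications may be sketched in Java as a choice among three candidate values. The enum Resolution and the class ValueResolver are hypothetical names; the sketch selects a value only and does not model marking the conflict as resolved.

```java
/** Hypothetical names for the three resolution modes described above. */
enum Resolution { RESOLVE_TO_CURRENT, RESOLVE_TO_PARENT, ROLL_BACK }

final class ValueResolver {
    /**
     * Selects the value used to resolve a conflict on one property.
     *
     * @param current            value in the publishing (child) context
     * @param parent             current value in the parent context
     * @param lastCommonSnapshot value in the last common snapshot
     */
    static <T> T resolve(Resolution mode, T current, T parent, T lastCommonSnapshot) {
        switch (mode) {
            case RESOLVE_TO_CURRENT:
                return current;            // keep what the publishing context asserted
            case RESOLVE_TO_PARENT:
                return parent;             // adopt the parent's current value
            case ROLL_BACK:
                return lastCommonSnapshot; // revert to the last value both contexts shared
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```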
[0049] In accordance with example implementations, determining a value for a property associated with a data structure includes determining a lack of an established value associated with the property in the view of the child isolation context 110; determining an inherited value for the property, where the inherited value is the value for the property in the view of the parent isolation context as of an effective time associated with the child's view and the request time; and considering the inherited value to be the current value. For a live child isolation context 110, the effective time may be the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the establishment of a value associated with the property in the child's view. For a snapshot child isolation context 110, the effective time may be the later of a creation time of the first isolation context and a publication time of the first isolation context prior to the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the later of the creation time of the first isolation context and the latest publication time of the first isolation context.
[0050] It is noted that a given location may be subsequently modified (after having been marked as resolved) before the publication is reattempted. Moreover, marking a conflict as being resolved may cause the system to disregard any known conflict associated with the location and the isolation context 110, but one or multiple subsequent modifications in the parent or child isolation contexts 110 may introduce one or multiple new conflicts, which arise when the isolation context 110 tries again to publish.
[0051] In an example implementation, program code 112 performing conflict resolution may make use of the current (e.g., publishing) isolation context 110, the parent isolation context 110, and the last-common-snapshot isolation context 110 of the publishing isolation context 110. In addition, the program code 112 may also make use of a current-at-publish isolation context 110 and a parent-at-publish isolation context 110, which are read-only snapshots of the current (publishing) context and its parent at the time the publication was attempted (and failed).
[0052] To simplify the process of running code in isolation, in accordance with example implementations, one or multiple mechanisms may be used to encapsulate the process of creating an isolation context 110 as a child of the prevailing isolation context 110, running the code in it, and (when the created isolation context 110 is publishable) attempting to publish the created isolation context 110 at the end. In accordance with example implementations, the mechanism may involve keywords, annotations, or other syntactic additions to the source code of the program and the program code may be specified directly. In accordance with further example implementations, the mechanism may involve calling a function (e.g., one of the functions 136 of Fig. 1A) and passing as an argument an indication of a function to be called within the newly created isolation context 110. In accordance with some example implementations, the program code 112 may specify (e.g., by choice of keyword, annotation, or function or by passing in a parameter) the type of child isolation context 110 to create (e.g., live or snapshot, detached or not, read-only or not). In accordance with some example implementations, the program code 112 may further specify information used to control behavior upon failure of an attempt to publish. Such information may cause the system to attempt conflict resolution and may control the manner in which conflict resolution is performed. The information may alternatively or in addition direct the system to react to a failure to publish or a failure to resolve conflicts by creating a new child isolation context 110 and rerunning the code within it. The information may include one or more termination conditions, which the system may use to determine that further attempts to perform conflict resolution or to rerun the code should not occur.
Examples of such termination conditions may include a given number of attempts having been performed, a given time (e.g., a wall clock time or a time duration) having been passed, a given amount of a resource (e.g., disk space or memory) having been consumed, a value having been asserted by another program thread (e.g., an indication that a solution to a problem has been found by another thread or an indication that a program has gone on to a different phase), a given function returning a true value, a Boolean expression evaluating to a true value, or another termination condition.
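As an illustration only, rerunning code until publication succeeds or a termination condition is met may be sketched in Java as follows. The sketch assumes that each attempt (creating a child isolation context, running the code and attempting publication) is represented by a function that returns true on success; the class RetryingPublisher and its parameters are hypothetical, and only three of the termination conditions listed above are modeled.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical driver that reruns an attempt until it publishes or must stop. */
final class RetryingPublisher {
    /**
     * @param attemptPublish runs the code in a fresh child context; returns true if published
     * @param maxAttempts    termination condition: number of attempts allowed
     * @param maxNanos       termination condition: time budget in nanoseconds
     * @param stop           termination condition: e.g., another thread found a solution
     * @return the 1-based attempt that published, or -1 if a termination condition was met
     */
    static int runUntilPublished(BooleanSupplier attemptPublish, int maxAttempts,
                                 long maxNanos, BooleanSupplier stop) {
        long start = System.nanoTime();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Check the externally asserted condition and the time budget before each attempt.
            if (stop.getAsBoolean() || System.nanoTime() - start > maxNanos) {
                return -1;
            }
            if (attemptPublish.getAsBoolean()) {
                return attempt;
            }
        }
        return -1; // attempt limit reached
    }
}
```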
[0053] In accordance with example implementations, the conflict resolution may be handled in a number of different ways. For example, in a first conflict resolution approach, the program code 112 may decide that resolving the conflict is not worth the effort, and as a result, the isolation context 110 does not attempt to republish. For example, the program code 112 may be relatively small (online transaction processing (OLTP) code, for example); and as such, the approach may be to simply throw away the associated isolation context 110, create a new isolation context 110 and execute the program code 112 again from the beginning.
[0054] In accordance with example implementations, the conflict resolution may be handled by a task-based conflict resolver engine 140, which may be associated (as examples) with an object or a function 136. In accordance with example implementations, a particular conflict resolver engine 140 may be designated at the time the program code 112 modifies the location that later becomes conflicted, essentially specifying, for example, that if the value associated with the location is involved in a conflict, the particular conflict resolver engine 140 is invoked to resolve the conflict.
[0055] In accordance with example implementations, the program code 112 may specify a default resolver (e.g., an object, such as a conflict resolver engine 140, or function) that is associated with a particular location (e.g., a particular field in a particular record), with a particular field (regardless of what record a conflict occurs in), or with a type (e.g., applying to conflicts associated with any property of any data structure as long as the property is associated with values of the given type or applying to conflicts associated with any field of any record as long as the record is of the given type). This type of conflict resolution may be appropriate when a field is, for example, known to be used as a counter. For example, if the value in the last common snapshot is "x," the value in the parent is "y," and the value in the child is "z," then a reasonable resolution is to set the value to y+z-x (the parent value plus the change made in the child), and a resolver may be attached to the field to perform this computation and resolve the conflict by specifying the resulting value.
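A field-level resolver for the counter case can be sketched as below. The class and method names are hypothetical, and the sketch assumes that the intended resolution preserves the increments made on both sides, that is, the parent value plus the difference between the child value and the last common snapshot value.

```java
// Hypothetical field-level resolver for a field used as a counter.
final class CounterResolver {
    // common: value in the last common snapshot (x)
    // parent: value now in the parent isolation context (y)
    // child:  value in the child isolation context (z)
    // Assumption: the merged counter keeps the changes made on both sides,
    // i.e. parent + (child - common).
    static long resolve(long common, long parent, long child) {
        return parent + (child - common);
    }
}
```

For instance, with a common value of 10, a parent value of 13, and a child value of 15, the parent added 3 and the child added 5, so the resolved value is 18.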
[0056] In accordance with some implementations, as further described herein, the program code 112 may use task-based conflict resolution. With task-based conflict resolution, the program code 112 specifies that some or all of its computation is made up of re-runnable tasks that are executed by the program code 112. In this manner, as the program code 112 executes, the system 100 may keep track of the set of locations read and written while working in each task and the dependencies between tasks (e.g., a dependency created where one task reads a location that was written by another task). In accordance with example implementations, to resolve conflicts with task-based conflict resolution, all tasks that read a conflicting location are selected to rerun, as are all tasks dependent on them (and so on, recursively). Conflicted locations may be resolved to the parent value, the current value, or a determined value prior to, during, or after the re-running of the selected tasks. The selected tasks may then be re-run in dependency order and the publication may then be retried.
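The recursive selection described above amounts to a reachability computation over the task dependencies. The following is a minimal sketch under simplifying assumptions: tasks and locations are identified by strings, and the two maps (location to reading tasks, task to dependent tasks) are assumed to have been recorded during execution. The names are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: select every task that read a conflicting location,
// plus every task that depends on a selected task, transitively.
final class RerunSelector {
    static Set<String> select(Set<String> conflictedLocations,
                              Map<String, Set<String>> readersOf,
                              Map<String, Set<String>> dependentsOf) {
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        // Seed with the tasks that read a conflicting location.
        for (String location : conflictedLocations) {
            for (String task : readersOf.getOrDefault(location, Set.of())) {
                if (selected.add(task)) {
                    work.add(task);
                }
            }
        }
        // Follow dependency edges until no new task is added.
        while (!work.isEmpty()) {
            String task = work.remove();
            for (String dependent : dependentsOf.getOrDefault(task, Set.of())) {
                if (selected.add(dependent)) {
                    work.add(dependent);
                }
            }
        }
        return selected;
    }
}
```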
[0057] In accordance with some implementations, the program code 112 may provide a function that takes a collection of conflict objects and runs arbitrary code to determine how to resolve the conflicts explicitly.
[0058] In accordance with example implementations, the program code 112 may use a combination of the above-described approaches, even for a single publication attempt. For example, the program code 112 may use field-based rules to resolve (and eliminate) some conflicts and then invoke one or multiple conflict resolver engines 140 to resolve the remaining conflicts.
[0059] Thus, referring to Fig. 4A in conjunction with Fig. 1A, in accordance with example implementations, a technique 400 includes preventing (block 404) a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context. In response to a first attempt to publish the first isolation context, the technique 400 includes managing (block 406) the publication of the first isolation context, which includes identifying (block 408) a conflict that prevents the first publication attempt; making (block 412) a second modification within the first isolation context to resolve the conflict; and publishing (block 416) the first isolation context in connection with a subsequent publication attempt, including allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.
[0060] In accordance with example implementations, several attempts may be made to publish a given isolation context. In this manner, prior to making the subsequent publication attempt, the following may occur: an intermediate attempt may be made to publish the first isolation context; at least one new conflict may be identified that prevents the intermediate publication attempt, wherein the new conflict(s) arise from modifications to the data structure performed in the second isolation context subsequent to the first attempt; and action may be taken to resolve the new conflict(s). More specifically, referring to Fig. 4B, in accordance with example implementations, a technique 430 includes executing (block 434) machine executable instructions in a processor-based machine in a first isolation context and in a second isolation context. The first isolation context presents an associated first view of a data structure, and the second isolation context presents an associated second view of the data structure. Executing the machine executable instructions includes inhibiting (preventing, for example) modifications that are made to the first object in the first isolation context from being reflected in the second view. Here, a modification made in the first view "being reflected" in the second view refers to the modification being reproduced or shown in the second view.
[0061] Pursuant to the technique 430, an attempt is made (block 438) to publish the first isolation context to reflect modifications made to the data structure in the first isolation context in the second view of the data structure. If a determination is made (decision block 442) that one or more conflicts exist, then a decision is made (decision block 444) whether a termination condition has been satisfied (i.e., a decision to abandon the publication has been reached). If not, one or more actions are taken to resolve the conflict(s) pursuant to block 446 and control returns to block 438. When no conflicts remain, the technique 430 includes publishing (block 450) the first isolation context, including causing the second view to reflect modifications made to the data structure within the first isolation context. These modifications include modifications made during action(s) taken to resolve conflict(s), as of a time associated with the last publication attempt. In accordance with example implementations, the conflicts may be resolved using task-based conflict resolution.
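The control flow of the technique 430 can be summarized in a short loop. This is a sketch only: PublishLoop and its three callbacks are hypothetical stand-ins for the publication attempt, the conflict resolution actions, and the termination check, and the block numbers in the comments map each step back to the description above.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of the publish / resolve / retry loop.
final class PublishLoop {
    // tryPublish returns the conflicts that prevented publication,
    // or an empty list when the publication succeeded.
    static <C> boolean publish(Supplier<List<C>> tryPublish,
                               Consumer<List<C>> resolve,
                               BooleanSupplier terminate) {
        while (true) {
            List<C> conflicts = tryPublish.get();   // block 438: attempt to publish
            if (conflicts.isEmpty()) {
                return true;                        // block 450: published
            }
            if (terminate.getAsBoolean()) {
                return false;                       // block 444: abandon publication
            }
            resolve.accept(conflicts);              // block 446: resolve, then retry
        }
    }
}
```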
[0062] Referring to Fig. 5 in conjunction with Fig. 1A, in this manner, in accordance with example implementations, a technique 500 includes executing (block 504) machine executable instructions in a first isolation context to run a plurality of tasks to modify data according to a first view that is associated with a first isolation context. Pursuant to the technique 500, an attempt is made (block 508) to publish modifications made in the first isolation context so that the modifications are visible in a second isolation context that has an associated second view, which is isolated from the first view. Pursuant to the technique 500, in response to the attempted publication, task-based conflict resolution is applied (block 512) to resolve at least one conflict occurring due to the first and second views. Applying the task-based conflict resolution includes executing machine executable instructions to selectively re-run a subset of the tasks of the plurality of tasks, wherein the number of tasks of the subset of tasks is less than the number of tasks of the plurality of tasks.
[0063] Referring to Fig. 1A, the portions, or units, of the program code 112 that are considered to be "tasks" may be specified in a number of different ways. For example, in accordance with some implementations, the program code 112 may expressly identify one or more portions as being re-runnable tasks. For example, a particular function 136, a task function fn, may be passed to a task execution function (i.e., runTask(fn)). The resulting portion of the program code 112 may, for example, be the following:
runTask(() -> {
    doSomething();
    doSomethingElse();
});
TABLE 1
[0064] When the task execution function is called, the task function passed in is executed as a task, thereby becoming a candidate to be re-run if needed during conflict resolution. The same task function may be used to generate several different tasks in the same isolation context 110, and because they are executed at different times, even multiple invocations of the same function may have different observed behaviors. The separate tasks associated with the same task function become separate candidates to be re-run and may be selected or not selected independently. In accordance with example implementations, the task execution function may also accept other arguments to be supplied to the task function during its execution. These arguments may be associated with the resulting task and provided when the task function is re-run.
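One way to picture a task execution function that remembers its arguments is sketched below. TaskRegistry and its nested Task class are hypothetical, and the sketch handles a single argument for brevity. Each call creates a separate task record even when the same task function is reused, and a re-run supplies the originally recorded argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a task execution function with remembered arguments.
final class TaskRegistry {
    // A recorded task: the task function plus the argument supplied to it.
    static final class Task<A> {
        private final Consumer<A> fn;
        private final A arg;

        Task(Consumer<A> fn, A arg) {
            this.fn = fn;
            this.arg = arg;
        }

        // Re-run the task function with the argument recorded at creation.
        void rerun() {
            fn.accept(arg);
        }
    }

    private final List<Task<?>> tasks = new ArrayList<>();

    // Runs fn(arg) now and records it as a candidate to be re-run later.
    <A> Task<A> runTask(Consumer<A> fn, A arg) {
        Task<A> task = new Task<>(fn, arg);
        tasks.add(task);
        fn.accept(arg);
        return task;
    }

    int taskCount() {
        return tasks.size();
    }
}
```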
[0065] In accordance with example implementations, the program code 112 may specify that during an iterative construct, each separate iteration (or each subset of some partitioning of the iterations) should be executed as a separate task. An iterative construct may include an iterative control structure provided by a programming language (e.g., a while loop or for loop) or mapping a function over a data stream or over the elements of a data structure. Iteration information (e.g., the element of the data stream or data structure, an index that allows retrieval of the element, or other information used to control the loop) may be associated with the resulting task. This allows a computation to examine and possibly modify each element of a large data structure, while limiting rework to those elements that are affected by conflicts.
[0066] In accordance with example implementations, functions 136 may be used to specify that a number of tasks may be run in parallel or constrain the tasks to be run in parallel. Moreover, in accordance with example implementations, a function 136 may specify that a given task is to be run as a parallel sibling of the current task.
[0067] In accordance with further example implementations, other ways may be used to specify tasks. For example, the program code 112 may specify that a particular function has the property that any execution of the function should be considered to be a separate task. For example, a Java language annotation (e.g., an @RunAsTask annotation) may be used. As another example, a programming language may provide a keyword that, when used, declares that a following block or expression is to be run as a task. As another example, a new keyword may be added to a programming language, such as C++, via amendments to the programming language standard or by modifying a compiler to recognize the keyword. As another example, the program code 112 may be analyzed, either by the compiler or through the use of a standalone tool, for purposes of determining that particular sections of code may beneficially be treated as tasks. For example, a task creator 142 (see Fig. 1B) of a conflict resolver engine 140 may determine that particular sections of program code 112 should be treated as tasks. As another example, program code 112 may specify that one or more methods of a class are to be treated as separate tasks when invoked by declaring that class to derive or extend, directly or indirectly, from a particular ancestor class or to implement a particular interface.
[0068] In accordance with example implementations, the tasks may be hierarchically arranged. That is, a task P may have n child tasks C1, ..., Cn, and it may also have code that runs outside of any child task. These child tasks may have further grandchild tasks and so on, to any depth.
[0069] In accordance with example implementations, there may be other mechanisms to specify that a collection of tasks are to be run sequentially or in parallel. In some implementations, a task executor may be used to allow tasks (tasks generated by a task factory, for example) to be executed by a fixed or dynamic collection of worker threads.
[0070] As the program code 112 executes, there may be a current or prevailing task for each thread. As an example, the current task may be obtained by being associated with the current isolation context as a thread-local member (i.e., a field of an object representing the current isolation context having independent values associated with each executing program thread). When a new thread is created, the initial value of the current task for the new thread may be the current task of the creating thread at the time of the new thread's creation.
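In Java, the inheritance of the current task by a newly created thread resembles the behavior of the standard InheritableThreadLocal class, which gives a child thread the creating thread's value as its initial value. The sketch below is a simplification: the CurrentTask holder is hypothetical, tasks are represented as strings, and the value is kept in a static field rather than as a member of an isolation context object.

```java
// Hypothetical sketch: a per-thread current task that a new thread inherits
// from the thread that created it.
final class CurrentTask {
    // InheritableThreadLocal hands the creating thread's value to a child
    // thread when the child thread is constructed; later changes in either
    // thread are not seen by the other.
    private static final InheritableThreadLocal<String> CURRENT =
            new InheritableThreadLocal<>();

    static String get() {
        return CURRENT.get();
    }

    static void set(String task) {
        CURRENT.set(task);
    }
}
```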
[0071] When the program code 112 specifies to run a function as a task, a new task (e.g., an object representing a new task) may be created as a child of the current task and set to be the current task. When the function exits, the current task may revert to the prior current task. This may be accomplished in program code 112 written in the Java language by means of a try-with-resources block specified to create an object that remembers the current task prior to executing the new task and resets the current task to be the remembered task in its close method. In program code 112 written in the C++ programming language, reverting to the prior current task may be accomplished by means of an object on the stack whose constructor remembers the current task prior to executing the new task and whose destructor resets the current task to the remembered task.
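The Java variant of this pattern can be sketched with an AutoCloseable scope object. TaskScope is a hypothetical name, tasks are represented as strings, and the parent/child linkage between task objects is omitted; the sketch shows only how the prior current task is remembered and restored.

```java
// Hypothetical sketch of the try-with-resources pattern for the current task.
final class TaskScope implements AutoCloseable {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private final String previous;

    // Remember the prevailing task, then make the new task current.
    TaskScope(String task) {
        this.previous = CURRENT.get();
        CURRENT.set(task);
    }

    static String current() {
        return CURRENT.get();
    }

    // Called automatically when the try block exits, whether normally or
    // because of an exception: restore the remembered task.
    @Override
    public void close() {
        CURRENT.set(previous);
    }
}
```

A caller would wrap the task body in try (TaskScope scope = new TaskScope("child")) { ... }, so that the prior task is restored even if the body throws.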
[0072] In accordance with example implementations, when a new isolation context 110 is created, the isolation context 110 is not associated with any task for any thread. In accordance with further example implementations, a new task may be created for each isolation context at isolation context creation and upon successful publication to be the default task for all threads. In accordance with some implementations, whether a default task is created is a user-configurable part of the system, either globally or on a per-isolation-context basis. This allows programs that do not use task-based conflict resolution to avoid the overhead of monitoring reads and writes. In accordance with example implementations, when an isolation context 110 is set to be the current isolation context 110, the current task becomes the one associated with that context and the current thread. If the current isolation context 110 reverts to the prior isolation context 110, the current task reverts to the one for the current thread in the prior isolation context 110. When an isolation context 110 is successfully published, all of its tasks are discarded, and its mapping from threads to current tasks is cleared. In an alternative implementation, upon successful publication, the isolation context 110 is set to map all threads to a single newly-created task.
[0073] In accordance with example implementations, task execution is monitored for purposes of identifying tasks that may be involved with conflicts. In this manner, in accordance with example implementations, as the program code 112 executes, when the code 112 interacts with a location that may be involved in a conflict (e.g., a field of a record, a name in a namespace, a slot of an array, or a mapping of a key in an associative structure such as a map or set), the code 112 calls a predefined function 136 of the library 134. When such a call occurs, if there is a current task, the function 136 logs information about the interaction and the location in (or associated with) the task.
[0074] As a more specific example, in accordance with some implementations, a task monitor 144 (see Fig. 1 B) of a conflict resolver engine 140 may log the location and an indication of whether the operation was a read or a write. In the case of a modification operation, such as an increment, the operation is logged as being both a read and a write. In accordance with some implementations, the log distinguishes between a frozen read, which guarantees that subsequent reads (without an intervening modification) will return the same value and which may induce conflicts, and a free read which does not make that guarantee and which does not induce conflicts. In accordance with example implementations, the task monitor 144 maintains separate logs for reads and writes for each task.
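A per-task log of this kind might look like the following sketch. The TaskLog class is hypothetical, locations are represented as strings, and the choice to record the read half of a modification as a frozen read is an assumption of this example rather than something stated above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-task read and write logs.
final class TaskLog {
    final Set<String> frozenReads = new HashSet<>(); // reads that may induce conflicts
    final Set<String> freeReads = new HashSet<>();   // reads that do not induce conflicts
    final Set<String> writes = new HashSet<>();

    void read(String location, boolean frozen) {
        (frozen ? frozenReads : freeReads).add(location);
    }

    void write(String location) {
        writes.add(location);
    }

    // A modification such as an increment is logged as both a read and a write.
    // Assumption: the read half is treated as a frozen read.
    void modify(String location) {
        frozenReads.add(location);
        writes.add(location);
    }
}
```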
[0075] In some implementations, the log may include a timestamp that allows the use of the relative order of writes and reads to the same location in different tasks to infer dependencies between the tasks. In accordance with a further example implementation, the task monitor 144 may not log a timestamp; instead, when a read of a location is logged, the task monitor 144 may log a dependency between the current task and the task that last modified the location. In such implementations, logged dependencies may be restricted to those between the tasks associated with the same isolation context 110. In accordance with some implementations, the task monitor 144 may maintain a single log for all tasks relating to a given isolation context 110. In such implementations, timestamps may not be used, and relative ordering between reads and writes may be inferred from the ordering of entries in the log. Moreover, in these implementations, the task monitor 144 may record the task that is associated with each operation along with other information about the operation. It is noted that the order in the log is maintained to reflect the actual order of the operations with respect to the managed data.
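The timestamp-free variant, in which a dependency is logged at the moment a read is observed, can be sketched as follows. DependencyLog is a hypothetical name; tasks and locations are strings, and all tasks are assumed to belong to the same isolation context.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: log dependencies at read time using the last writer
// of each location, without timestamps.
final class DependencyLog {
    private final Map<String, String> lastWriter = new HashMap<>();       // location -> task
    private final Map<String, Set<String>> dependents = new HashMap<>();  // writer -> readers

    void write(String task, String location) {
        lastWriter.put(location, task);
    }

    // On a read, record that the reading task depends on the task that
    // last modified the location (if that was a different task).
    void read(String task, String location) {
        String writer = lastWriter.get(location);
        if (writer != null && !writer.equals(task)) {
            dependents.computeIfAbsent(writer, k -> new LinkedHashSet<>()).add(task);
        }
    }

    Set<String> dependentsOf(String task) {
        return dependents.getOrDefault(task, Set.of());
    }
}
```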
[0076] In accordance with some implementations, the read logs and write logs are kept as a mapping from numbers identifying records and namespaces to sets of locations (fields in that record or names in that namespace).
[0077] In accordance with further example implementations, the task monitor 144 may log conflicting operations other than those on specific locations including, for example, conflicts relating to the size of a map, conflicts relating to the presence or absence of a value in a set, or conflicts relating to the length or ordering of elements in a list.
[0078] Referring to Fig. 6, in accordance with example implementations, performing the task-based conflict resolution includes a technique 600 of monitoring (block 604) the execution of instructions that run the plurality of tasks for purposes of determining a dependency graph among the tasks; and identifying (block 608) a subset of the tasks (i.e., a subset of the dependency graph) to be re-run based at least in part on the determined dependency. Example implementations of ways to construct the dependency graph and identify the subset of tasks to be re-run are discussed below.
[0079] As discussed herein, the monitoring of the tasks includes observing the instructions, or code, being executed in these tasks as reading and writing locations. In accordance with the following discussion, "a task writing to a location" refers to observing program code executing in the task writing to the location; and "a task reading from a location" refers to observing program code executing in the task reading from the location.
[0080] In accordance with some implementations, a graph generator 149 (see Fig. 1B) of a conflict resolver engine 140 may build up a dependency graph between tasks as the program code 112 executes. In accordance with some implementations, the graph generator 149 maintains, for each task, a dynamically built collection of tasks that are dependent on the task, and uses these dynamically built collections of tasks in the building of the dependency graph when used for conflict resolution.
[0081] In accordance with example implementations, specific write operations within a task may establish values that are dependent on factors other than those obtained via monitored reads within the task where those factors might be expected to be saliently different if the task were re-run. For example, a written value may be dependent on a value read from a clock provided by the system 100, on a value read from a sensor, or on the value of a program variable outside the scope of the task monitoring (e.g., a value outside the managed space). In such implementations, the program code 112 may indicate that the current task or a task to be re-run should be re-run unconditionally upon the decision to re-run any tasks. This specification may take the form of a function call, a call to a method on the task about which the assertion applies, or a special form of a write to a field (e.g., a volatileWrite() call). In example implementations, it may be specified as a property of the task object upon creation of the task object or specified as an option to the task invocation.
[0082] When an attempt to publish fails due to the presence of conflicts, a task selector 146 (see Fig. 1 B) has a list of conflicts and associated locations at which the conflicts occurred. The task selector 146 resolves these conflicts by selecting tasks to be re-run to recalculate the final values of conflicted locations and/or asserting final values of conflicted locations. The tasks are scheduled to be re-run by a task scheduler 148 (see Fig. 1 B).
[0083] In accordance with example implementations, the task selector 146 identifies as tasks to re-run those tasks that are identified as directly or indirectly dependent on conflicting writes in the parent isolation context and re-executes these tasks after resolving the responsible conflicting write locations to their respective values in the parent isolation context 1 10. Where a conflicting write location has not been read by the child isolation context but is written by the child isolation context, the conflict resolver engine 140 resolves the conflicting write location to the final value of that location in the child isolation context, ensuring that upon successful publication of the child isolation context 1 10, the value will be observed as ordered after the committed write in the parent isolation context. Thus, the task selector 146 identifies a subset, or subgraph, of the task dependency graph to be re-run; and this subset identifies a subset of tasks of the plurality of tasks that have been executed by the isolation context 1 10.
[0084] In accordance with some implementations, the graph generator 149 (Fig. 1 B) may construct a sub-graph of the task dependency graph, identifying tasks and dependency relations from the dependency graph to use in scheduling tasks to rerun, prior to the re-running of any tasks. Such a sub-graph may be considered a static dependency graph. A static dependency graph may differ from an optimal sub-graph to re-run in that it may incorrectly include or exclude tasks or dependency relations. The graph generator 149 may construct the static dependency graph as follows, in accordance with example implementations.
[0085] The static dependency graph may be constrained to include tasks designated to be unconditionally re-run.
[0086] The static dependency graph may be constrained to include a first task when it includes a second task and the first task has been identified as explicitly dependent on the second task.
[0087] The static dependency graph may be constrained to include a task that was determined to have read a value from a conflicting write location, where the value first read by the task was not written by another task associated with the current isolation context. In some implementations, qualifying read operations are restricted to frozen read operations.
[0088] The static dependency graph may be constrained to include a first task when it includes a second task and the first task was determined to have read a value from a location, where the value was written by the second task. The static dependency graph may further be constrained to include a dependency between the first and second tasks such that the second task is constrained to be re-run after the first task is re-run.
[0089] The static dependency graph may be constrained to include a first task when it includes a second task and the first task was determined to have written a value to a location, where the value was read by the second task and either the first task was not the last writer of the location in the publishing context prior to the publication attempt or the location is a conflicted write location.
[0090] The static dependency graph may be constrained to not include both a task and a direct or indirect sub-task of the task. If a task that has sub-tasks is selected to be included in the static dependency graph, its reads and writes may be considered to include all reads and writes of all of its direct and indirect sub-tasks.
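The constraint that a task and its sub-task are not both included can be enforced by dropping any selected task that has a selected ancestor, since re-running the ancestor covers the sub-task. The sketch below assumes a hypothetical parentOf map from each task to its parent task; the names are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep only selected tasks with no selected ancestor.
final class SubtaskPruner {
    // parentOf maps each task to its parent task; top-level tasks are absent.
    static Set<String> prune(Set<String> selected, Map<String, String> parentOf) {
        Set<String> result = new LinkedHashSet<>();
        for (String task : selected) {
            boolean covered = false;
            // Walk up the task hierarchy looking for a selected ancestor.
            for (String p = parentOf.get(task); p != null; p = parentOf.get(p)) {
                if (selected.contains(p)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(task);
            }
        }
        return result;
    }
}
```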
[0091] The static dependency graph may include a constructed final assignment task. The static dependency graph may include dependency relationships between this task and other included tasks constraining the final assignment task to be run after all other identified tasks. The final assignment task may include program code 112 to re-assert values for locations whose value at the time of the publish attempt was written by a task not included in the static dependency graph but which was earlier written by a task included in the static dependency graph. The value to be reasserted by the final assignment task is the value at the time of the publish attempt.
[0092] In accordance with example implementations in which program code 1 12 is allowed to read or write locations when there is no current task, the observation of such reads or writes may constrain the conflict resolver engine 140 to determine that a static dependency graph cannot be constructed. In accordance with example implementations, the conflict resolver engine 140 may be constrained to determine that task-based conflict resolution not be used to resolve conflicts when a conflicting write location was observed to have been read outside of any task, when a value written to a location in the child isolation context was observed to have been read outside of any task, or when program code 1 12 executing outside of any tasks takes an action which would cause the conflict resolver engine 140 to determine that the current task, had there been one, would have been marked to be unconditionally rerun.
[0093] In accordance with some implementations, the static dependency graph may include other tasks and dependencies implying more ordering constraints. In some implementations the static dependency graph may include all tasks that read or wrote any conflicted location. In some implementations, the static dependency graph may include a first task if it includes a second task when it is determined that the first task read a location previously written by the second task, without respect to whether the value read by the first task was the value written by the second task or by another task, and the first task may be constrained to be re-run after the second task.
[0094] The static dependency graph may include dependency relationships between two tasks that imply the requirement that each be re-run before the other. In some example implementations, the task scheduler 148 may choose to schedule the two tasks to be re-run concurrently with one another. In some implementations, the task monitor 144 may monitor the temporal order of launch and completion of tasks, and the task scheduler 148 may use this information in determining a correct ordering and allowed concurrency during re-run. In some implementations, the program code 112 may be able to specify explicitly that certain tasks must, may, or may not be run concurrently with one another. This may be accomplished by having a multi-argument task execution function that takes as arguments multiple tasks (or sufficient information to create multiple tasks) to run in parallel, by having a version of the task execution function that runs a task as a parallel sibling of the current task (rather than as a child), by providing a mechanism for a task to indicate (e.g., as a property of the task object or by calling a function as it runs) that other tasks that happen to be running concurrently are constrained to run concurrently, or in another way.
[0095] In accordance with some example implementations, the program code 112 may specify explicitly that certain tasks are to be run before or after certain other tasks; or that task A depends on task B such that if task B is re-run, task A is to be re-run. It is noted that task dependency may be independent of order.
[0096] It is possible that when an attempt to publish is made, other threads are also working in the same isolation contexts (in the same task or different tasks). In accordance with example implementations, the system 100 may cause these other threads to pause execution (e.g., immediately prior to the next read or modification operation), and, upon determining that conflicts have been detected, may ensure that any unfinished tasks are selected to be re-run and that threads executing in these tasks are interrupted (e.g., by an exception thrown by the function in which they were paused).
[0097] In accordance with example implementations, the conflict resolver engine 140 resets the values at locations corresponding to the conflicts prior to the re-running of the tasks. In this manner, referring to Fig. 7, in accordance with some implementations, task-based conflict resolution includes determining (block 704) a subset of locations to reset to store values shared in common among the first and second views; resetting (block 708) the identified locations to store the values in the locations; and re-running (block 712) the tasks of the subset.
[0098] To allow a subsequent attempt to publish the isolation context 110 following the re-run of selected tasks and to prepare for the re-run of selected tasks, the conflict resolver engine 140 may identify and reset (e.g., by modifying) some locations (whether such locations are associated with conflicts preventing publication of the isolation context 110 or not), specifying that such resetting is to resolve any conflict associated with the location. This removes such a conflict from the collection of conflicts preventing the isolation context 110 from publishing, but the system 100 may detect subsequent conflicts at a reset location between the time its associated conflict is noted as resolved and the time the system 100 attempts to re-publish the isolation context 110. In accordance with an example implementation, the conflict resolver engine 140 may identify locations (whether or not conflicted) noted as having been written by a task selected to be re-run (except for the final assignment task) and may reset such locations to the values they currently have in the parent isolation context. In accordance with an example implementation, the conflict resolver engine 140 may identify conflicted locations not noted as having been written by any task selected to be re-run and noted as having been read by some task selected to be re-run and may reset such locations to the values they currently have in the parent isolation context. In accordance with an example implementation, the conflict resolver engine 140 may identify conflicted locations not identified to be handled in another manner and may reset such locations to the values they currently have in the publishing isolation context 110. Such resetting to the current value may not involve actual modification of the value. In accordance with an example implementation, locations to be written by a final assignment task are not reset prior to task re-run.
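The three example reset rules above can be combined into a single decision function, as in the sketch below. This is one possible reading, not a definitive procedure: ResetPlanner and its enum are hypothetical, locations are strings, the rules are applied in the order given, and the special handling of locations written by a final assignment task is left out.

```java
import java.util.Set;

// Hypothetical sketch: decide how to reset one location before tasks re-run.
final class ResetPlanner {
    enum Reset { TO_PARENT_VALUE, TO_CURRENT_VALUE, NONE }

    static Reset plan(String location,
                      Set<String> conflicted,
                      Set<String> writtenByRerunTasks,
                      Set<String> readByRerunTasks) {
        // Written by a task selected to be re-run (conflicted or not):
        // reset to the value currently in the parent isolation context.
        if (writtenByRerunTasks.contains(location)) {
            return Reset.TO_PARENT_VALUE;
        }
        if (conflicted.contains(location)) {
            // Conflicted and read (but not written) by a re-run task:
            // reset to the parent value. Any other conflicted location:
            // keep the current value in the publishing context.
            return readByRerunTasks.contains(location)
                    ? Reset.TO_PARENT_VALUE
                    : Reset.TO_CURRENT_VALUE;
        }
        return Reset.NONE;
    }
}
```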
[0099] The tasks may be re-run after the resetting of values. The tasks may be scheduled to re-run in any order that is consistent with identified ordering dependencies (e.g., as captured in the dependency graph). In accordance with example embodiments, the task scheduler 148 may schedule tasks to re-run based in part on an observed relative temporal order of tasks during the initial run or a prior re-run. In accordance with example embodiments, the task scheduler 148 may schedule tasks to be re-run concurrently based on observed concurrency in the initial run, on inferences from the task dependency graph, on allowed concurrency information specified by the program code 112, or on other grounds.
[00100] In accordance with example implementations, the task scheduler 148 may schedule a final assignment task to be run, after all other tasks have been re-run. This final task modifies remembered locations to remembered associated values, specifying that such modifications are to resolve any conflict associated with the location. In accordance with some example implementations, the final assignment task may refrain from modifying a location if it is noted that it has not been previously written by any re-run task. In such example implementations, the final assignment task may instruct the system 100 to consider any conflict at such a location to be resolved. In accordance with some example implementations, a final assignment task may be run prior to the completion of the re-run of all other tasks selected to be re-run. In accordance with some example implementations, there may be more than one final assignment task.
[00101] In accordance with further example implementations, the graph generator 149 may construct the dependency graph, and the task scheduler 148 may traverse, or walk, the dependency graph as tasks are being run, with the task scheduler 148 deciding on later tasks to run based on behavior observed by the task monitor 144 during the re-run of earlier tasks. More specifically, the graph generator 149 may construct, based on task observations, a dependency graph, in a manner similar to that described above, but including information about observed temporal ordering between tasks. As before, some tasks may have been indicated as unconditionally to be re-run.
[00102] The task scheduler 148 may iteratively process the graph, considering, processing, and removing tasks associated with root nodes from the graph. Such root nodes are known to not be dependent on any tasks remaining in the graph. The task re-run may be considered complete when every node has been processed and removed from the graph. Root nodes may be considered singly or multiply.
[00103] In accordance with an example implementation, if the task associated with a root node under consideration is specified to be unconditionally re-run, the system 100 schedules the task to be re-run.
[00104] In accordance with an example implementation, if the task associated with a root node under consideration was noted to have read any conflicted locations or any locations written by any task during re-run, the system 100 schedules the task to be re-run.
[00105] In accordance with an example implementation, if the task associated with a root node under consideration is not otherwise determinable to be scheduled to re-run, the task scheduler 148 may examine the set of values written by the task to determine whether the conflict resolver engine 140 has sufficient information to reproduce the effects of re-running the task without actually re-running the task. If the task scheduler 148 determines that the conflict resolver engine 140 has sufficient information, then the conflict resolver engine 140 modifies locations written by the task to simulate re-run. In such an example, if a location is a conflicted location, the conflict resolver engine 140 may note the associated conflict as resolved. If the conflict resolver engine 140 does not determine that it has sufficient information, the conflict resolver engine 140 schedules the task to be re-run. In accordance with an example embodiment, writes to locations for which the task was noted to be the last writer may be simulated by allowing them to retain their current values. In accordance with an example implementation, writes to locations for which the task was not noted to be the last writer and for which the last value written by the task was logged may be simulated by writing the logged value to the location. In accordance with an example implementation, writes to locations noted to have been modified during re-run may be simulated by allowing them to retain their current values following a determination that there are no dependencies that would have precluded the task under consideration from having been scheduled before or concurrently with the task that last wrote to the location.
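As a non-limiting illustration, the iterative processing of root nodes may be sketched in Python as follows. The function process_graph and its parameters are hypothetical. The sketch considers root nodes singly, and it assumes for simplicity that a task that is not re-run can always be simulated; the check for sufficient information and the simulated writes themselves are not modeled.

```python
def process_graph(deps, unconditional, reads, writes, conflicted):
    """Walk a dependency graph from its roots.

    deps maps each task to the set of tasks it depends on. Returns the list
    of tasks re-run and the list of tasks whose effects are only simulated.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    rewritten = set()  # locations written by tasks during re-run
    rerun, simulated = [], []
    while remaining:
        roots = sorted(t for t, d in remaining.items() if not d)
        if not roots:
            raise ValueError("dependency cycle")
        task = roots[0]  # root nodes are considered singly in this sketch
        if task in unconditional or reads[task] & (conflicted | rewritten):
            rerun.append(task)
            rewritten |= writes[task]
        else:
            simulated.append(task)  # assumed reproducible without re-running
        del remaining[task]
        for d in remaining.values():
            d.discard(task)  # the processed task no longer blocks successors
    return rerun, simulated


deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
reads = {"A": set(), "B": {"x"}, "C": {"z"}, "D": {"y"}}
writes = {"A": {"x"}, "B": {"y"}, "C": {"w"}, "D": set()}
rerun, simulated = process_graph(deps, {"A"}, reads, writes, conflicted=set())
```

In this sketch task A is re-run unconditionally, task B is re-run because it read a location that task A wrote during re-run, task D is re-run because it read a location rewritten by task B, and task C is only simulated.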
[00106] In accordance with an example implementation, during task re-run, the first time a conflicted location is read, the conflict resolver engine 140 may note the conflict associated with the location as being resolved to the current value in the parent isolation context 110. In accordance with an example implementation, during task re-run, the first time a conflicted location is written, the conflict resolver engine 140 may note the conflict associated with the location as being resolved to the value being written. In accordance with an example implementation, the graph generator 149 may modify the dependency graph based on writes to locations observed during task re-run. In accordance with an example implementation, the graph generator 149 may remove tasks that have completed re-run from the graph.
[00107] In accordance with example implementations, after all of the tasks have been re-run, publication of the isolation context 110 is attempted again. If the publication attempt fails, a further round of conflict resolution may be attempted. In this case, the (new) tasks executed during the re-run take the place of those selected to be re-run alongside the old tasks which were not selected to be re-run. A new subgraph is extracted for re-run after resetting locations.
[00108] In accordance with example implementations, the cycle of attempting publication, and resolving conflicts may be performed a bounded number of times or until some other termination criterion is reached (e.g., a particular amount of time has elapsed, a particular wall-clock time has been reached, a termination flag has been set from outside, or a termination function returns a particular value). If a termination criterion is reached before the publish is successful, the request to resolve conflicts is declared to have failed. In this case, a new isolation context may be established and the entire computation retried (perhaps with conflict resolution), repeating based on a (perhaps different) set of termination criteria, before declaring that the attempt to publish has failed.
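As a non-limiting illustration, the cycle of attempting publication and resolving conflicts under termination criteria may be sketched in Python as follows. The names publish_with_resolution, try_publish, resolve_conflicts, max_rounds, deadline and should_stop are hypothetical, and the helper make_context merely simulates an isolation context whose conflicts are removed one per round.

```python
import time


def publish_with_resolution(try_publish, resolve_conflicts, max_rounds=3,
                            deadline=None, should_stop=lambda: False):
    """Return True if publication succeeds before a termination criterion."""
    rounds = 0
    while True:
        if try_publish():
            return True
        out_of_time = deadline is not None and time.monotonic() >= deadline
        if rounds >= max_rounds or out_of_time or should_stop():
            return False  # the request to resolve conflicts has failed
        resolve_conflicts()
        rounds += 1


def make_context(conflicts):
    """Hypothetical stand-in: each resolution round removes one conflict."""
    state = {"conflicts": conflicts}

    def try_publish():
        return state["conflicts"] == 0

    def resolve():
        state["conflicts"] -= 1

    return try_publish, resolve


ok = publish_with_resolution(*make_context(2), max_rounds=3)
failed = publish_with_resolution(*make_context(2), max_rounds=1)
```

A caller that receives a failure may establish a new isolation context and retry the entire computation, possibly under a different set of termination criteria, as described above.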
[00109] Thus, referring to Fig. 8, in accordance with example implementations, a system 800 to perform task-based conflict resolution includes a processor-based conflict resolver engine 820 (such as the conflict resolver engine 140, for example) that monitors the execution of a set 808 of tasks 810 associated with an isolation context for purposes of constructing a dependency graph 822, which describes the interdependencies of the tasks 810. In response to an attempted publication of the isolation context, the conflict resolver engine 820 resolves conflicts 821 (represented by data stored in a memory 823, for example) that prevent publication of the isolation context by selecting a subset 826 of the tasks 810. In this manner, the subset 826 contains fewer tasks 810 than the set 808; and the tasks 810 of the subset 826 are re-run before publication is re-attempted.
[00110] In accordance with example implementations that are described herein, properties of a data structure may be represented by multiple simultaneous value (MSV) objects. In general, an MSV object represents a property of a unit of data and may concurrently, or simultaneously, have multiple, alternative values. For example, a particular logical location, such as a field of a record or a slot in an array, may be represented by an MSV object, such that the field or slot has different, alternative values, when seen through different views.
[00111] A given isolation context may be associated with one or multiple views that may be used when accessing one or multiple MSV objects. Moreover, multiple isolation contexts may be associated with multiple views of a given MSV object. The values associated with different views in a given MSV object may be isolated from each other. In this manner, as part of this isolation, a request to determine a value for the MSV object in a given view may result in the return of one of the MSV object's alternative values; and assigning the value of the MSV object in a given view may not affect the value of the MSV object in another view.
[00112] Referring to Fig. 1C in conjunction with Fig. 1A, in accordance with example implementations, the system 100 may use various data structures 150 to manage the MSV objects. In accordance with example implementations, a given data structure 150 may be an object represented by a C++ class.
[00113] The data structures 150 may include isolation context objects 151. Each isolation context object 151 corresponds to one of the isolation contexts 110. It is noted that in the following discussion, "isolation context object 151" and "isolation context 110" may be used interchangeably. Conflicts 152 that prevent publication of the isolation context 110 may be installed on a corresponding isolation context state 157.
[00114] The data structures 150 may include view objects 153. Each view object 153 may correspond to one of the views 120. It is noted that in the following discussion, "view object 153" and "view 120" may be used interchangeably.
[00115] The data structures 150 may include one or multiple conflict generators 160. In general, a conflict generator 160 may construct conflicts referring to a particular location. Each subtype of conflict may have its own generator subtype, in accordance with example implementations.
[00116] The data structures 150 may include context states 157. In general, a context state 157 represents the current set of conflicts and "contingent conflicts" for an associated isolation context 110, as well as a count or event time, representing the last time the isolation context associated with the conflict successfully published.
[00117] The data structures 150 may include an event counter 164, which may represent a monotonically-increasing set of values that are incremented by particular events. In accordance with some implementations, the event counter 164 may represent a global, shared event count, which is atomically incremented. In accordance with example implementations, the constant representing the greatest possible value of an event counter may be used to represent the most recent event count. Logically, values of the event counter 164 may be associated with snapshot creation times, in accordance with example implementations. Values read from the event counter 164 may be considered to be timestamps denoting points in time in the execution of the system 100, and the value stored in the event counter 164 may be considered to be the current time (or timestamp) of the system 100 with respect to the context and MSV object management resources 130. It is noted that, apart from the fact that their values monotonically increase over time (meaning that timestamps may be compared with one another), these timestamps have no necessary relationship with any other notion of time (e.g., wall-clock time or time since the start of an operating system or process in the system 100). It is also noted that different reads of the event counter 164 may correctly read the same timestamp value, but consecutive reads of the event counter 164 may never read an earlier timestamp after reading a later timestamp.
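As a non-limiting illustration, an event counter with the properties described above may be sketched in Python as follows. The class name EventCounter, the methods now and tick, and the constant MOST_RECENT are hypothetical; a lock is used here as a stand-in for an atomic increment, and sys.maxsize is assumed as the constant representing the greatest possible counter value.

```python
import sys
import threading

# Assumed sentinel: the greatest representable count denotes "most recent".
MOST_RECENT = sys.maxsize


class EventCounter:
    """Shared, monotonically increasing logical clock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # stand-in for an atomic increment

    def now(self):
        """Read the current timestamp; repeated reads may return equal values."""
        with self._lock:
            return self._value

    def tick(self):
        """Advance the clock for an event and return the new timestamp."""
        with self._lock:
            self._value += 1
            return self._value


counter = EventCounter()
before = counter.now()
stamp = counter.tick()
after = counter.now()
```

The timestamps produced this way are only comparable with one another; they carry no relationship to wall-clock time.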
[00118] The data structures 150 may represent value chains 156, value objects 158 and MSV objects 166. Each value chain 156 may be associated with a particular view object 153 (representing a view 120) that may be used when accessing an MSV object 166; and the value chain 156 represents a history of value objects 158 (for a particular MSV object 166 noted in the given view 120), starting from the most recently asserted and extending in a time-ordered sequence back in time. In accordance with example implementations, each value object 158 may contain a value (e.g., a primitive value or a reference to a data structure) for the MSV object 166 in the view 120 associated with the value chain 156 valid as of a particular timestamp, a timestamp representing that effective time, and a link to a prior value object 158 (if any) for the same view 120.
[00119] As also depicted in Fig. 1C, the data structures 150 may also include view relative pointers 154. A view relative pointer 154 represents a reference, or pointer, to a structured object (a record, array or a map, as examples) and also specifies that read and write accesses to the structured object are to be seen through the perspective of an associated view 120. The view relative pointer 154 may contain a pointer to an object and a pointer to a view object 153. The value stored in a value object 158 may be a view relative pointer 154. In accordance with an example implementation, the view object 153 contained within a view relative pointer 154 stored in a value object 158 on a value chain 156 may be constrained to represent a view 120 associated with the same isolation context 110 that is associated with the value chain's 156 view 120.
[00120] A given MSV object 166 represents alternative values for a property of a data structure 106, as seen through different views 120. The data structures 150 may also include MSV states 162. Each MSV state 162 represents the current set of one or multiple value chains 156, which may be associated with an associated MSV object 166. A given MSV object 166 may constrain the values associated with it to all be of a given type (e.g., all integers, all strings, or all lists of employee records). Alternatively, a given MSV object 166 may permit values of different types to be associated with it.
[00121] The data structures 150 may also include MSV states 162. Each MSV state 162 may represent a set of value chains 156 associated with an MSV object 166.
[00122] The data structures 150 may also include serializing tasks 170. In this manner, the tasks 170 may include write or publication tasks, used to effect either a modification of an MSV object 166 (for a write task) or an attempt to publish an isolation context object 151 (for a publication task). These tasks 170 may be used to ensure that operations that are supposed to be atomic are, in fact, atomic, as simultaneous execution might cause incorrect behavior. Rather than blocking, the serialization of the tasks 170 may allow threads to cooperate with each other, as further described herein.
[00123] In accordance with example implementations, the system 100 may use C++ atomic classes and associated functions, such as a C++ compare_exchange operation (also called a "compare and swap" operation or "CAS" operation). Moreover, in accordance with example implementations, versioned pointers may be used, which encapsulate a value along with a version number, which is incremented every time the value is modified. The version number may be represented by the use of sequences of bits (e.g., high-order bits) within the versioned pointer value. Pointers, including versioned pointers, may encapsulate flags representing Boolean values (e.g., represented as bits within the pointer value).
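As a non-limiting illustration, a versioned pointer whose version number occupies high-order bits, updated by a compare-and-swap loop, may be sketched in Python as follows. The 48/16 bit split, the class AtomicWord and the functions pack, unpack and store_versioned are assumptions made for this sketch only; a lock stands in for the hardware CAS that a C++ atomic class would provide.

```python
import threading

ADDR_BITS = 48      # assumed width of the address part
VERSION_BITS = 16   # assumed width of the version part (high-order bits)
ADDR_MASK = (1 << ADDR_BITS) - 1
VERSION_MASK = (1 << VERSION_BITS) - 1


def pack(addr, version):
    """Combine an address and a version number into one word."""
    return ((version & VERSION_MASK) << ADDR_BITS) | (addr & ADDR_MASK)


def unpack(word):
    """Split a word into (address, version)."""
    return word & ADDR_MASK, word >> ADDR_BITS


class AtomicWord:
    """Integer cell with a lock-based stand-in for compare-and-swap."""

    def __init__(self, word):
        self._word = word
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._word

    def compare_exchange(self, expected, desired):
        with self._lock:
            if self._word == expected:
                self._word = desired
                return True
            return False


def store_versioned(cell, new_addr):
    """Install new_addr and bump the version, retrying if the CAS fails."""
    while True:
        old = cell.load()
        _, version = unpack(old)
        if cell.compare_exchange(old, pack(new_addr, version + 1)):
            return


cell = AtomicWord(pack(0x1000, 0))
store_versioned(cell, 0x2000)
stale_ok = cell.compare_exchange(pack(0x1000, 0), pack(0x3000, 1))
```

Because the version changes on every modification, a thread holding a stale word fails its CAS even if the address part has been restored to an earlier value in the meantime.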
[00124] Operations performed on a data structure 150 may be considered to be logically atomic even though the performance of the operations takes measurable time, during which other operations involving the same data structure 150 may be initiated in another thread. In accordance with example implementations, such a logically atomic operation that is associated with a finite processing time may be deemed to be correct when the result returned by the operation is one that would have been correct had the operation been instantaneous and executed at some arbitrary time in between the time the operation started and the time the operation finished and, further, when any modifications made in the course of performing the operation may be logically ordered as a group with respect to those of other logically atomic operations. It is noted that this standard of correctness may be sufficient to guarantee that no sequence of operations performed in other threads is able to determine that the operation was not performed atomically.
[00125] In accordance with example implementations, each modifiable location may be associated with an MSV object 166; each MSV object 166 may have an associated mapping from view objects 153 to value chains 156 (e.g., in an associated MSV state 162); and each value chain 156 may have an associated timestamped list of value objects 158. The mapping may distinguish between open view objects 153 and closed view objects 153. In accordance with example implementations, the mapping may distinguish between open and closed value chains 156 or views 120 associated with such view objects 153. An open view object 153 is a view object 153 whose associated value chain 156 was last modified (e.g., by the addition of a value object 158) subsequent to the last time that the isolation context 110 associated with the view object 153 (representing a view 120) was published prior to the creation of the MSV state 162 (although the isolation context 110 may have been subsequently published). A closed view object 153 is a view object 153 whose associated isolation context 110 is known to have been published since the last time its associated value chain 156 was modified.
[00126] In accordance with example implementations, whenever a read or modify operation is attempted on an MSV object 166, open view objects 153 associated with the MSV object 166 whose isolation contexts 110 (as represented by isolation context objects 151) have been published may be closed before the read or modification occurs. More specifically, in accordance with example implementations, an open view object 153 may be closed by a close_published_views process. In this process, a new value object 158 may be added to the value chain 156 associated with the view object 153 that is the parent of the view object being closed (e.g., as the head of the value chain 156). The value in the new value object 158 may be the value from the most recent (e.g., first) value object 158 of the value chain 156 associated with the open view object 153, and the timestamp of the new value object 158 may be a timestamp associated with the successful publication of the view object's 153 isolation context 110. If there are any view objects 153 to be closed by the close_published_views operation (e.g., if there are any view objects 153 whose associated isolation contexts 110 have been successfully published since the last time a value object 158 was added to the respective value chain 156), this is accomplished by creating a new MSV state 162 for the MSV object 166 that shares value chains 156 with the old MSV state 162, and an attempt is made to atomically replace the old MSV state 162 in the MSV object 166 with the new MSV state 162. This attempt may fail due to another thread successfully replacing the MSV state 162 with a different new MSV state 162. If the attempt fails, the close_published_views operation may be retried until a new MSV state 162 is successfully installed or a determination is made that there are no open view objects 153 in the MSV object's 166 current MSV state 162. In this manner, the value chains 156 are shared, and the threads cooperate with each other, as further described herein. All subsequent operations are done relative to the resulting MSV state 162, even if the MSV object's 166 state 162 is changed again before the operation finishes.
[00127] In accordance with example implementations, read and modify operations provided by MSV objects 166 may take as a parameter a view object 153 representing the view 120 to use in performing the operation. When the MSV object 166 implements a location referred to by a view relative reference 154, the view object 153 may be that associated with the view relative reference 154. In accordance with an example implementation, the operations may additionally take as a parameter an isolation context object 151 representing a prevailing isolation context 110. In this implementation, the isolation context object 151 may be requested to identify a shadow view object 153 for the provided view object 153, as described further below, and that shadow view object 153 may be used in place of the provided view object 153 in performing the operation.
[00128] Read operations find the value chain 156, if any, associated with the view object 153 in the MSV state 162 and return the value from the value object 158 in the value chain 156 where the timestamp of the value object 158 is no later than a specified timestamp and is the latest such timestamp among the value objects 158. In accordance with example implementations, a "most recent" timestamp may be specified for the read operation, which indicates that the value object 158 with the most recent timestamp is to be used. If the timestamp is the "most recent" timestamp, the search for associated value chains 156 may be restricted to the open value chains 156 in the MSV state 162. If there is no associated value chain 156 or if there is no value object 158 on the value chain 156 prior to the timestamp, then the process is repeated, with the view object 153 being replaced by that view object's 153 parent view object (if any). If the isolation context 110 associated with the replaced view object 153 is a snapshot isolation context 110, the timestamp is replaced by the snapshot time for the isolation context 110 (e.g., the timestamp associated with the last successful publication of the isolation context 110 or the creation of the isolation context). This may continue until a value object 158 is found and its value is returned or until a view object 153 is determined to not have a parent. In this case, a default value of the appropriate type may be returned.
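As a non-limiting illustration, the read operation described above may be sketched in Python as follows. The class View, the function read, and the representation of an MSV state as a dictionary from views to lists of (timestamp, value) pairs ordered newest first are all hypothetical. The sketch treats a view's snapshot_time as the snapshot time of its isolation context and does not model the distinction between open and closed value chains.

```python
import sys

MOST_RECENT = sys.maxsize  # assumed sentinel for the "most recent" timestamp


class View:
    """Hypothetical view: a parent link and an optional snapshot time."""

    def __init__(self, name, parent=None, snapshot_time=None):
        self.name = name
        self.parent = parent
        self.snapshot_time = snapshot_time  # None if not a snapshot context


def read(state, view, timestamp=MOST_RECENT, default=None):
    """Return the latest value visible in `view` no later than `timestamp`.

    state maps a View to its value chain: [(timestamp, value), ...],
    ordered from the most recently asserted value backwards in time.
    """
    while view is not None:
        for ts, value in state.get(view, []):
            if ts <= timestamp:
                return value
        # Nothing found here: fall back to the parent view. If the view
        # being replaced belongs to a snapshot, read the parent as of the
        # snapshot time.
        if view.snapshot_time is not None:
            timestamp = view.snapshot_time
        view = view.parent
    return default


root = View("root")
child = View("child", parent=root, snapshot_time=5)
state = {root: [(8, "new"), (3, "old")]}
seen_by_child = read(state, child)
seen_by_root = read(state, root)
state[child] = [(9, "mine")]
after_write = read(state, child)
```

In this sketch the child view, being a snapshot taken at time 5, sees the parent's value from time 3 rather than the later value from time 8, until a value is asserted in the child view itself.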
[00129] In accordance with example implementations, a lazy publication process may be used to effect propagating changes made to an MSV object 166 in a given view 120 due to the publishing of an isolation context 110 associated with the view. In this manner, in accordance with example implementations, operations that are directed to effecting the publication do not occur until a subsequent operation directed to the MSV object. More specifically, referring to Fig. 7, a technique 700 includes providing (block 704) an object that has a plurality of alternative values; associating (block 708) a plurality of views with the plurality of alternative values; and associating (block 708) a plurality of computational contexts with the views. The views are isolated (block 712) such that a request to determine a value in a given view results in a value of the plurality of alternative values being returned, and association of a first value with a first view is independent from association of a second value with a second view. The technique 700 includes publishing (block 716) a computational context of the plurality of computational contexts to allow a value of the plurality of alternative values associated with a view of the plurality of views associated with the published context to be read in at least one other of the views; and in response to an operation directed to the object after the publishing, processing an effect of the publishing on the at least one other view, pursuant to block 720.
[00130] Referring to Fig. 1A in conjunction with Fig. 1C, as a more specific example, in accordance with example implementations, operations that modify values associated with views 120 of an MSV object 166 may be serialized by means of write tasks associated with the MSV object 166. These write tasks are represented by corresponding write task objects 170 (also called "write tasks 170" herein). For example, such operations may include writing a value, incrementing or otherwise modifying a value, or resolving a conflict associated with a value. To effect this serialization, the MSV object 166 may have the capability to be associated with a single pending write task 170. A thread of the system 100 may create a write task object 170 to store the data required to perform the requested modification and to discover any conflicts that result from the modification. This data may be stored in the write task object 170, such that values determined by one thread performing the write task for the MSV object 166 may be visible to other threads performing the write task for the same MSV object 166.
[00131] The write task object 170 may also contain information about steps in the process that have been completed by some thread. This may allow a given thread performing the write task to skip steps that have already been completed by another thread. The thread performing the modification may attempt to change the pending write task associated with the MSV object 166 from a null value to the newly created write task object 170 (e.g., by means of a CAS operation). If this fails, it may indicate that another thread is processing a different write task directed to the MSV object 166 (i.e., may indicate that there is an ongoing modification of the MSV object 166 in association with another view object 153), and the present thread may perform that write task before repeating the attempt to install the newly created write task 170. In this way, multiple threads may cooperate in processing a modification and a thread in one operating system process may complete a modification begun by a thread in a different operating system process that died before the modification was completed. When an installation attempt is successful, the thread may process the newly installed write task.
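As a non-limiting illustration, the cooperative installation of a pending write task may be sketched in Python as follows. The classes AtomicRef and WriteTask and the function modify are hypothetical. A lock stands in for the CAS operation, and a single done flag stands in for the finer-grained record of completed steps that a write task object may keep.

```python
import threading


class AtomicRef:
    """Reference cell with a lock-based stand-in for compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        with self._lock:
            if self._value is expected:
                self._value = desired
                return True
            return False


class WriteTask:
    """Task that any thread may run; the action is applied only once."""

    def __init__(self, action):
        self._action = action
        self._done = False
        self._lock = threading.Lock()

    def run(self):
        with self._lock:
            if not self._done:  # skip work already completed by a helper
                self._action()
                self._done = True


def modify(pending, task):
    """Install `task` as the pending write, first helping any other write."""
    while True:
        if pending.compare_exchange(None, task):
            task.run()
            pending.compare_exchange(task, None)  # single removal attempt
            return
        other = pending.load()
        if other is not None:
            other.run()  # cooperate in finishing the other write task
            pending.compare_exchange(other, None)


log = []
pending = AtomicRef()
stalled = WriteTask(lambda: log.append("first"))
pending.compare_exchange(None, stalled)  # simulate a writer that stalled
modify(pending, WriteTask(lambda: log.append("second")))
```

In this sketch the second writer finds a write task already installed, completes it on behalf of the stalled writer, and only then installs and performs its own write, so that neither writer blocks the other.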
[00132] A thread of the system 100 may process a write task directed to an MSV object 166 as follows. Within the description below the "current write" refers equivalently to the write task object 170 being invoked, the performance of the steps below, which may take place concurrently in multiple threads, and the intended modification. The write task may, in general, contain the following five steps:
1. First, the write task obtains the current MSV state 162 associated with the MSV object 166. It is noted that this MSV state 162 may be different from the one obtained as a result of calling a process called "close_published_views()," which is described further below. Due to the serialization of tasks, no further changes may be made by other tasks until the current write task finishes.
2. The isolation contexts 110 that are associated with value chains 156 in the MSV state 162 are enumerated. For each isolation context 110 for which it is determined that it is possible for the current write to introduce a conflict or contingent conflict for the isolation context 110, the thread installs the current write task object 170 in the corresponding isolation context object 151 as a pre-publication task. This ensures that the isolation context 110 cannot be published until this write task 170 completes. Therefore, publishing of the isolation context 110 does not occur until any conflicts 152 have been added to the isolation context object 151. In this manner, if another thread (not involved with the current write task) attempts to publish one of these isolation contexts 110, this other thread first assists (via processing the write task 170) in deciding whether to add conflicts 152. A contingent conflict refers to an indication that should one isolation context 110 successfully publish its changes, doing so will, as part of the atomic publishing process, induce a conflict in another isolation context 110.
3. Next, the thread processing the write task determines the value to be associated as the result of the modification, and the write task may create a value object 158 and add the created value object 158 to the correct value chain 156.
4. The thread processing the write task next identifies any conflicts 152 or contingent conflicts for each value chain 156 in the MSV state 162 and adds these conflicts 152 and contingent conflicts to the associated isolation contexts 110. For example, the system 100 may determine whether the current write implies a conflict for the value chain's associated isolation context 110. For example, when the value chain 156 is open, the associated isolation context 110 is a publishable isolation context, and the associated view 120 is a descendant of the view being modified, the thread may determine that a conflict should be added to the isolation context object 151 associated with the value chain 156. As another example, when the isolation context 110 associated with the view 120 being modified is a publishable snapshot, the view 120 associated with the value chain 156 under consideration is an ancestor of the view being modified, and the timestamp of the most recent value object 158 on the value chain 156 under consideration is more recent than the snapshot time of the snapshot, the thread processing the write task may determine that a conflict should be added to the isolation context object 151 associated with the view 120 being modified. As another example, when the value chain 156 is open and its associated view 120 is a sibling of the view 120 being modified, the thread processing the write task may determine that contingent conflicts should be added to one or both of the isolation context objects 151 associated with the views 120 such that when one of the corresponding isolation contexts 110 publishes, the publication adds a conflict at the MSV object 166 to the other isolation context object 151.
5. As a last step, the thread processing the write task may remove the current write task as a pre-publication task from the isolation context objects 151 that it was added to, thereby allowing any publication(s) to proceed. In an example implementation, this step may be performed as part of the prior step following the determination of any conflicts on all of the isolation context's 110 associated value chains 156 or following the addition of a conflict to the isolation context.
Table 2
[00133] Following the performance of the write task, the thread may attempt to remove the write task 170 as the pending write task 170 for the object, e.g., by using a CAS operation to replace the current write task 170 with a null value. The thread may make a single attempt, as failure of the CAS operation may indicate that another thread was previously successful in this removal.
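As a non-limiting illustration, the three example determinations given in the fourth step of Table 2 may be sketched in Python as follows. The class View and the functions is_ancestor and classify are hypothetical, and the sketch assumes for simplicity that each view has its own isolation context, so that the publishable and snapshot properties of the context are recorded directly on the view.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class View:
    name: str
    parent: Optional["View"] = None
    publishable: bool = True
    snapshot_time: Optional[int] = None  # None means not a snapshot


def is_ancestor(a, b):
    """Return True if view a is a strict ancestor of view b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def classify(chain_view, chain_open, chain_latest_ts, modified):
    """Classify the effect of a write in `modified` on one value chain.

    Returns (kind, names of affected contexts) or None if no conflict.
    """
    if chain_open and chain_view.publishable and is_ancestor(modified, chain_view):
        return ("conflict", (chain_view.name,))
    if (modified.publishable and modified.snapshot_time is not None
            and is_ancestor(chain_view, modified)
            and chain_latest_ts > modified.snapshot_time):
        return ("conflict", (modified.name,))
    if (chain_open and chain_view is not modified
            and chain_view.parent is not None
            and chain_view.parent is modified.parent):
        return ("contingent", (chain_view.name, modified.name))
    return None


root = View("root", publishable=False)
a = View("a", parent=root, snapshot_time=5)
b = View("b", parent=root)
a1 = View("a1", parent=a)

descendant_case = classify(a1, True, 0, a)
stale_snapshot_case = classify(root, False, 7, a)
sibling_case = classify(b, True, 0, a)
no_conflict_case = classify(root, False, 3, a)
```

In this sketch a write in view a conflicts with an open chain in its descendant a1, conflicts with a itself when the parent's chain holds a value newer than the snapshot time of a, and yields a contingent conflict with an open chain in the sibling view b.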
[00134] For a frozen read, a thread may perform a check to determine whether there is already a value object 158 established on an open value chain 156 associated with the operation's view 120 since the last time the associated isolation context 110 was published (or at all if the isolation context 110 has yet to publish). If such a value object 158 exists, the value associated with the value object 158 may be returned. Otherwise, the thread may initiate a modify operation, where the value to be asserted by the modify operation may be the current value associated with the view, and this value may be returned by the frozen read operation. The modify operation may indicate in the value chain 156 that the value added is the result of a frozen read and, therefore, is not propagated to a value chain 156 associated with the view's 120 parent view 120 following successful publishing of the isolation context 110.
[00135] When an isolation context 110 is published, the publication process may be serialized with respect to other publication tasks or other write tasks installed as pre-publication tasks (as described above) by concurrently executing threads of the system 100: a thread may install a publication task in the isolation context object 151 associated with the isolation context 110 to be published after cooperating in finishing any currently installed publication task or write tasks. In accordance with example implementations, a conflict or contingent conflict may not be added to an isolation context object 151 by a write task 170 unless that write task 170 has been installed as a pre-publication task 170 on the isolation context object 151; and a write task 170 may not be installed as a pre-publication task 170 on an isolation context object 151 that has an installed publication task 170 before the publication task is finished with the cooperation of the thread attempting to install the write task 170. In this manner, after a publication task is installed, no further conflicts may be added until after the publication attempt associated with the publication task finishes.
[00136] The system 100 installs any conflicts 152 for an associated isolation context 110 in the corresponding isolation context state 157. In this manner, if a given isolation context state 157 has actual conflicts 152 (i.e., non-contingent conflicts), then any attempted publication of the associated isolation context 110 fails, and the conflicts 152 are noted and may be addressed, as further described herein. Otherwise, if a given isolation context 110 attempts to publish and has no actual conflicts 152 installed, the isolation context 110 is allowed to publish; any contingent conflict(s) 152 are identified and added as actual conflict(s) 152 to the isolation context object(s) 151 associated with the contingent conflict(s) 152; and the isolation context object 151 being published is updated to be associated with a new isolation context state 157 having no conflicts (actual or contingent), an updated last publication time, and its prior isolation context state 157 as a prior state. In accordance with example implementations, the determination that there are no actual conflicts 152 and the updating of the isolation context state 157 may be constrained to constitute a single logically atomic operation.
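The publication rule of the preceding paragraph can be sketched in a few lines of Python. This is a minimal single-threaded illustration, not the described lock-free implementation: the names `State`, `Context`, and `publish`, the use of tuples for the conflict lists, and the string used as a conflict are all assumptions made for the example, and the logically atomic check-and-update is simply written as sequential code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    conflicts: tuple = ()          # actual conflicts block publication
    contingent: tuple = ()         # pairs of (other context, conflict)
    last_publication: int = 0
    prior: Optional["State"] = None


class Context:
    def __init__(self, now):
        self.state = State(last_publication=now)

    def publish(self, now):
        old = self.state
        if old.conflicts:
            return False           # actual conflicts: publication fails
        # Promote each contingent conflict to an actual conflict elsewhere.
        for other, conflict in old.contingent:
            s = other.state
            other.state = State((conflict,) + s.conflicts, s.contingent,
                                s.last_publication, s.prior)
        # Fresh state: no conflicts, new publication time, link to prior state.
        self.state = State((), (), now, old)
        return True


a, b = Context(now=1), Context(now=1)
a.state = State(contingent=((b, "conflict at MSV x"),), last_publication=1)
a_published = a.publish(now=2)     # succeeds and hands b an actual conflict
b_published = b.publish(now=3)     # fails until b's conflict is resolved
```

In this sketch, the sibling context `b` can no longer publish once `a` has published, which mirrors the role of contingent conflicts between sibling views.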
[00137] In accordance with example implementations, the isolation context object 151 may include one or multiple of the following features. The isolation context object 151 may include an immutable reference to a parent isolation context object 151.
[00138] The isolation context object 151 may include an atomically-updated reference to a current isolation context state 157. This reference may be atomically changed to refer to a new isolation context state 157 whenever the associated isolation context 110 publishes or when a conflict or contingent conflict is created for the context 110. As noted above, in accordance with example implementations, the isolation context state 157 may contain one or multiple conflicts 152. More specifically, in accordance with example implementations, the isolation context state 157 may include a list of conflicts representing actual conflicts; a list of conflicts representing contingent conflicts; a timestamp representing the last publication time; and a reference to the prior state of the associated isolation context 110 as of the last successful publication of the isolation context 110 (if any). When an isolation context object 151 is created, an associated isolation context state 157 may be created. The timestamp associated with a created isolation context state 157 may be the result of atomically incrementing the event counter 164. In accordance with example implementations, the lists of conflicts and contingent conflicts may be linked lists, such that creating a new isolation context state 157 that differs from an existing isolation context state 157 only in the addition or removal of conflicts 152 on one of the conflict lists may yield isolation context states 157 whose lists share structure in a common suffix.
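The suffix sharing mentioned above can be illustrated with immutable linked-list nodes. The following is a hedged sketch only: `Node`, `ContextState`, and `with_conflict` are hypothetical names, conflicts are represented as plain strings, and the atomic installation of the new state is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    conflict: str
    next: Optional["Node"] = None


@dataclass(frozen=True)
class ContextState:
    conflicts: Optional[Node]          # head of the actual-conflict list
    contingent: Optional[Node]         # head of the contingent-conflict list
    last_publication: int
    prior: Optional["ContextState"]


def with_conflict(state, conflict):
    """Return a new state that differs only by one prepended conflict."""
    return ContextState(Node(conflict, state.conflicts), state.contingent,
                        state.last_publication, state.prior)


s0 = ContextState(None, None, last_publication=1, prior=None)
s1 = with_conflict(s0, "c1")
s2 = with_conflict(s1, "c2")
# s2's list is a new head node followed by the very nodes that s1 uses.
```

Because the nodes are never mutated, an older state that another thread is still reading stays valid after a newer state has been created.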
[00139] The timestamps of the isolation context states 157 associated with an isolation context object 151 (either directly or via prior state references) represent the creation time of the isolation context object 151 and its successful publications. These timestamps may be called "stable times" for the isolation context object 151 and its associated isolation context 110. The latest of these stable times (e.g., the timestamp of the isolation context state 157 directly associated with the isolation context object 151) may be called the "last stable time" for the isolation context object 151.
[00140] The isolation context object 151 may have an atomically-updated reference to a pre-publication task that may be a write task or a publication task that is to be completed before publication of the associated isolation context 110 may be attempted. In accordance with example implementations, the pre-publication task may be a single publication task or a collection of write tasks, all of which are to be completed before publication of the associated isolation context 110 may be attempted. The completion of the write tasks from the collection of write tasks may include basing whether to perform the write task on an indication of whether the task has been completed (e.g., by another thread).
[00141] The isolation context object 151 may have an associated map, also called a "shadow map," which maps view objects 153 to other view objects 153. In accordance with example implementations, the shadow map may be a lock-free map. Moreover, in accordance with example implementations, the shadow map may be a lock-free cuckoo map.
[00142] The isolation context object 151 may have an associated immutable enumerated value that represents whether the associated isolation context 110 is a live or snapshot isolation context 110.
[00143] The isolation context object 151 may have an immutable, enumerated value that represents the associated isolation context's modification type, which may be a publishable isolation context 110, a detached isolation context 110 or a read-only isolation context 110. A publishable isolation context 110 refers to an isolation context 110 that may be published. A detached isolation context 110 refers to an isolation context 110 that is constrained to be one that is not published, but values in the detached isolation context 110 may be modified. A read-only isolation context 110 refers to an isolation context 110 that is constrained to not be published, and values in the read-only context 110 are constrained to not be modified. The system 100 may be constrained to not create a publishable isolation context 110 whose parent is a read-only isolation context 110, as publishing modifications from the child isolation context 110 would constitute an impermissible modification to the read-only parent isolation context 110.
[00144] In accordance with example implementations, the data structures 150 include a global isolation context object 151 representing the global isolation context 110 and having no parent isolation context object 151.
[00145] The isolation context object 151 may also be associated with one or multiple of the following operations.
[00146] The isolation context object 151 may provide a shadow(view object) operation to identify a view object 153 associated with the isolation context object 151 and related to the specified view object 153. If the shadow(view object) operation determines that the isolation context object 151 is associated with the view object 153, the view object 153 is returned as the value of the shadow() operation. Otherwise, in accordance with example implementations, the shadow(view object) operation involves checking the shadow map associated with the isolation context object 151. If there is no entry in the shadow map corresponding to the view object 153, the shadow(view object) operation may include invoking the shadow(view object) operation on the parent isolation context object 151. If the shadow map contains a view object 153 corresponding to the resulting view object 153, the corresponding view object may be returned. Otherwise, the shadow(view object) operation may create a new view object 153, as a child of the parent's shadow view object 153. The shadow(view object) operation may associate the new view object 153 with the specified view object 153 and with the parent's shadow view object 153. In this way, a hierarchy of shadow view objects 153 may be created and efficiently retrieved.
[00147] The isolation context object 151 may provide a new_child(view type, modification type, timestamp) operation to create a child isolation context object 151 of the isolation context object 151, where the timestamp associated with the child's isolation context state 157 is the specified timestamp or, if the timestamp is omitted or "most recent" is specified, the result of incrementing the event counter 164, and where the child has the specified view type (e.g., either "live" or "snapshot") and the specified modification type (e.g., "publishable", "detached", or "read-only").
[00148] The isolation context object 151 may provide a publish() operation to install and run a publish task, as described below, after first collaborating in finishing any publish or write tasks already installed in the context's pre-publication task.
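The recursive lookup performed by the shadow(view object) operation may be easier to follow as code. The sketch below is an illustration under stated assumptions: `View` and `Context` are hypothetical classes, a plain `dict` stands in for the lock-free shadow map, and the specified view is assumed to belong to the context itself or to one of its ancestors.

```python
class View:
    def __init__(self, context, parent=None):
        self.context = context
        self.parent = parent


class Context:
    def __init__(self, parent=None):
        self.parent = parent
        self.shadow_map = {}           # stand-in for the lock-free map

    def shadow(self, view):
        if view.context is self:
            return view                # already one of this context's views
        found = self.shadow_map.get(view)
        if found is not None:
            return found
        if self.parent is None:
            raise ValueError("view does not belong to an ancestor context")
        parent_shadow = self.parent.shadow(view)
        found = self.shadow_map.get(parent_shadow)
        if found is None:
            # New view in this context, child of the parent's shadow view.
            found = View(self, parent_shadow)
            self.shadow_map[parent_shadow] = found
        self.shadow_map[view] = found  # remember the answer for next time
        return found


global_ctx = Context()
top = View(global_ctx)
child_ctx = Context(global_ctx)
grandchild_ctx = Context(child_ctx)
shadow = grandchild_ctx.shadow(top)
```

Calling `grandchild_ctx.shadow(top)` a second time returns the same object from the map, and the intermediate shadow view in `child_ctx` is created once along the way.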
[00149] The isolation context object 151 may provide an add_conflict(conflict) operation to first check whether the conflict 152 has been marked as being "resolved". If not, the operation atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the conflict list. The replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time. The operation then attempts to install the conflict 152 in a location (e.g., a value chain 156) associated with the conflict 152 to assert that there is known to be a conflict 152 at that location. The attempt to install may be made by a single invocation of a CAS operation attempting to replace a null value with the present conflict 152. Failure in this attempt implies that another conflict 152 has been installed there (e.g., by another thread), and accordingly, the present conflict 152 is marked as being resolved.
[00150] The isolation context object 151 may provide an add_contingent_conflict(conflict) operation which atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the contingent conflict list. The replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
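The two CAS patterns used by add_conflict(conflict) and add_contingent_conflict(conflict), a retry loop for the state and a single attempt for the location, can be sketched as follows. Python has no hardware compare-and-swap, so `AtomicRef` is a lock-backed stand-in introduced purely for illustration; the tuple-based state and the class names are likewise assumptions.

```python
import threading


class AtomicRef:
    """Lock-backed stand-in for a compare-and-swap cell (an assumption)."""
    def __init__(self, value=None):
        self._value, self._lock = value, threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Conflict:
    def __init__(self, location):
        self.location = location       # AtomicRef slot, e.g. on a value chain
        self.resolved = False


class Context:
    def __init__(self):
        # Immutable state: (actual conflicts, contingent conflicts).
        self.state = AtomicRef(((), ()))

    def _prepend(self, index, conflict):
        while True:                    # loop until one CAS attempt succeeds
            old = self.state.get()
            lists = list(old)
            lists[index] = (conflict,) + old[index]
            if self.state.compare_and_set(old, tuple(lists)):
                return

    def add_conflict(self, conflict):
        if conflict.resolved:
            return
        self._prepend(0, conflict)
        # Single attempt: failure means a conflict is already installed there.
        if not conflict.location.compare_and_set(None, conflict):
            conflict.resolved = True

    def add_contingent_conflict(self, conflict):
        self._prepend(1, conflict)


slot = AtomicRef()
ctx = Context()
first, second, third = Conflict(slot), Conflict(slot), Conflict(slot)
ctx.add_conflict(first)
ctx.add_conflict(second)               # loses the install, marked resolved
ctx.add_contingent_conflict(third)
```

The second conflict still appears on the conflict list, but because it is marked as resolved it does not by itself block publication.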
[00151] The isolation context object 151 may provide a conflict_resolved(conflict) operation to stop a conflict 152 from preventing publication of the isolation context 110 associated with the isolation context object 151. If the specified conflict 152 is the head of the current isolation context state's 157 conflict list, the operation replaces the associated isolation context state 157 with a new isolation context state 157 equivalent to the old one, save that all initial conflicts 152 in the conflict list that are marked as resolved are removed. In this way, in accordance with example implementations, resolved conflicts 152 may remain on the conflict list, but after all conflicts 152 have been marked as being resolved, the conflict list in the isolation context state 157 is empty.
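The head-trimming behavior may be clearer with a small example. This sketch assumes that the conflict is marked as resolved before the head check is made, uses a Python list (head first) in place of the linked list, and omits the atomic state replacement; the names are hypothetical.

```python
class Conflict:
    def __init__(self, name):
        self.name = name
        self.resolved = False


def conflict_resolved(conflicts, conflict):
    """Return the conflict list that the new state would carry."""
    conflict.resolved = True           # assumed to precede the head check
    if not conflicts or conflicts[0] is not conflict:
        return conflicts               # not the head: list is left alone
    i = 0
    while i < len(conflicts) and conflicts[i].resolved:
        i += 1                         # skip the resolved prefix
    return conflicts[i:]


c1, c2, c3 = Conflict("c1"), Conflict("c2"), Conflict("c3")
pending = [c3, c2, c1]                 # head first
pending = conflict_resolved(pending, c2)
length_after_c2 = len(pending)         # c2 is resolved but still listed
pending = conflict_resolved(pending, c1)
pending = conflict_resolved(pending, c3)
```

Resolving `c2` and `c1` leaves the list untouched because neither is the head; resolving the head `c3` then removes all three in one step.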
[00152] The isolation context object 151 may provide a was_published_after(timestamp) operation to return a true value if the isolation context object 151 was published after the timestamp and to return a false value otherwise, where the determination may be made based on the timestamp associated with the current isolation context state 157.
[00153] The isolation context object 151 may provide a publication_after(timestamp) operation. The operation traverses the list of the isolation context states 157, starting from the current isolation context state 157 and continuing by following the prior state pointer, and returns the earliest timestamp associated with a state 157 such that the timestamp of the state 157 is after the specified one. This operation may also indicate that there is no such timestamp.
[00154] The isolation context object 151 may provide a publication_before(timestamp) operation. The operation traverses the list of the isolation context states 157 in a manner similar to the publication_after(timestamp) operation. However, the publication_before(timestamp) operation returns the latest publication timestamp before the given one, or a zero timestamp (or a similar timestamp that is considered to be before any other timestamp), if the given timestamp is before the associated isolation context's creation time.
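Both traversals walk the chain of prior states from newest to oldest. The following is a minimal sketch, assuming integer timestamps that strictly decrease along the prior-state chain and using `None` to signal that no later stable time exists; `State` and the function signatures are illustrative only.

```python
class State:
    def __init__(self, timestamp, prior=None):
        self.timestamp = timestamp
        self.prior = prior


def publication_after(current, timestamp):
    """Earliest stable time strictly after `timestamp`, or None."""
    earliest = None
    state = current
    while state is not None and state.timestamp > timestamp:
        earliest = state.timestamp     # keep going: older may still qualify
        state = state.prior
    return earliest


def publication_before(current, timestamp):
    """Latest stable time strictly before `timestamp`, or 0 if none."""
    state = current
    while state is not None:
        if state.timestamp < timestamp:
            return state.timestamp
        state = state.prior
    return 0


# Context created at time 3, then published at times 7 and 12.
current = State(12, State(7, State(3)))
after_5 = publication_after(current, 5)
after_12 = publication_after(current, 12)
before_10 = publication_before(current, 10)
before_2 = publication_before(current, 2)
```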
[00155] In accordance with example implementations, during processing of a successful publication of an isolation context 110, when contingent conflicts 152 are being processed, multiple threads may be cooperating in efforts to effect the publication. Due to this cooperation, it is possible that multiple threads may attempt to add the same conflict 152 to the associated isolation context object 151. For purposes of preventing multiple threads from adding the same conflict 152, in accordance with example implementations, the contingent conflict list of the isolation context state 157 may have nodes, where each node indicates a contingent conflict and has an associated "handled" flag. The contingent conflict associated with a node may only be added as a conflict to its associated isolation context object 151 following a determination that the flag is not set; and the flag may be atomically set following the successful addition of the conflict 152.
[00156] In accordance with example implementations, the view object 153 may have one or multiple of the following properties.
[00157] The view object 153 may contain an unchangeable, or immutable, reference to the isolation context 110 (as represented by object 151) that created it; and the view object 153 may contain an immutable reference to its parent view object 153 (which may or may not be associated with the same isolation context 110).
[00158] The view object 153 may have an ancestor cache, which may be a map from view objects 153 to Boolean values, where an association in the map indicates that a given other view object 153 has been determined to be or to not be an ancestor of the current view object 153. In this manner, in accordance with example implementations, a true Boolean value, a false Boolean value and no entry indicate that the other view object 153 has been determined to be an ancestor, has been determined to not be an ancestor, or has yet to be checked, respectively. In accordance with some implementations, the ancestor cache is not constructed until a determination is required and is atomically installed. In accordance with some implementations, the ancestor cache is a lock-free map (e.g., a lock-free cuckoo map).
[00159] In accordance with example implementations, the data structures 150 include a top-level view object 153, which is associated with the global isolation context object 151 and which has no parent view object 153.
[00160] The view object 153, in accordance with example implementations, may provide an operation that is directed to retrieving a reference to the associated isolation context object 151.
[00161] The view object 153 may provide a has_ancestor(view object) operation to determine whether a given view object 153 is an ancestor of the view object 153. The operation includes first determining whether the specified view object 153 is the same as the view object 153, the same as the parent of the view object 153, or the top-level view object 153. In any of these cases, the operation may return a true Boolean value. Otherwise, if the view object 153 is the top-level view object 153, then the operation may return a false Boolean value. In accordance with example implementations, if the view object 153 does not contain an ancestor cache, the operation creates one and atomically associates it with the view object 153. The ancestor cache may be examined to determine whether the answer is known. If a Boolean value is found to be associated in the ancestor cache with the specified view object 153, the Boolean value may be returned. Otherwise, the ancestry of the view object 153 may be walked, or traversed, starting with its parent view object 153 and continuing through successive parent view objects 153 until the specified view object 153 or the top-level view object 153 is encountered. The value of the operation depends on the view object 153 encountered during the traversal of the ancestry: a Boolean true value if the specified view object 153 is encountered and a Boolean false value if the top-level view object 153 is encountered. The value of the operation may be associated with the specified view object 153 in the ancestor cache and returned.
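The has_ancestor(view object) steps translate fairly directly into code. In this hedged sketch, the top-level view is passed in explicitly as `top`, a plain `dict` stands in for the lock-free ancestor cache, and every view is assumed to sit in a tree rooted at `top`; none of these choices come from the described system.

```python
class View:
    def __init__(self, parent=None):
        self.parent = parent
        self.ancestor_cache = None     # built lazily, on first real lookup

    def has_ancestor(self, other, top):
        # Cheap cases that need neither the cache nor a walk.
        if other is self or other is self.parent or other is top:
            return True
        if self is top:
            return False
        if self.ancestor_cache is None:
            self.ancestor_cache = {}
        cached = self.ancestor_cache.get(other)
        if cached is not None:
            return cached
        # Walk the parent links until `other` or the top-level view is hit.
        node = self.parent
        while node is not other and node is not top:
            node = node.parent
        result = node is other
        self.ancestor_cache[other] = result
        return result


top = View()
a = View(top)
b = View(a)
c = View(b)
unrelated = View(top)
is_ancestor = c.has_ancestor(a, top)
is_not_ancestor = c.has_ancestor(unrelated, top)
```

Both answers are stored in `c.ancestor_cache`, so repeating either query avoids the walk.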
[00162] In accordance with example implementations, the conflict 152 may have one or multiple of the following properties. A conflict 152 may contain an atomically-updated flag to indicate, or represent, whether or not the conflict 152 has been resolved. A conflict 152 may contain an immutable reference to an atomically-updated reference to a conflict 152, which is the location in which this conflict is installed.
[00163] The conflict 152 may have one or multiple subclasses holding information identifying different types of locations at which conflicts may occur. In accordance with example implementations, such subclasses may include: 1.) a field conflict, which holds a reference to a record object and an indication of a field within that record object; 2.) a bound name conflict, which holds a reference to a namespace object and an indication of a name within that namespace object; and 3.) an array conflict, which holds a reference to an array object and an index identifying a position within the array object.
[00164] The MSV object 166 may have one or multiple of the following properties.
[00165] An MSV object 166 may contain an atomically-updated reference to its current state 162.
[00166] An MSV object 166 may contain an atomically-updated reference to a pending write task, which is a write task that is in progress and must be completed before another modification can be performed.
[00167] An MSV object 166 may contain an immutable reference to a conflict generator 160. The type of the conflict generator 160 referred to may depend on the type of location the MSV object 166 represents (e.g., field of record or slot of array). Conflict generators 160 associated with different types of location may generate different instances of different subtypes of conflict 152.
[00168] In accordance with example implementations, the MSV object 166 may be an instance of a template (or parameterized or generic) class, where the template parameter may indicate the type of data contained (e.g., the values contained in value objects 158 and provided to and returned by operations). This template parameter may also be used by the MSV state 162, value objects 158, and value chains 156, which may be nested within the class implementing the MSV object 166. Such an arrangement may allow different representations of value objects 158 holding different types of values, which may permit more efficient representation when values are of a primitive type (e.g., numbers, Boolean values, or characters).
[00169] The MSV object 166 may provide one or multiple of the following operations.
[00170] The MSV object 166 may provide a read(view, timestamp) operation to determine and return the value of the MSV object 166 for a specified view object 153 as of (i.e., no later than) the given timestamp, which, if omitted, defaults to the "most recent" timestamp, indicating that the most recent value should be returned. In accordance with example implementations, a process of reading from an MSV object 166 includes invoking the current_value operation on the MSV state 162 associated with the MSV object 166.
[00171] The MSV object 166 may provide a read_frozen(view) operation to determine and return the current (e.g., most recent) value of the MSV object 166 for the given view object 153. The read_frozen(view) operation ensures that subsequent reads (frozen or otherwise, including reads involved in atomic modifications) in the same view 120 retrieve the same value until the MSV object 166 is modified in the view 120 or until the isolation context object 151 that is associated with the view object 153 is successfully published.
[00172] The MSV object 166 may provide a has_value(view, timestamp) operation to indicate (e.g., via returning a Boolean value) if a read operation with the same parameters would return a value discovered in a value object 158.
[00173] The MSV object 166 may provide a modify(view, op, resolve?, argument) operation to modify the value in the given view object 153, according to the given operation applied to the current value object 158 and (when applicable) the given argument. In this notation, the "?" suffix denotes a Boolean value. The operations may include one or multiple of the following: an operation to set the value to the argument; an operation to add, subtract, multiply, or divide the current value by the argument; an operation to clear the value (e.g., set the value to a default value); and an operation to set the value to a value present in the MSV object 166, where the present value may be one of: the current value in the given view 120; the value in the parent view 120 of the given view object 153; the last stable value (e.g., the value as of the last stable time for the associated isolation context 110); and the current value, with an indication that this should be noted as implementing a frozen read. In accordance with example implementations, the "resolve?" argument controls whether this modification should be considered to resolve any conflict 152 for this view object 153 in this MSV object 166.
[00174] The MSV object 166 may provide a write(view, resolving, new value) operation, which is an alias for the above-described modify operation in which the operation is that of setting the value to the given argument.
[00175] In accordance with example implementations, operations that are provided by the MSV object 166, such as one or multiple of the above-described operations, may be preceded by closing the published view objects 153 in a close_published_views() process, which is described below.
[00176] The value chain 156 may have one or multiple of the following properties. The value chain 156 may represent the history of value objects 158, which are associated with a view object 153.
[00177] The value chain 156 may contain an immutable reference to the value chain's view object 153. In accordance with example implementations, the reference may be a pointer that contains flags that record whether the view object's isolation context 110 is mutable and/or a snapshot.
[00178] The value chain 156 may contain a pointer to the latest, or most recent, value object 158, and in accordance with example implementations, the pointer may contain a flag that indicates or represents whether the most recent value object 158 was due to a frozen read or equivalent operation (e.g., an operation that asserts a value equivalent to the current value, such as an operation of adding or subtracting zero or an operation of multiplying or dividing by one).
[00179] In accordance with some implementations, view-relative pointers may be installed on the value chain 156. In this manner, the view-relative pointers may be associated as values in the value objects 158 that are installed on the value chain 156. Moreover, in accordance with example implementations, the view-relative pointers may be constrained to be relative to some view object 153 that shares an isolation context 110 with the view object 153 of the value chain 156.
[00180] The value chain 156 may contain an atomically-updated reference to a conflict 152 that is associated with the value chain 156. This may be a null reference to indicate a lack of a conflict.
[00181] In accordance with example implementations, the value chain 156 may provide one or multiple of the following operations.
[00182] The value chain 156 may provide a value(timestamp) operation to return a reference to the value object 158 representing the latest value object 158 in the value chain 156 before the given timestamp, or provide a null reference, if no such value object 158 exists. The operation may involve traversing the value chain 156 by starting at the latest value object 158 associated with the value chain 156 and following each value object's link to the prior value object 158.
[00183] The value chain 156 may provide an add_value(value) operation to update the most recent value object 158 reference to a new value object 158 with the provided value. The timestamp of the new value object 158 may be the current value of the event counter 164, and the prior pointer of the new value object 158 may refer to the old most recent value object 158. The add_value(value) operation may return the newly added value object 158. In accordance with an example implementation in which the event counter is incremented upon the creation of a snapshot isolation context 110, the prior pointer for the new value object 158 may be set to the prior pointer of the old most recent value object 158, rather than to the old most recent value object 158 itself, for the case in which the timestamp of the old most recent value object 158 is the current value of the event counter 164. This allows the old most recent value object 158 to be garbage collected (assuming there are no other references to it). That is, in accordance with example implementations, when the event counter 164 has not changed in between successive additions to a value chain 156, the new value object 158 replaces the old most recent value object 158, rather than extending the chain of value objects 158. It is noted that this is permissible because if the event counter 164 has not been incremented, this means that no snapshot has been created since the last time a value was added to the value chain 156. Therefore, with no snapshot being created after the last time a value was added to the value chain 156, a future read of the MSV object 166 would not have otherwise correctly returned the old most recent value object 158.
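The replace-instead-of-extend rule of add_value(value) can be sketched as follows. This is a single-threaded illustration only: the event counter is passed in as a plain integer, the CAS update and the read-in-snapshot flag are left out, and `ValueObject`, `ValueChain`, and `history` are hypothetical names.

```python
class ValueObject:
    def __init__(self, value, timestamp, prior):
        self.value = value
        self.timestamp = timestamp
        self.prior = prior


class ValueChain:
    def __init__(self):
        self.latest = None

    def add_value(self, value, counter):
        old = self.latest
        if old is not None and old.timestamp == counter:
            prior = old.prior          # counter unchanged: patch around `old`
        else:
            prior = old                # counter advanced: extend the chain
        self.latest = ValueObject(value, counter, prior)
        return self.latest

    def history(self):
        """(timestamp, value) pairs from newest to oldest."""
        out, node = [], self.latest
        while node is not None:
            out.append((node.timestamp, node.value))
            node = node.prior
        return out


chain = ValueChain()
chain.add_value("a", counter=1)
chain.add_value("b", counter=1)        # same counter value: replaces "a"
chain.add_value("c", counter=2)        # counter was incremented in between
entries = chain.history()
```

Only two value objects remain reachable at the end, since the value object holding "a" was bypassed when "b" was added at the same counter value.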
[00184] The following scenario may occur. In between reading the current event counter 164 and asserting the value, it is possible that a snapshot isolation context 110 was created, incrementing the event counter 164; and a read was performed in this snapshot (or a descendant), resulting in a request for the value at a time preceding this later counter, returning the old most recent value object 158. If the add_value() call is allowed to patch around the old most recent value object 158, the result would be that a second read in this snapshot would result in a different value, violating the rules of what it means to be a snapshot.
[00185] For purposes of preventing the above-described scenario from occurring, the following measures may be employed, in accordance with example implementations. The most recent value object pointer contains a read-in-snapshot flag, indicating whether the value from the most recent value object 158 in the value chain 156 was read after following the parent link of a view object 153 associated with a snapshot isolation context 110 (e.g., when the timestamp of the read is a stable time for a snapshot rather than the "most recent" timestamp). When finding a value (e.g., in the value(timestamp) operation), when the timestamp is other than the "most recent" timestamp and the read-in-snapshot flag is not already true, before walking the chain of value objects 158, the reference to the most recent value object 158 (including its read-in-snapshot flag) is remembered. If the value object 158 found is the most recent value object 158, a single attempt is made to replace the remembered value object reference with an identical one that has the read-in-snapshot flag set to true. If this attempt fails, it means that either some other thread succeeded (i.e., a failure occurred because the assumption that the read-in-snapshot flag was false was wrong) or another value object 158 was added to the value chain 156 (i.e., a failure occurred because the most recent value object 158 reference had changed). Either cause of failure removes the problem. When adding a value object 158, the old most recent value object reference may be read before the current timestamp is read. As, in accordance with an example implementation, the current timestamp is incremented upon the creation of a snapshot, it may be inferred that either the read-in-snapshot flag of the read most recent value object reference is false or the current timestamp was incremented since the last value object 158 (e.g., the current most recent value object 158) was added to the value chain 156. When the update occurs, the new value for the most recent value object reference has the read-in-snapshot flag set to false. If it is the case that the read-in-snapshot flag was set in between the time the read of the most recent value object reference was made and the time of the attempted update, then the CAS operation to change the most recent value object reference from the read value to the new value fails. In this case, another attempt is made to add the value, which involves reading a new current timestamp and updating the timestamp and next pointer on the value object 158 being added.
[00186] The MSV state 162 indicates states for value chains 156 of an associated MSV. In this manner, in accordance with some implementations, the MSV state 162 contains an array of value chains 156 (by reference). Each value chain 156 may be considered "publishable" or "unpublishable" based on its associated isolation context. Each value chain 156 may further be considered to be "open" or "closed". An open value chain 156 is one on which a value object 158 was added subsequent to the last stable time of the associated isolation context 110 as of the time the MSV state 162 was created. A closed value chain 156 is one that is not an open value chain 156. In accordance with example implementations, a closed value chain 156 is publishable, and an unpublishable value chain 156 is open. The value chains 156 may be arranged in the array such that all open value chains 156 precede all closed value chains 156 and all unpublishable value chains 156 precede all publishable value chains 156.
[00187] In accordance with some implementations, the first entry in the array may be reserved for a value chain 156 that is associated with the top-level view object 153. This value chain 156 is unpublishable (since the top-level view object 153, which lacks a parent, is unpublishable) and as such, may be considered open even if it has no value objects 158.
[00188] The MSV state 162 may contain an indication of the number of contained unpublishable value chains 156, the number of open publishable value chains 156, and the number of closed value chains 156. These numbers allow the determination of the state of a value chain 156 based on its position within the array.
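Given the ordering described above (unpublishable chains first, then open publishable chains, then closed chains), the three counts are enough to classify any array slot. The helper below is an illustrative sketch; the function name, its signature, and the string labels are assumptions.

```python
def classify(index, n_unpublishable, n_open_publishable, n_closed):
    """Classify the value chain at `index` purely from its array position."""
    total = n_unpublishable + n_open_publishable + n_closed
    if not 0 <= index < total:
        raise IndexError(index)
    if index < n_unpublishable:
        return ("unpublishable", "open")    # unpublishable implies open
    if index < n_unpublishable + n_open_publishable:
        return ("publishable", "open")
    return ("publishable", "closed")        # closed implies publishable


# One unpublishable chain (e.g., the top-level view's), two open
# publishable chains, and three closed chains.
counts = (1, 2, 3)
kinds = [classify(i, *counts) for i in range(sum(counts))]
```

An "open only" search, as used when looking for the most recent value, would then only need to scan the first `n_unpublishable + n_open_publishable` slots.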
[00189] In accordance with example implementations, the MSV state 162 may provide one or multiple of the following operations.
[00190] The MSV state 162 may provide the close_published_views() operation, which is further described below.
[00191] The MSV state 162 may provide a values(view, open only?) operation to return the value chain 156 that is associated with the specified view object 153, if it exists. If the "open only?" parameter is a Boolean true value, then a value chain 156 is not returned unless the value chain 156 is found among the open (e.g., unpublishable and open publishable) value chains 156.
[00192] With continued reference to Fig. 1C, the MSV state 162 may provide a current_value(view, timestamp) operation to return the value associated with the view object 153 as of the specified timestamp. In accordance with example implementations, this operation may traverse parent view objects 153 and the timestamp may be modified when traversing parent view objects 153 of view objects 153 associated with snapshot isolation contexts 110. The operation may begin at the specified view object 153 and may call the values(view, open only?) operation to get the value chain 156 associated with the view object 153. The "open only?" parameter to the operation may be the Boolean true value if and only if the specified timestamp is the "most recent" timestamp. If a value chain 156 is found, the current_value operation may call value(timestamp) on this result to get a value object 158. If none is found, the operation may replace the current view object 153 by its parent view object 153. If the current view object 153 is associated with a snapshot isolation context 110, the operation may replace the timestamp with the last stable time of the isolation context 110 prior to the timestamp by calling the publication_before(timestamp) operation on the isolation context object 151. Following these replacements, the search may be performed again, and this process may continue until a value object 158 is found or the parentage is exhausted.
[00193] If a value object 158 is found, it is returned; otherwise, a null pointer is returned. During this process, after failing to find a value chain 156 associated with a parent view object 153 that is associated with an isolation context 110 that is not read only, when the timestamp is the "most recent" timestamp, it is possible that a value object 158 is found on another value chain 156. In this circumstance, there is a possibility that by the time the operation has finished, the MSV state 162 has been replaced in the MSV object 166 with a new one that has a value chain 156 for that view object 153, and that the modification that added the value chain 156 took place before the value object 158 found was added. In that case, that value chain 156 contains the value object 158 that should have been returned by the operation. In this situation, the current_value(view, timestamp) operation returns (as a secondary return value) an indication that the value object 158 returned should not be trusted if the MSV's state has changed. When this secondary return indication is seen, the caller attempts to confirm that the MSV state 162 is still associated with the MSV object 166. If this is not the case, the caller may call current_value() on the new state. This process may repeat until the secondary return indication is not seen or the MSV object's 166 current state has not changed.
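The parent traversal performed by current_value(view, timestamp) may be sketched as follows, purely as an illustration. The View class, the dictionary of chains keyed by view name, and the value_at() helper are hypothetical. The sketch omits the "open only?" parameter and the secondary return indication, reads the timestamp adjustment as applying to the snapshot view whose parent is about to be searched, and assumes that a publication at exactly the timestamp counts as "prior to" it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Chain = List[Tuple[int, object]]  # (timestamp, value), oldest first


@dataclass
class View:
    name: str
    parent: Optional["View"] = None
    snapshot: bool = False                                 # snapshot isolation context?
    publications: List[int] = field(default_factory=list)  # sorted stable times

    def publication_before(self, ts: int) -> int:
        """Last stable time at or before ts (0 if never published)."""
        earlier = [p for p in self.publications if p <= ts]
        return earlier[-1] if earlier else 0


def value_at(chain: Chain, ts: int) -> Optional[Tuple[int, object]]:
    """Most recent entry whose timestamp does not exceed ts."""
    found = None
    for stamp, val in chain:
        if stamp <= ts:
            found = (stamp, val)
    return found


def current_value(chains: Dict[str, Chain], view: Optional[View], ts: int):
    while view is not None:
        chain = chains.get(view.name)
        if chain is not None:
            hit = value_at(chain, ts)
            if hit is not None:
                return hit
        if view.snapshot:
            # A snapshot view sees its parent as of its own last stable time.
            ts = view.publication_before(ts)
        view = view.parent
    return None  # parentage exhausted
```

In this sketch, a snapshot child published at time 5 that reads at time 10 sees the parent's value as of time 5, while a non-snapshot child sees the parent's value as of time 10.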
[00194] As discussed above, an MSV object 166 may provide a close_published_views() operation to update the MSV object 166 to be associated with an MSV state 162 that reflects all publications of isolation contexts 110 that have happened since the last time close_published_views() was called. The close_published_views() operation may accept as a parameter a view object 153 (called the "write view" below) indicative of a view 120, if any, that the caller intends to assert a value in and which, therefore, is associated with an open value chain 156 in the resulting MSV state 162. The operation may return the following values: 1. the resulting MSV state 162, which, if not null, has been installed in the MSV object 166; 2. the value chain 156 associated with the write view (if specified) in the resulting MSV state 162; and 3. an indication of whether the returned value chain 156 (if any) was an open value chain 156 prior to the operation.
[00195] In accordance with example implementations, a thread executing in the system 100 may perform the following actions when executing the close_published_views() operation:
1. The thread reads the current MSV state 162 associated with the MSV object 166 (called the "remembered state" below).
2. If the invocation of the close_published_views() operation is not associated with processing a write task 170 associated with the MSV object 166, then the thread helps with any pending write tasks associated with the MSV object 166, as described above.
3. If there was no remembered state (e.g., if the current state of the MSV object 166 was a null reference), then if there is no write view object 153, the thread returns a null MSV state 162 reference. Otherwise, the thread creates a new value chain 156 associated with the write view. If the write view is not the top-level view object 153, the thread may also create a value chain 156 associated with the top-level view object 153. The thread then creates a new MSV state 162 associated with an array containing the created value chains 156 and recording the number of unpublishable and open publishable value chains 156 this represents. The thread returns the created MSV state 162, the value chain 156 associated with the write view, and an indication that the returned value chain 156 was not open.
4. Otherwise, the thread calls a find_need_close() operation on the MSV state 162, which returns a priority queue indicating open publishable value chains 156 whose associated isolation context objects 151 have been published more recently than the timestamp of the most recent value object 158 on the value chain 156. The priority queue may be ordered from least-recently published to most-recently published, and the elements may include the publication time following the most recent value object's 158 timestamp and an index in the value chain 156 array of the MSV state 162. It is noted that the timestamp stored in the priority queue may not reflect the most recent publication of the isolation context object 151. In searching for value chains 156 to add to the priority queue, if the thread discovers a value chain 156 associated with the write view 153, it may remember it for later use.
5. If the returned priority queue is empty, then no work is needed to process the effects of publishing any isolation context 110. The thread may proceed as follows:
a. If no write view object 153 was specified or if a value chain 156 associated with the write view object 153 was previously discovered and remembered, the thread may return from the operation: the current MSV state 162, the write value chain 156 (if any), and, if a write value chain 156 was found, an indication that the write value chain 156 was previously open.
b. If a write view object 153 was specified and no associated value chain 156 was discovered, it may be inferred that no write value chain exists among the open publishable value chains 156. If the write view object 153 is associated with an unpublishable isolation context 110, a search may be made through the unpublishable value chains 156 associated with the MSV state 162. As an optimization, if the write view object 153 is the top-level view object 153, an associated value chain 156 may be found as the first value chain 156 among the unpublishable value chains 156. If an associated value chain 156 is found, it may be returned, along with the current MSV state 162 and an indication that the value chain 156 was previously open. If no associated unpublishable value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new unpublishable value chain 156 associated with the write view 153. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning, and the values from that retry are returned.
c. If the write view object 153 is associated with a publishable isolation context 110, a search may be made through the closed value chains 156 associated with the MSV state 162. If an associated value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state, save that the found value chain 156 is moved to become an open publishable value chain 156. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the found value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning, and the values from that retry are returned.
d. Otherwise, a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new open publishable value chain 156 associated with the write view 153. If the thread succeeds in installing new MSV state 162 replacing the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and the values returned.
6. If the priority queue is not empty, an operation called process_close(), described below, may be called on the current MSV state 162, passing in the priority queue and the write view object 153. The operation returns the same three values, which are remembered.
7. An attempt is then made to install the new MSV state 162, replacing the remembered MSV state 162. If the installation of the new MSV state 162 succeeds, then the remembered three values are returned. If the installation of the new MSV state 162 is unsuccessful, then it may be inferred that another thread changed the state 162 while the current thread was working on the state 162. Therefore, the close_published_views() operation is retried, and the results of that invocation are returned.
Table 3
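The "copy the remembered state, attempt to install it, and retry on failure" pattern that recurs in the steps above may be sketched as follows. This is a deliberately reduced illustration, not the full operation: the AtomicRef class emulates a CAS operation with a lock, the MSV state is modeled as a plain dictionary from view name to value chain, and the ensure_chain() helper is hypothetical. Publication processing, the priority queue, and the open/closed distinction are omitted.

```python
import threading


class AtomicRef:
    """Lock-based emulation of an atomically updatable reference."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Identity comparison, as with a pointer-based CAS.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def ensure_chain(state_ref, write_view):
    """Return (state, chain, was_present) for the write view."""
    while True:
        remembered = state_ref.get()
        chains = remembered if remembered is not None else {}
        if write_view in chains:
            return remembered, chains[write_view], True
        new_chain = []
        new_state = dict(chains)          # copy of the remembered state
        new_state[write_view] = new_chain
        if state_ref.compare_and_set(remembered, new_state):
            return new_state, new_chain, False
        # Another thread installed a different state first: retry from the start.
```

States are never mutated after installation in this sketch, so a thread holding a remembered state can always reason about it consistently even after another thread has replaced it.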
[00196] The process_close() operation, referenced above, has access to the priority queue and the write view object 153 (if any) of its caller and, in accordance with example implementations, maintains the following structures: a vector of open publishable value chains 156; a vector of closed value chains 156; and a vector of new unpublishable value chains 156. It is noted that the designation of these value chains 156 as, e.g., open publishable, represents the point of view of an MSV state 162 being designed, and the designation for a given value chain 156 may change during the operation.
[00197] The vector of open publishable value chains 156 may initially contain the open publishable value chains 156 in the MSV state 162. Whenever a new open publishable value chain 156 is created or a closed value chain 156 is reopened, the value chain 156 is added to the end of this vector. Whenever a value chain 156 is closed, the slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
[00198] The vector of closed value chains 156 may initially contain the closed value chains 156 in the MSV state 162. In accordance with an implementation, the creation of this vector may be deferred until there is a need to alter its contents.
Whenever an open publishable value chain 156 is closed, the value chain 156 is added to the end of the vector. Whenever a closed value chain 156 is reopened, its slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
[00199] The vector of new unpublishable value chains 156 may initially be empty. When a new unpublishable value chain 156 is created, the value chain 156 is added to the end of the vector. The unpublishable value chains 156 of the MSV state 162 being designed comprise the unpublishable value chains 156 of the current MSV state 162 and the contents of this vector.
[00200] In accordance with example implementations, the process_close() operation may repeatedly remove items from the priority queue and identify value chains 156 and publication timestamps, where the values removed reflect the earliest such timestamps in the priority queue, stopping when the priority queue is empty. Each such value chain 156 (which may be called a child value chain) is in the open publishable value chain 156 vector. The operation next identifies the parent value chain 156 associated with the parent view object 153 of the view object 153 associated with the child value chain 156. If the parent value chain 156 is identified in the open publishable value chain 156 vector, the unpublishable value chain 156 vector, or the unpublishable value chains 156 of the current MSV state 162, it is noted. Otherwise, if the parent view object 153 is associated with an unpublishable isolation context 110, a new parent value chain 156 is created and added to the unpublishable value chain 156 vector. If the parent view object 153 is associated with a publishable isolation context 110, a search is made through the closed value chain 156 vector (or, if this has not yet been created, the closed value chains 156 associated with the current MSV state 162). If a parent value chain 156 is found, the parent value chain 156 is reopened by removing it from the closed value chain 156 vector and adding it to the open publishable value chain 156 vector. If a parent value chain 156 is not found, a new parent value chain 156 is created and added to the open publishable value chain 156 vector.
[00201] The child value chain 156 may then be closed, and modifications (if any) of the child value chain 156 are transferred to its parent. The child value chain 156 may be removed from the open publishable value chain 156 vector. If there is an indication in the child value chain 156 that the most recent value object 158 was added as the result of a frozen read operation, there is no value that needs to be transferred to the parent value chain 156. In addition, the most recent value object 158 associated with the child value chain 156 may be removed from the child value chain 156 along with the indication. If this results in a child value chain 156 with no associated most recent value object 158, the child value chain 156 may be discarded.
[00202] If there is no indication that the most recent child value object 158 was the result of a frozen read, the most recent parent value object 158 (if any) is discovered on the parent value chain 156. If the most recent parent value object 158 exists and the associated timestamp is greater than or equal to one less than the timestamp retrieved from the priority queue, this may indicate that another thread has already processed the publication of the child view object 153, so this thread need not. Otherwise, a new parent value object 158 is created based on the most recent child value object 158, having a timestamp equal to the retrieved timestamp minus one and the same value, unless the value is a view-relative pointer 154, in which case the value is a copy where the view object 153 associated with the copy is the parent of the view object 153 associated with the original. The prior value object 158 of the new parent value object 158 is the old most recent parent value object 158, if any. The operation then makes a single attempt to atomically replace the old most recent parent value object 158 with the new parent value object 158 in the parent value chain 156. A failure in this attempt may indicate that another thread has already processed the publication of the child view object 153.
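The timestamp rule for transferring a child value to its parent may be sketched as follows, for illustration only. The ValueObject class and the promote() helper are hypothetical. The sketch covers only the decision and the construction of the new parent value object; the rebasing of view-relative pointers and the single atomic replacement attempt on the parent value chain are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueObject:
    value: object
    timestamp: int
    prior: Optional["ValueObject"] = None  # next-older value on the same chain


def promote(child_latest: ValueObject,
            parent_latest: Optional[ValueObject],
            publication_time: int) -> Optional[ValueObject]:
    """Build the parent value object for a published child value.

    Returns None when the parent chain already holds a value stamped at or
    after (publication_time - 1), taken here to mean that another thread has
    already processed this publication.
    """
    target = publication_time - 1
    if parent_latest is not None and parent_latest.timestamp >= target:
        return None
    return ValueObject(child_latest.value, target, parent_latest)
```

Stamping the promoted value one tick before the publication time means that any reader of the parent as of the publication time observes the published value.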
[00203] If the parent view object 153 is associated with a publishable isolation context 110, a determination is made as to whether the parent isolation context 110 was published following the retrieved timestamp. If it was, a new entry is added to the priority queue reflecting the parent value chain 156 and the earliest publication time of the parent isolation context 110 following the retrieved timestamp. It is noted that this step is performed even if an earlier determination was made that another thread installed a new parent value object 158 reflecting the current child value object 158.
[00204] If the child value chain 156 was not discarded, it is now added to the closed value chain 156 vector.
[00205] When the priority queue is empty, the thread may proceed to identify the write value chain 156 associated with the write view object 153, if any. This may be performed by the same procedure as was used to identify each parent value chain 156.
[00206] Next, the thread may create a new MSV state 162 associated with the unpublishable value chains 156 from the current MSV state 162, the unpublishable value chains 156 from the unpublishable value chain 156 vector, the open publishable value chains 156 from the open publishable value chain 156 vector, and the closed value chains 156 from the closed value chain 156 vector (or, if this last has not been created, from the current MSV state 162). The thread may then return this new MSV state 162, the write value chain 156 (if any), and an indication of whether the write value chain 156 was found among the open value chains 156.
[00207] Cooperative tasks are described herein for the purpose of serializing operations, such as publication and adding conflicts to an isolation context or modifying a value. Each point of serialization may be associated with a "task holder." The task holder refers to a task and may be atomically updated.
[00208] To perform a task, the task is first installed in the task holder. This installation may involve attempting to replace a null pointer in the task holder with a reference to the task by means of the CAS operation. If the task holder was not empty, then this attempt fails, and the current value (the blocking task) is obtained. If the blocking task is not the one being installed (i.e., if it was not the case that another thread was successful in installing it), then the blocking task is run in the current thread, and an attempt is made to replace the blocking task with null. If this attempt fails, it means that another thread removed the blocking task first. Another iteration then occurs to try the install again. When the task is successfully installed, the task is run, and then an attempt is made to remove it (replace it with null).
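The install, help, and remove protocol for a task holder may be sketched as follows, for illustration only. The AtomicRef class emulates the CAS operation with a lock, and the Task class and perform() function are hypothetical. The sketch assumes that a task's run() is safe to invoke more than once, since several threads may help complete the same task; the simple flag used here is not itself thread-safe and stands in for whatever idempotence mechanism a real task would use.

```python
import threading


class AtomicRef:
    """Lock-based emulation of an atomically updatable reference."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Task:
    def __init__(self, name, log):
        self.name, self.log, self.done = name, log, False

    def run(self):
        if not self.done:  # helpers may call run() repeatedly
            self.done = True
            self.log.append(self.name)


def perform(holder, task):
    while True:
        if holder.compare_and_set(None, task):
            break                                   # installed by this thread
        blocking = holder.get()
        if blocking is task:
            break                                   # installed by another thread
        if blocking is not None:
            blocking.run()                          # help the blocking task
            holder.compare_and_set(blocking, None)  # may fail if already removed
    task.run()
    holder.compare_and_set(task, None)              # single removal attempt
```

A thread never waits on a blocking task in this sketch; it completes that task itself, which is what makes the tasks cooperative rather than lock-like.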
[00209] In accordance with example implementations, the work of the task may occur in the task's run() operation, which may be defined in a subclass.
[00210] Publish tasks may be installed in the isolation context objects 151 to (attempt to) perform a publication of the corresponding isolation context 110. The publish tasks are in competition for this task holder with other publish tasks and with write tasks that may wish to add conflicts to this context.
[00211] The data within a publish task is accessible to all threads attempting to complete it.
[00212] In accordance with example implementations, the publish task may have one or multiple of the following properties. The publish task may have an immutable reference to the isolation context object 151 being published. The publish task may have an atomically-updated timestamp representing the publish time (initially zero). The publish task may have an atomically-updated reference to a list of conflicts seen by the publish task. This atomically-updated reference may contain a "has value" flag, initially false, to be able to distinguish between the scenario in which it is unknown whether there are any conflicts 152 and the scenario in which it has been determined that there are no conflicts 152.
[00213] In accordance with example implementations, to perform a publish task via the publish() call for an isolation context object 151, the following operations may be performed by a thread of the system 100:
1. If the publish time is zero, this indicates that the publish time has yet to be established. The current timestamp counter 164 is incremented, and a single attempt is made to change the publish time from zero to the resulting value. If this attempt fails, it means that another thread has already established the publish time.
2. If the has-value flag on the conflicts reference is set, the publish task has finished, and the conflicts reference either contains a list of blocking conflicts or is null, indicating that the publish succeeded as of the publish time. The publish task is complete.
3. Otherwise, the isolation context object's 151 associated isolation context state 157 is requested to attempt to publish as of the determined publish time. This request may return a list of conflicts 152, which may be an empty list, indicating successful publication. This list may be installed in the task via the conflicts reference with the has-value flag set. The publish task is complete.
Table 4
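The publish task steps above may be sketched as follows, for illustration only. The Atomic, Clock, StubContext, and PublishTask classes are hypothetical, and the CAS operation is emulated with a lock. As a simplification, the "has value" flag is folded into the conflicts reference: None means that no result has been recorded yet, and a tuple (possibly empty) means that the result is known.

```python
import threading


class Atomic:
    """Lock-based emulation of an atomically updatable cell (compares by equality)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class Clock:
    """Stand-in for the shared timestamp counter."""

    def __init__(self):
        self._n = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._n += 1
            return self._n


class StubContext:
    """Stand-in for an isolation context; publishes only if it has no conflicts."""

    def __init__(self, conflicts=()):
        self.conflicts = list(conflicts)
        self.published_at = None

    def try_publish(self, ts):
        if not self.conflicts:
            self.published_at = ts
        return list(self.conflicts)


class PublishTask:
    def __init__(self, context, clock):
        self.context = context
        self.clock = clock
        self.publish_time = Atomic(0)   # zero: not yet established
        self.conflicts = Atomic(None)   # None: result not yet known

    def run(self):
        if self.publish_time.get() == 0:
            # Single attempt; failure means another thread set the time.
            self.publish_time.compare_and_set(0, self.clock.increment())
        if self.conflicts.get() is not None:
            return                      # already finished by some thread
        found = tuple(self.context.try_publish(self.publish_time.get()))
        self.conflicts.compare_and_set(None, found)
```

Because every step either checks or conditionally sets shared task data, any number of helping threads may call run() on the same task and converge on one publish time and one result.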
[00214] In accordance with example implementations, an isolation context state 157 associated with an isolation context object 151 may attempt to publish as of a specified publication timestamp as follows:
1. If there are conflicts 152 noted in the current isolation context state 157, they are returned.
2. If this is not the first time this step is processed and the publication time associated with the current isolation context state 157 is later than the specified publication timestamp, this may indicate that the requested publication has been completed in another thread. An empty list of conflicts 152 may be returned.
3. If there are contingent conflicts 152 associated with the current isolation context state 157, then the corresponding contingent conflict list is traversed, adding each contingent conflict 152 to its associated isolation context object 151. As described above, the nodes in the list may contain a "handled?" flag, which is checked before adding and set afterwards, to minimize duplicate work as threads run through the list at the same time. In accordance with further example implementations, a count of the contingent conflicts 152 handled is maintained, and a handling flag is used in addition to the "handled?" flag. In accordance with this further example implementation, in a first pass, contingent conflicts 152 having states that may be changed from unhandled to handling are added, the state then being changed unconditionally to handled, and a counter may then be incremented. After the first pass, if the count does not equal the number of items in the entire list, this means that some thread claimed an entry and then terminated or paused. When this occurs, a second pass may be performed, adding all contingent conflicts 152 that are in the handling state and incrementing the counter for each of these conflicts if its state may be changed from handling to handled.
4. If the current isolation context state 157 is no longer the one associated with the isolation context object 151, the preceding steps are repeated with the current isolation context state 157 being that now associated with the isolation context object 151.
5. It may now be determined that there are no conflicts 152 preventing the publication. A new isolation context state 157 is constructed with the given publication timestamp and whose prior state is the current isolation context state 157.
6. An attempt may then be made to replace the current isolation context state 157 with the new one in the isolation context object 151. If this attempt succeeds, an empty conflict list is returned.
7. Otherwise, the failed attempt identifies the isolation context state 157 associated with the isolation context object 151, and the operation may return the result of asking this isolation context state 157 to attempt to publish as of the specified timestamp.
Table 5
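The two-pass handling of contingent conflicts in step 3 above may be sketched as follows, for illustration only. The Cell and Node classes, the state names, and the add_contingent() function are hypothetical, and the CAS operation is emulated with a lock. The sketch assumes that adding a conflict is idempotent, since the second pass may add a conflict that a stalled thread has already added.

```python
import threading

UNHANDLED, HANDLING, HANDLED = "unhandled", "handling", "handled"


class Cell:
    """Lock-based emulation of an atomically updatable cell."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def increment(self):
        with self._lock:
            self._value += 1


class Node:
    def __init__(self, conflict, state=UNHANDLED):
        self.conflict = conflict
        self.state = Cell(state)


def add_contingent(nodes, handled_count, add):
    # First pass: claim each unhandled node, add its conflict, mark it handled.
    for node in nodes:
        if node.state.compare_and_set(UNHANDLED, HANDLING):
            add(node.conflict)
            node.state.set(HANDLED)
            handled_count.increment()
    # Second pass, only if some claimed node was never completed.
    if handled_count.get() != len(nodes):
        for node in nodes:
            if node.state.get() == HANDLING:
                add(node.conflict)
                if node.state.compare_and_set(HANDLING, HANDLED):
                    handled_count.increment()
```

The counter lets a thread detect, without rescanning every node's state, that some entry was claimed but left unfinished by a thread that paused or terminated.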
[00215] In accordance with example implementations, after the publish task is finished (within the initial publish() call that installed the task), a publish result object is created based on the publish time and conflict list stored in the task. The creation of the publish result object may be performed outside of the publish task. In accordance with example implementations, the publish result object contains an immutable reference to the isolation context object 151 that was published; an immutable timestamp representing the publish attempt time; and an immutable list of blocking conflicts 152, which is empty if the publish attempt succeeded.
[00216] A write task is installed to make a modification to an MSV object 166. It is noted that the modification may introduce a conflict in a publish task that might otherwise be occurring at the same time in a different thread. A write task may also be installed in an isolation context object 151 (in competition with one or multiple publish tasks) when it is determined that there is a possibility that the modification may induce a conflict.
[00217] In accordance with example implementations, a write task forces a serialization of all modifications of a given field, regardless of isolation contexts. In accordance with some implementations, some modifications may be made directly on the MSV state 162 associated with the MSV object 166 without installing corresponding write tasks. In this manner, such modifications may include those which cannot be affected by other modifications to the same MSV object 166 (e.g., operations to set a value in any view 120 and any modification operation in a view 120 associated with a snapshot isolation context 110) and cannot introduce conflicts 152 into isolation context objects 151 (e.g., operations in views 120 associated with non-snapshot isolation contexts 110 that have no child isolation contexts 110 and are either non-publishable or have no publishable sibling isolation contexts 110).
[00218] The performance of a write task may entail the performance of one or more phases: a cache state phase, an install context events phase, an add value to value chain phase, and an add conflicts phase. A write task may contain an atomically-updated indication of the current phase being performed by some thread or an indication that all phases have been completed. A write task may contain an immutable reference to the MSV object 166 and may contain an immutable reference to the write value chain 156 to be modified. A write task may contain a Boolean value (initially false), which represents whether the write value chain 156 was modified. A write task may contain an immutable modification operation (e.g., an operation to set or add). A write task may contain an immutable argument to the operation. A write task may contain an immutable indication of whether the modification resolves conflicts. A write task may contain an atomically-updatable reference (initially null) to an MSV state 162. A write task may contain an atomically-updatable timestamp (initially null) that represents the start time of the operation. A write task may contain the value to be returned by the modification operation.
[00219] When a thread performs the write task, the thread reads the indication of the current phase, performs the associated action and attempts to update the current phase indication by replacing the indication of the phase performed with an indication of the next phase (or that the write task is complete if there is no next phase). If this fails, it may be inferred that another thread already recorded the completion of the phase and may have performed later phases. The thread continues processing based on the resulting phase indication (e.g., the phase indication set by the thread or by another thread) until the current phase indication indicates that the write task is complete.
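The phase-advancing loop described above may be sketched as follows, for illustration only. The Atomic class emulates the CAS operation with a lock, and the phase names, the DONE marker, and the run_write_task() function are hypothetical. The sketch assumes that each phase action is safe to perform more than once, since two threads may read the same phase indication before either advances it.

```python
import threading

PHASES = ["cache_state", "install_context_events", "add_value", "add_conflicts"]
DONE = "done"


class Atomic:
    """Lock-based emulation of an atomically updatable cell."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def run_write_task(phase_ref, actions):
    """Drive the task to completion from whatever phase it is currently in."""
    while True:
        phase = phase_ref.get()
        if phase == DONE:
            return
        actions[phase]()                      # perform the phase's action
        index = PHASES.index(phase)
        following = PHASES[index + 1] if index + 1 < len(PHASES) else DONE
        # On failure another thread already advanced the phase; either way,
        # the next iteration re-reads the indication and continues from it.
        phase_ref.compare_and_set(phase, following)
```

A thread that arrives late simply picks up at the recorded phase, so a write task started by one thread can be finished by any other.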
[00220] In the cache state phase, the thread performs the following in order. First, the thread attempts to change the start time from zero to a value read from the event counter 164. If this attempt fails, then the start time has been set by another thread. Next, in the cache state phase, the thread makes an attempt to change the cached state from null to the MSV state 162 currently associated with the MSV object 166. If the attempt fails, it may be inferred that another thread made the change. This sequence ensures that publication of an isolation context object 151 after the recorded start time cannot invalidate the recorded MSV state 162.
[00221] In the install context events phase, the thread adds the write task to every isolation context object 151 for which the present modification could possibly create a conflict. The write task installed in a given isolation context object 151 therefore causes a check for conflicts to be performed before the given isolation context object 151 may publish. If there is a publish task installed on such an isolation context object 151 (indicating an ongoing publication), the thread assists in completing the publish task (i.e., the thread assists the ongoing publication) before adding the write task, as described above. In accordance with example implementations, the thread adds the write task to every isolation context object 151 associated with an open publishable value chain 156 in the MSV state 162, including the write value chain 156, with the exception of those value chains 156 that are already associated with a conflict 152. An atomically-updatable integer indicating the next value chain 156 associated with the MSV state 162 to process may be used to ratchet the next slot to consider, allowing threads to avoid trying to add the write task to an isolation context object 151 that has already received it from another thread.
[00222] After installing the task on each isolation context object 151, the thread checks to see whether the isolation context object 151 was published after the start time. If the thread determines that the isolation context object 151 was published after the start time, then the publish occurred after the creator of the task called the close_published_views() process. Therefore, in accordance with example implementations, the thread calls the close_published_views() process again on the cached MSV object 166 and updates the cached MSV state 162 to the result. In this call, an indication that the thread is already processing a write task may be passed, to prevent the close_published_views() process from clearing the pending write task on the MSV object 166. The task may also update the cached start time to the current timestamp (unless the cached start time was already greater due to the time being set by another thread after the current time was read).
[00223] In the add value to value chain phase, a thread may perform the following operations:
1. The value is computed based on the operation and argument and stored in the task. This computation may not be performed in an atomic manner, as an assumption may be made that repeated evaluations will yield the same value.
2. A decision is made as to whether a modification to the write value chain 156 is required. If the operation is merely to establish the current value associated with the value chain's 156 view 120 (e.g., pursuant to a frozen read operation) or if the operation is one that will necessarily leave the current value unchanged (e.g., an addition or subtraction of zero or a multiplication or division by one), the modification may be deemed to be unnecessary. If this is the case, and if the operation is other than to retrieve the current value and either the write value chain 156 is empty (e.g., has no most recent value object 158) or the associated isolation context 110 was published after the last value was added to the write value chain 156, the otherwise unnecessary modification may be deemed to be necessary.
3. If the modification is deemed to be necessary, an add_value() operation is called on the write value chain 156 to establish the value, and it is noted that a value was written. If it was initially determined that the modification was unnecessary and this decision was reversed, the call to the add_value() operation may be specified to indicate in the write value chain 156 that the value object 158 added was the result of a frozen read.
4. If the write is to be treated as resolving a conflict and there is a conflict 152 associated with the write value chain 156, the conflict_resolved(conflict) operation of the associated isolation context object 151 is invoked, the conflict 152 is marked as having been resolved, and the conflict field of the write value chain 156 is cleared.
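The four operations above can be sketched in Python as follows. This is a minimal, single-threaded illustration under stated assumptions, not the implementation described in this disclosure: the class names `ValueObject` and `ValueChain`, the `kind` strings ("get", "freeze", "modify"), the `base_value` default, and the `context_published_at` field are all hypothetical, and "necessarily leaves the value unchanged" is approximated by comparing the computed value with the current one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValueObject:
    value: int
    timestamp: int
    frozen_read: bool = False  # True if added only to pin the current value


@dataclass
class ValueChain:
    values: List[ValueObject] = field(default_factory=list)
    context_published_at: int = -1  # hypothetical: last publish time of the context
    conflict: Optional[str] = None

    def add_value(self, value: int, timestamp: int, frozen_read: bool = False) -> None:
        self.values.append(ValueObject(value, timestamp, frozen_read))


def add_value_phase(chain: ValueChain, kind: str,
                    op: Optional[Callable[[int, int], int]] = None, arg: int = 0, *,
                    now: int, base_value: int = 0, resolves_conflict: bool = False,
                    on_conflict_resolved: Optional[Callable[[str], None]] = None) -> bool:
    """Return True if a value was written to the chain."""
    last = chain.values[-1] if chain.values else None
    current = last.value if last else base_value

    # Operation 1: compute the value (assumed deterministic, so no atomicity needed).
    new_value = op(current, arg) if kind == "modify" else current

    # Operation 2: decide whether the chain must be modified, then possibly
    # reverse an "unnecessary" decision for anything other than a plain get.
    unnecessary = kind in ("get", "freeze") or new_value == current
    reversed_decision = (
        unnecessary and kind != "get"
        and (last is None or chain.context_published_at > last.timestamp)
    )

    # Operation 3: write, flagging a reversed decision as a frozen read.
    wrote = False
    if not unnecessary or reversed_decision:
        chain.add_value(new_value, now, frozen_read=reversed_decision)
        wrote = True

    # Operation 4: resolve and clear an existing conflict if requested.
    if resolves_conflict and chain.conflict is not None:
        if on_conflict_resolved is not None:
            on_conflict_resolved(chain.conflict)
        chain.conflict = None
    return wrote
```

In this sketch, adding zero to a non-empty chain writes nothing unless the context was published after the last value was added, in which case the unchanged value is written and flagged as a frozen read.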
Table 6
[00224] In the add conflicts phase, the thread adds the actual and contingent conflicts and removes the write task from all isolation context objects 151, as described below:
1. If there was no modification, then it may be inferred that no conflicts 152 were introduced. The thread removes the write task (if present) from the isolation context objects 151 associated with all open publishable value chains 156 in the MSV state 162, and the phase is complete.
2. Otherwise, all open value chains 156 (whether or not they are publishable) are enumerated. This enumeration may use an atomically updated integer associated with the write task, in the manner used when adding the write task to isolation context objects 151 in order to allow threads to not redo work done by other threads. The current view object 153 and current isolation context object 151 associated with the current value chain 156 in the enumeration are noted.
3. If the current value chain 156 is the write value chain 156 or if the current value chain 156 was last modified prior to the last modification to the write value chain (if any), it is determined that no conflict can be introduced with respect to the current value chain 156 by this modification.
4. Otherwise, if the current isolation context object 151 is publishable, the current value chain 156 does not already have an associated conflict 152, the write view object 153 is an ancestor of the current view object 153, and it is not the case that both the current value chain 156 and the write value chain 156 indicate that their most recent value objects 158 were due to frozen reads, then a conflict 152 is added to the current value chain 156 and current isolation context object 151.
5. Otherwise, if both the write value chain 156 and current value chain 156 are publishable value chains 156, the current value chain 156 does not already have an associated conflict 152, the write view object 153 and current view object 153 have the same parent view object 153, and it is not the case that both the current value chain 156 and the write value chain 156 indicate that their most recent value objects 158 were due to frozen reads, then contingent conflicts 152 are added to both the write isolation context object 151 and current isolation context object 151, each contingent conflict 152 referring to the other isolation context object 151.
6. Otherwise, if the write isolation context object 151 is a publishable snapshot, the write value chain 156 does not already have an associated conflict 152, the write task is not noted to be resolving a conflict, the current view object 153 is an ancestor of the write view object 153, and the most recent value object 158 in the current value chain 156 does not have the same timestamp as the last merge timestamp for the current isolation context object 151, then a conflict 152 is added to the write value chain 156 and write isolation context object 151.
7. Following the decision as to whether a conflict is induced with respect to the current value chain 156, the write task may be removed from the current isolation context object 151 unless that is the write isolation context object 151 .
Table 7
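The per-chain decision in operations 3 through 6 of Table 7 can be sketched as a single classification function. This is an illustrative reading under assumptions rather than the disclosed implementation: the classes `View`, `Context`, and `Chain`, the `prior_write_ts` parameter (taken here to mean the time of the write chain's previous modification, if any), and the returned labels are hypothetical. The enumeration of open value chains, the atomically updated integer, and the removal of the write task (operations 1, 2, and 7) are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class View:
    name: str
    parent: Optional["View"] = None

    def is_ancestor_of(self, other: "View") -> bool:
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


@dataclass(eq=False)
class Context:
    name: str
    publishable: bool = True
    last_merge_ts: int = -1
    conflicts: List[str] = field(default_factory=list)
    contingent: List["Context"] = field(default_factory=list)


@dataclass(eq=False)
class Chain:
    view: View
    ctx: Context
    last_ts: int               # timestamp of the most recent value object
    frozen_read: bool = False  # most recent value was due to a frozen read
    conflict: Optional[str] = None


def check_conflict(write: Chain, current: Chain,
                   prior_write_ts: Optional[int] = None,
                   resolving: bool = False) -> str:
    both_frozen = write.frozen_read and current.frozen_read
    # Operation 3: same chain, or current chain is older than the previous write.
    if current is write or (prior_write_ts is not None
                            and current.last_ts < prior_write_ts):
        return "none"
    # Operation 4: write in an ancestor view conflicts with a publishable descendant.
    if (current.ctx.publishable and current.conflict is None
            and write.view.is_ancestor_of(current.view) and not both_frozen):
        current.conflict = f"{write.view.name}->{current.view.name}"
        current.ctx.conflicts.append(current.conflict)
        return "conflict_on_current"
    # Operation 5: sibling views get contingent conflicts referring to each other.
    # The "parent is not None" guard is an added assumption for root views.
    if (write.ctx.publishable and current.ctx.publishable
            and current.conflict is None and write.view.parent is not None
            and write.view.parent is current.view.parent and not both_frozen):
        write.ctx.contingent.append(current.ctx)
        current.ctx.contingent.append(write.ctx)
        return "contingent"
    # Operation 6: write in a descendant view conflicts with a newer ancestor value.
    if (write.ctx.publishable and write.conflict is None and not resolving
            and current.view.is_ancestor_of(write.view)
            and current.last_ts != current.ctx.last_merge_ts):
        write.conflict = f"{current.view.name}->{write.view.name}"
        write.ctx.conflicts.append(write.conflict)
        return "conflict_on_write"
    return "none"
```

The operations are tested in order, so each "Otherwise" in the table corresponds to falling through to the next `if` in the function.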
[00225] In accordance with example implementations, the system 100 may be a system 900 of one or multiple physical machines 910, as depicted in Fig. 9. The physical machine 910 is a processor-based machine that is constructed from actual machine executable instructions, or "software" 960, and actual hardware 920. The hardware 920 of the physical machine 910 may include, for example, one or multiple central processing cores 922 (e.g., central processing unit (CPU) cores and/or graphics processing unit (GPU) cores), a memory 924, one or multiple network interfaces 926, one or multiple mass storage devices 927, a display, input/output (I/O) devices and so forth.
[00226] In accordance with example implementations, the memory 924, in general, may be a non-transitory memory, which includes non-transitory memory storage devices, such as semiconductor memory devices, phase change memory devices, random access memory (RAM) devices, dynamic RAM (DRAM) devices, resistive memory devices, flash memory devices, a combination of one or more of these devices, and so forth.
[00227] In accordance with example implementations, the machine executable instructions 960 may be stored in a non-transitory computer-readable storage medium, such as the memory 924, for example. The instructions 960, when executed by one or multiple of the processing cores 922, may cause the processing core(s) 922 to execute one or multiple applications 966, i.e., execute the program code 112 as part of one or multiple operating system processes 964. One or multiple processes 964 may share a given isolation context, as discussed herein. In addition to the application(s) 966, the machine executable instructions 960 may include an operating system 965, a virtual machine monitor (VMM) 969, or hypervisor, as well as program instructions 968 that, when executed by the processing core(s) 922, cause the core(s) 922 to provide the conflict resolver engine 140 (see Figs. 1A and 8).
[00228] The physical machine 910 may store, in accordance with example implementations, data 970, such as data 972 for the data structures 106 (see Fig. 1A), data 974 for the management resources 130 (see Fig. 1A), and so forth.
[00229] As also depicted in Fig. 9, the system 900 may include a high speed interconnect 980 (a server rack backplane, a server cabinet backplane, a bus, a serial link, and so forth) that interconnects multiple physical machines 910. Although two physical machines 910 are depicted in Fig. 9, it is noted that the system 900 may include three or more physical machines. Moreover, in accordance with further example implementations, the system 900 may include a single physical machine 910. For implementations in which the system 900 includes multiple physical machines 910, the machines 910 may be disposed at a single physical location (or facility) or may be geographically distributed at multiple locations.
[00230] While the present invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.

Claims

What is claimed is:
1. A method comprising:
executing machine executable instructions in a first isolation context to run a plurality of tasks to modify data according to a first view associated with the first isolation context;
attempting to publish modifications made in a first isolation context so that the modifications to the data are visible in a second isolation context having an associated second view isolated from the first view; and
in response to the attempted publication, applying task-based conflict resolution to resolve at least one conflict occurring due to the first and second views, wherein applying the task-based conflict resolution comprises executing machine executable instructions to selectively re-run a subset of tasks of the plurality of tasks, wherein the number of tasks of the subset of tasks is less than the number of tasks of the plurality of tasks.
2. The method of claim 1, wherein applying the task-based conflict resolution comprises:
monitoring the execution of the instructions to run the plurality of tasks to determine a dependency graph among tasks of the plurality of tasks; and
identifying the subset of tasks to re-run based at least in part on the determined dependency graph.
3. The method of claim 2, wherein monitoring the execution of the instructions to run the plurality of tasks to determine the dependency graph comprises tracking read and write accesses to the data structure.
4. The method of claim 2, wherein determining the dependency graph comprises determining a static dependency graph before the re-running of the subtasks.
5. The method of claim 2, wherein determining the dependency graph comprises dynamically updating the dependency graph during the re-running of subtasks and dynamically updating the identified subset of tasks during the re-running of the subtasks.
6. The method of claim 1, wherein:
the at least one conflict comprises a plurality of conflicts associated with a plurality of locations of at least one data structure; and
applying the task-based conflict resolution comprises:
determining a subset of locations of the plurality of locations to reset to store values shared in common among the first and second views;
resetting the locations to store the values in the identified subset of locations; and
re-running tasks of the subset of tasks after the resetting of the values.
7. The method of claim 1, wherein applying the task-based conflict resolution comprises:
identifying the subset of tasks to re-run based at least in part on programming language code associated with the machine executable instructions identifying a given task of the subset of tasks as being an unconditionally re-runnable task.
8. The method of claim 1, wherein applying the task-based conflict resolution comprises:
determining a partial order among tasks of the subset of tasks; and
scheduling re-running of tasks of the subset of tasks based at least in part on the partial order.
9. The method of claim 8, wherein scheduling re-running of tasks comprises:
determining whether tasks of the subset of tasks are to be re-run in parallel execution paths.
10. The method of claim 1, further comprising:
retrying publication of the modifications; and
selectively retrying the task-based conflict resolution based at least in part on a result of the retried publication.
11. A system comprising:
a memory to store data representing conflicts preventing publication of a first isolation context associated with a plurality of tasks;
at least one data structure associated with data modified by the plurality of tasks; and
a conflict resolver engine comprising a processor to:
monitor execution of the plurality of tasks to construct a dependency graph representing interdependencies of tasks of the plurality of tasks; and
in response to attempted publication of the first isolation context, selectively identify a subset of tasks of the plurality of tasks to be re-run to resolve at least some of the conflicts based at least in part on the dependency graph, wherein the number of tasks of the subset of tasks is less than the number of tasks of the plurality of tasks.
12. The system of claim 11, wherein:
publication of the first isolation context merges a first view associated with the first isolation context with a second view associated with a second isolation context;
the conflict resolver engine determines locations of the at least one data structure to reset to store values shared in common among the first and second views; and
the conflict resolver engine resets the locations to store the values in the identified subset of locations before the re-running of at least some of the tasks of the subset of tasks.
13. The system of claim 11, wherein:
the conflict resolver engine determines a partial order among tasks of the subset of tasks; and
the conflict resolver engine schedules re-running of tasks of the subset of tasks based at least in part on the partial order.
14. An article comprising a non-transitory computer readable storage medium to store instructions that when executed by a computer cause the computer to:
execute a plurality of tasks to modify data according to a first isolation context;
identify a conflict preventing the first isolation context from publishing, wherein the publication merges a first view associated with the first isolation context with a second view associated with a second isolation context; and
apply task-based conflict resolution to resolve the conflict, including selectively executing a subset of at least one task of the plurality of tasks, wherein the number of task or tasks of the subset of at least one task is less than the number of tasks of the plurality of tasks.
15. The article of claim 14, the storage medium storing instructions that when executed by the computer cause the computer to:
retry publication of the first isolation context; and
selectively retry the task-based conflict resolution based at least in part on a result of the retried publication.
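As an informal illustration of the selective re-run recited in the claims above (a dependency graph built from tracked read and write accesses, a subset of tasks identified from it, and a partial order for re-running them), the following Python sketch may be helpful. It is a simplified example under assumptions and does not define or limit the claims: the `Accesses` mapping, the function names, and the rule that a task is re-run if it read a conflicted location or read a value written by a task being re-run are all hypothetical. The ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Task name -> (locations read, locations written), in original execution order.
Accesses = Dict[str, Tuple[Set[str], Set[str]]]


def build_dependency_graph(accesses: Accesses) -> Dict[str, Set[str]]:
    """Map each task to the earlier tasks whose writes it read."""
    last_writer: Dict[str, str] = {}
    deps: Dict[str, Set[str]] = {}
    for task, (reads, writes) in accesses.items():
        deps[task] = {last_writer[loc] for loc in reads if loc in last_writer}
        for loc in writes:
            last_writer[loc] = task
    return deps


def tasks_to_rerun(accesses: Accesses, conflicted: Set[str]) -> List[str]:
    """Return the subset of tasks to re-run, in a dependency-respecting order."""
    deps = build_dependency_graph(accesses)
    selected: Set[str] = set()
    # Dependencies only point to earlier tasks, so one pass in execution
    # order is enough to pick up transitive dependents.
    for task, (reads, _writes) in accesses.items():
        if reads & conflicted or deps[task] & selected:
            selected.add(task)
    # Restrict the graph to the selected tasks and order them topologically.
    subgraph = {task: deps[task] & selected for task in selected}
    return list(TopologicalSorter(subgraph).static_order())
```

For example, if task t1 reads a conflicted location x and writes y, t2 reads y, t3 touches only unrelated locations, and t4 reads the output of t2 and t3, then only t1, t2, and t4 are selected, which is fewer than the full plurality of tasks. Tasks with no ordering constraint between them in the subgraph are candidates for parallel re-execution.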
PCT/US2015/063044 2015-11-30 2015-11-30 Task-based conflict resolution WO2017095384A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/063044 WO2017095384A1 (en) 2015-11-30 2015-11-30 Task-based conflict resolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/063044 WO2017095384A1 (en) 2015-11-30 2015-11-30 Task-based conflict resolution

Publications (1)

Publication Number Publication Date
WO2017095384A1 true WO2017095384A1 (en) 2017-06-08

Family

ID=58797661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/063044 WO2017095384A1 (en) 2015-11-30 2015-11-30 Task-based conflict resolution

Country Status (1)

Country Link
WO (1) WO2017095384A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050055179A1 (en) * 2003-09-05 2005-03-10 Elizabeth Matheson Probabilistic scheduling
US20090089552A1 (en) * 2004-03-08 2009-04-02 Ab Initio Software Llc Dependency Graph Parameter Scoping
US20100131587A1 (en) * 2008-11-26 2010-05-27 Microsoft Corporation Minimizing Conflicts When Synchronizing Interrelated Data Between Two Systems
US20110078691A1 (en) * 2009-09-30 2011-03-31 Microsoft Corporation Structured task hierarchy for a parallel runtime
US20140007105A1 (en) * 2012-06-29 2014-01-02 Nokia Corporation Method and apparatus for a task based operating framework

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15909901

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15909901

Country of ref document: EP

Kind code of ref document: A1