WO2017095388A1 - Managing an isolation context - Google Patents


Info

Publication number
WO2017095388A1
WO2017095388A1 PCT/US2015/063055 US2015063055W WO2017095388A1 WO 2017095388 A1 WO2017095388 A1 WO 2017095388A1 US 2015063055 W US2015063055 W US 2015063055W WO 2017095388 A1 WO2017095388 A1 WO 2017095388A1
Authority
WO
WIPO (PCT)
Prior art keywords
isolation context
value
isolation
context
view
Prior art date
Application number
PCT/US2015/063055
Other languages
French (fr)
Inventor
Evan R. Kirshenbaum
Susan D. Spence
Original Assignee
Hewlett-Packard Enterprise Development LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Enterprise Development LP
Priority to PCT/US2015/063055
Publication of WO2017095388A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G06F16/2308: Concurrency control

Definitions

  • a computer system may have a memory that is shared by multiple computing entities (multiple threads, for example).
  • the computing entities may concurrently perform computations that change the values that are stored in the shared memory.
  • One way to control the concurrent processing by the computing entities is to organize the changes by the entities into transactions and atomically commit the transactions to memory in a manner that maintains the memory in a consistent state.
  • Fig. 1A is a schematic diagram of a system according to an example implementation.
  • Fig. 1B is an illustration of data structures to manage multiple simultaneous value (MSV) objects according to an example implementation.
  • Fig. 2 is an illustration of a hierarchical ordering of isolation contexts according to an example implementation.
  • Fig. 3A illustrates the relationship of a parent isolation context and a live child isolation context created from the parent isolation context according to an example implementation.
  • Fig. 3B illustrates the relationship of a parent isolation context and a snapshot child isolation context created from the parent isolation context according to an example implementation.
  • Figs. 4A and 4B are flow diagrams illustrating techniques to manage publication of an isolation context according to example implementations.
  • Fig. 5 illustrates a schematic diagram of a system of physical machines according to a further example implementation.
  • Multiple threads executing in one or more processes of a computer or multiple computers may perform operations that are directed to one or multiple shared data structures.
  • One approach to maintain a consistent state of the data structure is to block other threads from making changes to the data structure while one of the threads makes changes. This may, however, result in inefficient processing.
  • Another approach to maintain a consistent state of the data structure is for the threads to process their changes as transactions, which are atomically committed to memory or rolled back (e.g., their modifications discarded) upon discovery that changes made by another thread conflict with the changes attempting to be committed.
  • Such an approach may present challenges for relatively large transactions (e.g., transactions that read or modify a large number of locations associated with the data structure or transactions that run for a long time before attempting to commit).
  • the probability that no other thread, during the course of the transaction's execution, made a modification that results in a conflict that prevents the commit attempt from being successful may be relatively small.
  • a second attempt (and subsequent attempts) at redoing the changes following a rollback and retrying the commit of the transaction may once again fail.
  • an "isolation context" (also called a "computational context" herein) refers to an environment in which computations that are performed within the environment are contained within the environment so that the results of the computations are not, in general, visible to other isolation contexts. Due to the computational isolation, machine executable instructions, or program code, executing within the isolation contexts may concurrently make modifications to a data structure that is shared by the contexts.
  • a "data structure" refers to an organization of one or multiple units of data, which are stored in one or multiple storage locations.
  • an isolation context may be used to access (e.g., read or modify) multiple data structures.
  • Each isolation context may present to program code a corresponding "view" of the data structure, where the "view" refers to the value(s) that the isolation context reads for corresponding properties of the data structure.
  • a single isolation context may be associated with multiple views.
  • a "property" may be a location associated with the data structure (e.g., a field within a record or an index of an array), a structural property of the data structure (e.g., a number of elements in a list), or a relationship that is associated with the data structure (e.g., an association of a value with a key in a map).
  • Although isolation contexts create computational isolation, there are mechanisms by which an isolation context may transfer information to another isolation context.
  • One way a first isolation context may transfer information to a second isolation context is for the first isolation context to publish.
  • "Publishing" an isolation context refers to combining or merging the view of the publishing isolation context with the view of another isolation context so that the views are the same at the time of publication.
  • a given publication attempt may not succeed due to one or multiple publication conflicts.
  • a publication conflict (also called a "conflict" herein) refers to a reason for the publication not to occur.
  • An example of a conflict is the existence of a modification made to a data structure (e.g., a change to a field of a record) within the isolation context that is the target of the publication attempt when a similar modification (e.g., a change to the same field of the same record) was made within the isolation context being published or when a similar value (e.g., the value of the same field of the same record) was read within the isolation context being published and a modification to the same data structure or another data structure was made that may have been based on the value that was read.
  • a processor-based system 100 includes context management resources 130, which the system 100 uses to create and manage isolation contexts 110 (N isolation contexts 110-1, 110-2...110-N being depicted in Fig. 1A), and manage publications by the isolation contexts 110.
  • an isolation context is a mechanism by which executing machine executable instructions (called "program code 112" herein) may isolate itself from other executing program code 112.
  • the program code 112 may be an entire application or program or may be part of such an application or program, such as a thread.
  • program code 112 may be working in a prevailing (also referred to herein as the "current" or "working") isolation context 110.
  • program code 112-1 may be working in a prevailing isolation context 110-1.
  • each thread may have its own prevailing isolation context 110, and different threads executing program code 112 at the same time may be working in different prevailing isolation contexts 110.
  • the prevailing isolation context 110 in the child thread when it begins executing is that of the parent thread (e.g., at the time the child thread was created).
  • References herein to program code 112 "executing in" a given isolation context 110 refer to the program code 112 executing while the given isolation context 110 is the prevailing isolation context.
  • a sequence of machine executable instructions (called "thread A" for this example) of program code 112 may be executing in one of the isolation contexts 110, and another sequence of machine executable instructions (called "thread B" for this example) of program code 112 may be executing in another one of the isolation contexts 110.
  • threads A and B may share a data structure 106.
  • changes made by thread A to the data structure 106 may be invisible to thread B, and changes made to the data structure 106 by thread B may be invisible to thread A. That is, different computations, working at the same time and looking at the same fields of the same records, may correctly see different values.
  • each isolation context 110 has an associated view 120 (or multiple views 120) to a given data structure 106; and as such, multiple isolation contexts 110 may have different associated views 120 of the same data structure 106.
  • Multiple operating system threads may work in the same isolation context 110, in accordance with example implementations.
  • the multiple operating system threads may be associated with multiple operating system processes and the multiple operating system processes may be associated with multiple computers. This allows sharing of the same view 120 among multiple processes and computers. Moreover, this arrangement allows processes that share an isolation context 110 to be written in different programming languages.
  • the data structures 106 may be stored in a data store 104.
  • the data store 104 may include a namespace that associates names with the data structures 106.
  • An example of a namespace is a file system, in which data structures 106 are stored as files that are identified by corresponding file names.
  • Another example of a namespace is a key-value store, which stores associated key-value pairs. In this manner, a key may be used to find a respective value that is stored in the key-value store.
  • the data store 104 may be stored in a physical storage device (a volatile or non-volatile memory device, a hard disk device, and so forth) or may be stored across a distributed arrangement of physical storage devices.
  • a "data structure 106" refers to any unit of data that may be stored. Examples of data structures 106 include files, records, lists, sets, maps, tables, arrays, strings, queues, stacks, graphs, directories, primitives (a number, a Boolean value, a character, as examples), and so forth.
  • a data structure 106 may also include a data structure included by reference (e.g., a pointer).
  • the context management resources 130 may include one or multiple libraries 134, with each library 134 including one or multiple functions 136 that may be called by the program code 112 for such purposes as creating isolation contexts, establishing views for isolation contexts, binding functions to isolation contexts, identifying conflicts, resolving conflicts, publishing contexts, and so forth, as further described herein.
  • the context management resources 130 may contain one or multiple objects, which may also be used for purposes of creating, maintaining, and managing the isolation contexts 110, as described herein.
  • the isolation contexts 110 are fully hierarchical, as illustrated by an example hierarchical tree 200. In this manner, multiple isolation contexts 110 may form a tree that is rooted at a top-level, global isolation context 110, and a given isolation context 110 may have any number of children. In the example tree 200, isolation context 110-3 is a global context that is a parent of isolation contexts 110-4 and 110-5 and grandparent of isolation contexts 110-6 and 110-7; isolation contexts 110-4 and 110-5 are siblings; isolation contexts 110-6 and 110-7 are children of parent isolation context 110-4; and isolation contexts 110-6 and 110-7 are siblings.
  • the isolation context hierarchy may extend to any depth.
  • One way for information to travel from one isolation context 110 to another isolation context 110 is for a child isolation context 110 to successfully publish, thereby making the changes visible in the parent isolation context 110.
  • When isolation context P is a parent of isolation context C, changes made within isolation context P are generally visible in the isolation context C, but changes made in the isolation context C are not visible in the isolation context P until the isolation context C is successfully published.
  • the child isolation context 110 may have two forms, which are specified when the child isolation context 110 is created: a transparent, or live, isolation context; and an opaque, or snapshot, isolation context.
  • For a live child isolation context (such as example live child isolation context 110-9 of Fig. 3A), changes made to a location L (such as location 304 in Fig. 3A) in the parent (such as isolation context 110-8 of Fig. 3A) are visible in the child isolation context, provided the child has not itself modified the location L.
  • the changes made in the parent isolation context may actually be changes that are made to an ancestor of the parent isolation context but are visible in the parent isolation context; and the changes may be due to a different child isolation context of the parent isolation context publishing its changes to the parent isolation context.
  • For a snapshot child isolation context (such as example snapshot child isolation context 110-11 of Fig. 3B), no changes that are made in the parent isolation context (such as parent isolation context 110-10 of Fig. 3B) after the snapshot child isolation context is created are visible within the child isolation context.
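The difference between live and snapshot children can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class `SketchContext` and its methods are hypothetical names, locations are plain strings, the code is single-threaded, and the snapshot is taken by eagerly copying the parent's visible state, which the source does not prescribe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the source.
final class SketchContext {
    private final SketchContext parent;                         // null for the global context
    private final Map<String, Object> local = new HashMap<>();  // values established in this context
    private final Map<String, Object> parentStateAtCreation;    // non-null only for snapshot children

    private SketchContext(SketchContext parent, boolean snapshot) {
        this.parent = parent;
        // Assumption: a snapshot child eagerly copies what the parent shows at creation time.
        this.parentStateAtCreation =
                (snapshot && parent != null) ? parent.visibleState() : null;
    }

    static SketchContext global() { return new SketchContext(null, false); }
    SketchContext liveChild() { return new SketchContext(this, false); }
    SketchContext snapshotChild() { return new SketchContext(this, true); }

    // Modifications stay local; they are not visible in the parent.
    void write(String location, Object value) { local.put(location, value); }

    Object read(String location) {
        if (local.containsKey(location)) return local.get(location);
        if (parentStateAtCreation != null) return parentStateAtCreation.get(location);
        return parent == null ? null : parent.read(location);   // live: read through to the parent
    }

    // Everything currently visible in this context, inherited values included.
    private Map<String, Object> visibleState() {
        Map<String, Object> state = new HashMap<>();
        if (parentStateAtCreation != null) state.putAll(parentStateAtCreation);
        else if (parent != null) state.putAll(parent.visibleState());
        state.putAll(local);
        return state;
    }
}
```

With this sketch, a write in the parent after both children exist is seen by the live child and not by the snapshot child, while a write in either child leaves the parent unchanged.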
  • In a live child isolation context C, reads made on locations that have not been modified in the context C return, as stated above, the value that the location would have had, had the read been made in the parent isolation context P. Either at the time of the read or by setting a default (at or after the time isolation context C was created or as a general policy in the process), the read may be specified as being "frozen."
  • a frozen read has the property that, once it occurs, all subsequent reads of that location within isolation context C (frozen or not) return the same value until the location is modified within isolation context C or until isolation context C is successfully published.
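A frozen read can be pictured as a read that also pins its result in the child. The sketch below is an assumption-laden illustration: `FrozenReadSketch`, `frozenRead`, and `clearAfterPublish` are hypothetical names, and the parent context P is stood in for by a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a live child context C with frozen reads.
final class FrozenReadSketch {
    private final Map<String, Integer> parentValues;                   // stands in for parent context P
    private final Map<String, Integer> established = new HashMap<>();  // values pinned or written in C

    FrozenReadSketch(Map<String, Integer> parentValues) {
        this.parentValues = parentValues;
    }

    // Ordinary read: an established value wins, otherwise the parent's current value.
    Integer read(String location) {
        return established.containsKey(location)
                ? established.get(location)
                : parentValues.get(location);
    }

    // Frozen read: pins the value it returns so that later reads in C see the same value.
    Integer frozenRead(String location) {
        Integer value = read(location);
        established.put(location, value);
        return value;
    }

    // Modifying the location in C replaces the pinned value.
    void write(String location, int value) { established.put(location, value); }

    // After a successful publication, C no longer holds established values.
    void clearAfterPublish() { established.clear(); }
}
```

Before the frozen read, changes in the parent show through; after it, they do not, until the child writes the location or is published.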
  • the system may determine a value as of a current time. More generally, to determine a value as of a given time (e.g., a request time), the system may attempt to determine whether a value associated with the property had been established (i.e., previously determined) in the view 120 prior to the request time and subsequent to the later of the time that the isolation context 110 was created or the time the isolation context 110 was last successfully published prior to the request time.
  • Such a value may have been established by modifications to the property by program code 112 executing in the isolation context 110, by frozen reads of the property by program code 112 executing in the isolation context 110, or by successful publication of a value for the property from a child isolation context 110 of the isolation context 110. If such a value was established, then it is the determined value. Otherwise, the determined value may be found by determining an inherited value for the property, where the inherited value is the value associated with the property within a parent view 120 of the view 120 associated with the parent isolation context 110 of the isolation context 110 as of an effective time based on the view 120 and the request time. As an example, when the isolation context 110 associated with the view 120 is a live (i.e., transparent) isolation context 110, the effective time may be the request time. When the isolation context 110 associated with the view 120 is a snapshot isolation context 110, the effective time may be the later of the creation time of the snapshot isolation context 110 and a time associated with the latest successful publication of the snapshot isolation context 110 prior to the request time.
  • an isolation context 110 may invoke a call operation, which takes a function 136 (see Fig. 1A) as an argument (and perhaps takes other arguments to pass to the function), temporarily sets the prevailing isolation context to another context, runs the function 136, and returns the result.
  • the result of the call operation remains the value as it appears in the invoked isolation context 110.
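The call operation described above amounts to a scoped switch of the prevailing context. A minimal sketch follows, assuming a per-thread prevailing context held in a `ThreadLocal`; the class `CallSketch` is hypothetical, and a context is represented by a plain name rather than a real context object.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the call operation.
final class CallSketch {
    // Prevailing context per thread; the String name stands in for a real context object.
    private static final ThreadLocal<String> PREVAILING =
            ThreadLocal.withInitial(() -> "global");

    static String current() { return PREVAILING.get(); }

    // Runs fn with `target` as the prevailing context, then restores the previous one,
    // even if fn throws.
    static <T> T call(String target, Supplier<T> fn) {
        String previous = PREVAILING.get();
        PREVAILING.set(target);
        try {
            return fn.get();
        } finally {
            PREVAILING.set(previous);
        }
    }
}
```

For example, `CallSketch.call("D", CallSketch::current)` evaluates `current()` while "D" prevails, and the caller's own prevailing context is back in place once the call returns.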
  • the count may have had a value of "10," but calling the function f sets the count to 20.
  • the two variables r and x refer to the same record, but the view presented through the variable r is that of the isolation context C, while the view presented through the variable x is that of the isolation context D.
  • r.getCount() continues to return "10," but x.getCount() returns "20." And if something elsewhere caused a computation in the isolation context D to set the count on the same record r to 30, further evaluation in the isolation context C causes x.getCount() to return "30."
  • the view presented through the variable x is neither that of isolation context C nor isolation context D but a composite "C's view of D's view" associated with isolation context C. If, within isolation context C, the count of the record accessed via x is either modified or accessed via a frozen read, subsequent reads of the count in the isolation context C via variable x result in the value established within isolation context C and ignore any intervening modifications made within isolation context D. This distinction remains until isolation context C is successfully published, at which point D's view and C's view of D's view are again identical. In this way, an isolation context may have multiple associated views 120.
  • program code in an isolation context 110 may also bind a function to an isolation context (e.g., the prevailing isolation context 110 or a different isolation context 110), returning a new function 136 which, when invoked, runs the original function 136 in the bound isolation context 110 (e.g., by invoking the call operation on the bound context and passing in the original function 136 as a parameter).
  • This binding operation may be used to create a function 136 that is bound to the current isolation context 110 and that can be invoked in a child isolation context 110 (or another isolation context 110 via the call operation).
  • the binding operation may be used to create several new functions by binding the same function 136 to a number of different contexts 110.
  • These multiple isolation contexts 110 may be snapshot contexts 110 representing the state of the world at various times in the past (e.g., snapshots of a company taken at daily intervals). Alternatively, these multiple isolation contexts 110 may be child isolation contexts 110 used to explore and select among different alternative approaches to solving a problem. A function bound to the current isolation context 110 may be provided to a different context (e.g., by the call operation) to allow the different context to observe and store data in the bound isolation context 110.
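Binding can be sketched as wrapping a function in a closure that performs the call operation on a fixed context. As before, this is a hypothetical illustration: `BindSketch` and the context names are made up, and contexts are represented by strings.

```java
import java.util.function.Supplier;

// Hypothetical sketch of binding a function to a context.
final class BindSketch {
    private static final ThreadLocal<String> PREVAILING =
            ThreadLocal.withInitial(() -> "global");

    static String current() { return PREVAILING.get(); }

    // Call operation: run fn with `target` prevailing, then restore the previous context.
    static <T> T call(String target, Supplier<T> fn) {
        String previous = PREVAILING.get();
        PREVAILING.set(target);
        try {
            return fn.get();
        } finally {
            PREVAILING.set(previous);
        }
    }

    // Returns a new function that always runs fn in the bound context,
    // no matter which context prevails when it is invoked.
    static <T> Supplier<T> bind(String context, Supplier<T> fn) {
        return () -> call(context, fn);
    }
}
```

Binding the same function to several contexts, for instance one per daily snapshot, yields one function per context; each can be handed to code running elsewhere and will still evaluate in its own bound context.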
  • Child isolation contexts 110 can obtain references to their parent isolation contexts 110, so even if there has been a modification or frozen read in a given child isolation context 110, the program code 112 of a child isolation context 110 may, in accordance with example implementations, invoke the call operation on its parent isolation context 110 to determine what the value of a particular location is in the view 120 of the parent isolation context 110. For example, to obtain the count field of a record "r" as it would be seen in the parent isolation context 110 of the current isolation context 110, a program written in Java might call: IsolationContext.current().parent().call(() -> r.getCount())
  • a function may be called on the global isolation context 110, for example: IsolationContext.global().call(() -> revenue) to obtain the committed value of a variable.
  • such a call may be used to set a default value that is seen by code working in unpublished isolation contexts 110.
  • Similar manipulations via the global isolation context 110 may be used to manipulate the namespace in a committed manner from within an unpublished isolation context.
  • Because isolation contexts 110 may pass out values without publishing modifications, such values may be accumulated.
  • program code 112 executing in an isolation context 110 may take a snapshot of the state of a company database periodically (e.g., every day, hour, minute or second) and make each of these snapshots available by binding the snapshots to names within a namespace.
  • the executing program code 112 may append the snapshots to a single list, facilitating, for example, identifying the states of the database for the ten days that scored highest on some metric (e.g., the days during which the revenue of the company was highest).
  • program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of potential modifications according to models with different parameters, and place the results (indicating predicted consequences) computed in each child isolation context 110 (along with parameters used with the respective models) in a single structure. Then, all of the results may be compared with one another and the child isolation context 110 that resulted in the best (and only the best) outcome allowed to publish its results.
  • program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of non-deterministic simulations and collect the results produced within each simulation into a structure, and analyze the structure to determine, based on the results, an action to take.
  • the child isolation contexts 110 may be allowed to terminate without ever publishing their modifications to their parent isolation contexts 110.
  • program code 112 may execute code 112 in an arbitrary isolation context 110 by explicitly changing the prevailing isolation context 110 to be the arbitrary isolation context 110.
  • Such a change may be temporary and bounded to a particular region of code 112 (e.g., a particular function or program block) by ensuring that at the end of the region of code 112 the prevailing isolation context is reverted to what it had been before the change.
  • isolation contexts control some, but not all, variables or other memory locations available to the program.
  • program code 112 executing in the temporary prevailing isolation context 110 may obtain a value, such as a reference to a data structure, where the reference is associated with a view 120 associated with the temporary prevailing isolation context 110, and may store that value in a location that is not under control of the isolation contexts.
  • program code executing in the former prevailing isolation context 110 may obtain the value associated with the view 120 associated with the temporary prevailing isolation context 110.
  • Another way that information may move between isolation contexts 110 is when a child isolation context 110 publishes its
  • the program may specify, when an isolation context 110 is created, that the isolation context 110 is "detached," which means that the isolation context 110 may not be published.
  • An error may be signaled (e.g., an exception may be thrown) if an attempt is made to publish a detached isolation context 110.
  • a conflict may be associated with a property associated with a data structure, where a property may be a location (e.g., a particular field of a particular record or a particular index of a particular array), a relationship managed by the data structure (e.g., the value associated with a particular key in a particular map), or a structural property of the data structure (e.g., the order of elements of a particular list or the number of elements contained in a particular set).
  • a conflict may arise due to the existence of an
  • An unpublished value may be a value asserted to be associated with a view that has not yet been processed in response to a publication of an isolation context
  • a conflict may arise when the value associated with such a property is changed within the parent isolation context 110 of the isolation context 110 requesting publication.
  • a conflict may arise when a new value otherwise becomes visible in the parent isolation context 110, when such a value change occurs after a particular effective time and when the child (e.g., publishing) isolation context either modified the same property (where such modification includes receiving a value due to the successful publication by a further child isolation context 110 of the publishing isolation context 110) or when the publishing isolation context performs a frozen read operation to obtain a value associated with the property.
  • the effective time associated with a property relative to the publishing isolation context 110 may be established and updated upon creation of the publishing isolation context 110, each time the publishing isolation context 110 is successfully published, and upon an explicit indication (e.g., during an attempt to resolve conflicts due to an unsuccessful publication attempt) by program code 112 executing in the publishing isolation context 110 that any conflicts associated with the property have been resolved.
  • the effective time may be updated when the publishing isolation context 110 is established as the prevailing isolation context 110.
  • the effective time is also updated the first time a modification to or frozen read of the property is performed following such a creation, successful publication, or explicit indication.
  • an "atomic" process means that the actions that form the process are treated as being indivisible, i.e., the actions are viewed or treated by the isolation contexts as occurring at the same time.
  • a thread cannot see some, but not all, of the published modifications in the parent isolation context, and a modification cannot be made in any isolation context that would change the determination that there are no conflicts, between the determination that there are no conflicts and the making available of the
  • the conflict(s) may be determined proactively so that the conflicts are known at the time the isolation context 110 requests publication.
  • the isolation context 110 and its parent isolation context 110 present the same values for the data structure 106, and the child isolation context 110 is no longer considered to have established any current values for the data structure 106.
  • attempts to determine values for properties of the data structure 106 within the child isolation context 110 result in determining inherited values from the parent isolation context 110.
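The publication rules above (conflict detection against an effective time, atomic merge, and the error for detached contexts) can be sketched together. This is an illustration under stated assumptions rather than the patented mechanism: `PublishSketch`, `Parent`, and `Child` are hypothetical names, time is a logical counter on the parent, only modifications (not frozen reads) are tracked, and atomicity is approximated by holding the parent's monitor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of publication with conflict detection.
final class PublishSketch {
    static final class Parent {
        final Map<String, Integer> values = new HashMap<>();
        final Map<String, Long> lastChanged = new HashMap<>();  // logical time of last change
        long clock = 0;

        synchronized void write(String property, int value) {
            values.put(property, value);
            lastChanged.put(property, ++clock);
        }
    }

    static final class Child {
        final Parent parent;
        final boolean detached;
        final Map<String, Integer> established = new HashMap<>();
        final Map<String, Long> effectiveTime = new HashMap<>();

        Child(Parent parent, boolean detached) {
            this.parent = parent;
            this.detached = detached;
        }

        void write(String property, int value) {
            synchronized (parent) {
                // Effective time is fixed by the first modification since creation or publication.
                effectiveTime.putIfAbsent(property, parent.clock);
            }
            established.put(property, value);
        }

        // Returns the conflicted properties; an empty list means the publication succeeded.
        List<String> publish() {
            if (detached) throw new IllegalStateException("detached context cannot be published");
            synchronized (parent) {  // conflict check and merge happen as one step
                List<String> conflicts = new ArrayList<>();
                for (String property : established.keySet()) {
                    long changed = parent.lastChanged.getOrDefault(property, 0L);
                    if (changed > effectiveTime.get(property)) conflicts.add(property);
                }
                if (conflicts.isEmpty()) {
                    established.forEach(parent::write);
                    established.clear();   // the child no longer has established values
                    effectiveTime.clear();
                }
                return conflicts;
            }
        }
    }
}
```

If two children modify the same property and the first publishes successfully, the second child's publication attempt reports that property as conflicted and leaves the parent untouched.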
  • this has the effect that locations that had been "frozen" by modifications or by frozen reads are no longer considered frozen and the inherited values may vary as modifications are made to the parent context; and for a snapshot isolation context 110, the "snapshot time" (i.e.
  • a last-common-snapshot for a given isolation context 110 may be obtained by calling the appropriate function 136.
  • the last-common-snapshot is a read-only snapshot child of the context's parent as of the last time the isolation context 110 was published (or its creation time if it has not been published). This allows an isolation context 110 to compare its changes to a data structure 106 with the values of the data structure 106 when the isolation context 110 started or was last successfully published.
  • an as-created-snapshot for a given isolation context 110 may be obtained by calling the appropriate function 136. If there are one or multiple conflicts, then publication of the child isolation context 110 does not occur, and all of the changes, or modifications, that are made by the child isolation context 110 remain invisible to the parent isolation context 110.
  • a conflict resolution phase is entered for purposes of taking one or multiple actions to resolve the conflicts.
  • the conflict resolution phase may be handled in a number of different ways, depending on the particular implementation.
  • the program code 112 associated with the isolation context 110 attempting publication decides how to handle the conflicts and may decide to re-attempt the publication.
  • the program code 112 marks each conflict as being resolved so that the conflict does not show up again for a subsequent publication attempt.
  • a given conflict may be resolved by modifying a value of the property associated with the conflict and specifying that such a modification resolves any conflict for that property.
  • the modification may be made by computing and setting a specific value.
  • the modification may be one of the following.
  • the modification may be a resolve-to-current modification in which the value in the publishing context is the one that is used. This is often the correct answer when it can be determined that the value asserted by the child isolation context 110 is not dependent on any conflicted value (i.e., if the program code 112 of the publishing isolation context 110 was executed again, the same result would occur).
  • the modification may be a resolve-to-parent modification in which the current value in the parent isolation context 110 is the one that is used.
  • the modification may be a roll-back modification in which the value in the last common snapshot is the one that is used.
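The three named resolutions differ only in which of three candidate values is kept. A small sketch, with the hypothetical names `ResolutionSketch` and `Strategy`, assuming the caller has already read the three values from the publishing context, the parent, and the last common snapshot:

```java
// Hypothetical sketch of the three conflict-resolution choices.
final class ResolutionSketch {
    enum Strategy { RESOLVE_TO_CURRENT, RESOLVE_TO_PARENT, ROLL_BACK }

    // Picks the value the publishing context should hold for a conflicted property.
    static <T> T resolve(Strategy strategy, T currentValue, T parentValue,
                         T lastCommonSnapshotValue) {
        switch (strategy) {
            case RESOLVE_TO_CURRENT: return currentValue;             // keep the child's value
            case RESOLVE_TO_PARENT:  return parentValue;              // adopt the parent's value
            case ROLL_BACK:          return lastCommonSnapshotValue;  // value at last common snapshot
            default: throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }
}
```

In a fuller implementation the chosen value would be written back in the publishing context and the conflict marked as resolved before publication is reattempted.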
  • determining a value for a property associated with a conflict includes determining a lack of an established value associated with the property in the view of the child isolation context 110; determining an inherited value for the property, where the inherited value is the value for the property in the view of the parent isolation context as of an effective time associated with the child's view and the request time; and considering the inherited value to be the current value.
  • the effective time may be the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the establishment of a value associated with the property in the child's view.
  • the effective time may be the later of a creation time of the first isolation context and a publication time of the first isolation context prior to the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view
  • a given location may be subsequently modified (after having been marked as resolved) before the publication is reattempted. Moreover, marking a conflict as being resolved may cause the system to disregard any known conflict associated with the location and the isolation context 110, but one or multiple subsequent modifications in the parent or child isolation contexts 110 may introduce one or multiple new conflicts, which arise when the isolation context 110 tries again to publish.
  • program code 1 12 performing conflict resolution may make use of the current (e.g., publishing) isolation context 1 10, the parent isolation context 1 10, and the isolation context's 1 10 last-common-snapshot isolation context 1 10.
  • the program code 1 12 may also make use of a current-at-publish isolation context 1 10 and a parent-at-publish isolation context 1 10, which are read-only snapshots of the current (publishing) context and its parent at the time the publication was attempted (and failed).
  • one or multiple mechanisms may be used to encapsulate the process of creating an isolation context 1 10 as a child of the prevailing isolation context 1 10; running the code in it, and (when the created isolation context 1 10 is publishable) attempting to publish the created isolation context 1 10 at the end.
  • the mechanism may involve keywords, annotations, or other syntactic additions to the source code of the program and the program code may be specified directly.
  • the mechanism may involve calling a function (e.g., one of the functions 136 of Fig. 1A) and passing as an argument an indication of a function to be called within the newly created isolation context 1 10.
  • the program code 112 may specify (e.g., by choice of keyword, annotation, or function or by passing in a parameter) the type of child isolation context 110 to create (e.g., live or snapshot, detached or not, read-only or not).
  • the program code 112 may further specify information used to control behavior upon failure of an attempt to publish. Such information may cause the system to attempt conflict resolution and may control the manner in which conflict resolution is performed. The information may alternatively or in addition direct the system to react to a failure to publish or a failure to resolve conflicts by creating a new child isolation context 110 and rerunning the code within it.
  • the information may include one or more termination conditions, which the system may use to determine that further attempts to perform conflict resolution or to rerun the code should not occur.
  • termination conditions may include a given number of attempts having been performed, a given time (e.g., a wall clock time or a time duration) having been passed, a given amount of a resource (e.g., disk space or memory) having been consumed, a value having been asserted by another program thread (e.g., an indication that a solution to a problem has been found by another thread or an indication that a program has gone on to a different phase), a given function returning a true value, a Boolean
  • the conflict resolution may be handled in a number of different ways.
  • the program code 112 may decide that resolving the conflict is not worth the effort, and as a result, the isolation context 110 does not attempt to republish.
  • the program code 112 may be relatively small (online transaction processing (OLTP) code, for example); and as such, the approach may be to simply throw away the associated isolation context 110, create a new isolation context 110 and execute the program code 112 again from the beginning.
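The discard-and-rerun approach, bounded by an attempt limit (one of the termination conditions listed earlier), can be sketched as below. This is an illustration under stated assumptions, not the system's API: `ChildContext`, `run_with_retry`, and the `try_publish` callback are hypothetical names, and the context is reduced to a single integer.

```cpp
#include <functional>
#include <optional>

// Hypothetical stand-in for a child isolation context.
struct ChildContext {
    int value = 0;
};

// Run `body` in a fresh child context and try to publish it. On failure the
// context is discarded and the body is rerun from the beginning, up to
// `max_attempts` times. Returns the number of attempts used, or std::nullopt
// if the limit was reached without a successful publication.
inline std::optional<int> run_with_retry(
        const std::function<void(ChildContext&)>& body,
        const std::function<bool(const ChildContext&)>& try_publish,
        int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        ChildContext ctx;  // new context: earlier work is thrown away
        body(ctx);         // execute the program code from the start
        if (try_publish(ctx)) return attempt;
    }
    return std::nullopt;
}
```

Other termination conditions from the text (elapsed time, resource consumption, a value asserted by another thread) could replace or supplement the attempt counter in the loop condition.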
  • the conflict resolution may be handled by an object or by a function 136 (see Fig. 1A) that is designated as a conflict resolver at the time the code 112 modified the location that was conflicted, such designation indicating that should a conflict be detected associated with this location and value, the object or function should be invoked to resolve the conflict.
  • the program code 112 may specify a default resolver (e.g., an object or function) that is associated with a particular location (e.g., a particular field in a particular record), with a particular field (regardless of what record a conflict occurs in), or a type (e.g., applying to conflicts associated with any property of any data structure as long as the property is associated with values of the given type or applying to conflicts associated with any field of any record as long as the record is of the given type).
  • This type of conflict resolution may be appropriate when a field is, for example, known to be
  • a resolver may be attached to the field to perform this computation and resolve the conflict by specifying the resulting value.
  • the program code 112 may use task-based conflict resolution.
  • in task-based conflict resolution, the program code 112 specifies that some or all of its computation is made up of re-runnable tasks that are executed by the program code 112.
  • the system 100 may keep track of the set of locations read and written while working in each task and the dependencies between tasks (e.g., a dependency created where one task reads a location that was written by another task).
  • all tasks that read a conflicting location are selected to rerun, as are all tasks dependent on them (and so on, recursively).
  • conflicted locations may be resolved to the parent value, the current value, or a determined value prior to, during, or after the selected tasks are re-run. The selected tasks may then re-run in dependency order and the publication may then be retried.
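The task selection described above is a transitive closure over read/write dependencies. The following C++ sketch illustrates it under simplifying assumptions: locations are plain strings, each task carries its recorded read and write sets, and the names `Task` and `tasks_to_rerun` are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <string>
#include <vector>

struct Task {
    std::set<std::string> reads;   // locations read while the task ran
    std::set<std::string> writes;  // locations written while the task ran
};

// Returns the indices of the tasks to rerun: every task that read a
// conflicted location, plus (recursively) every task that read a location
// written by a task already selected.
inline std::set<std::size_t> tasks_to_rerun(
        const std::vector<Task>& tasks,
        const std::set<std::string>& conflicted) {
    auto reads_any = [](const Task& t, const std::set<std::string>& locs) {
        for (const auto& loc : locs)
            if (t.reads.count(loc)) return true;
        return false;
    };
    std::set<std::size_t> selected;
    std::queue<std::size_t> work;
    for (std::size_t i = 0; i < tasks.size(); ++i)
        if (reads_any(tasks[i], conflicted)) {
            selected.insert(i);
            work.push(i);
        }
    while (!work.empty()) {
        const std::size_t src = work.front();
        work.pop();
        for (std::size_t i = 0; i < tasks.size(); ++i)
            if (!selected.count(i) && reads_any(tasks[i], tasks[src].writes)) {
                selected.insert(i);
                work.push(i);
            }
    }
    return selected;
}
```

The sketch only selects tasks; ordering the selected tasks by dependency before rerunning them, as the text requires, would be a separate topological sort.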
  • the program code 112 may provide a function that takes a collection of conflict objects and runs arbitrary code to determine how to resolve the conflicts explicitly.
  • the program code 112 may use a combination of the above-described approaches, even for a single publication attempt.
  • the program code 112 may use field-based rules to resolve (and eliminate) some conflicts and then invoke one or multiple functions 136, objects and/or tasks to resolve the remaining conflicts.
  • a technique 400 includes preventing (block 404) a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context.
  • the technique 400 includes managing (block 406) the publication of the first isolation context, which includes identifying (block 408) a conflict that prevents the first publication attempt; making (block 412) a second modification within the first isolation context to resolve the conflict; and publishing (block 416) the first isolation context in connection with a subsequent publication attempt, including allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.
  • a technique 430 includes executing (block 434) machine executable instructions in a processor-based machine in a first isolation context and in a second isolation context.
  • the first isolation context presents an associated first view of a data structure
  • the second isolation context presents an associated second view of the data structure.
  • Executing the machine executable instructions includes inhibiting (preventing, for example) modifications that are made to the first object in the first isolation context from being reflected in the second view.
  • a modification made in the first view "being reflected" in the second view refers to the modification being reproduced or shown in the second view.
  • an attempt is made (block 438) to publish the first isolation context to reflect modifications made to the data structure in the first isolation context in the second view of the data structure. If a determination is made (decision block 442) that one or more conflicts exist, then a decision is made
  • the technique 430 includes publishing (block 450) the first isolation context, including causing the second view to reflect modifications made to the data structure within the first isolation context. These modifications include modifications made during action(s) taken to resolve conflict(s), as of a time associated with the last publication attempt.
  • properties of a data structure may be represented by multiple simultaneous value (MSV) objects.
  • an MSV object represents a property of a unit of data and may concurrently, or simultaneously, have multiple, alternative values.
  • a particular logical location such as a field of a record or a slot in an array, may be represented by an MSV object, such that the field or slot has different, alternative values, when seen through different views.
  • a given isolation context may be associated with one or multiple views that may be used when accessing one or multiple MSV objects. Moreover, multiple isolation contexts may be associated with multiple views of a given MSV object. The values associated with different views in a given MSV object may be isolated from each other. In this manner, as part of this isolation, a request to determine a value for the MSV object in a given view may result in the return of one of the MSV object's alternative values; and assigning the value of the MSV object in a given view may not affect the value of the MSV object in another view.
  • the system 100 uses various data structures 150 to manage the MSV objects.
  • In accordance with example implementations, a given data structure 150 may be an object represented by a C++ class.
  • the data structures 150 may include isolation context objects 151.
  • Each isolation context object 151 corresponds to one of the isolation contexts 110. It is noted that in the following discussion, "isolation context object 151" and "isolation context 110" may be used interchangeably.
  • Conflicts 152 that prevent publication of the isolation context 110 may be installed on a corresponding isolation context state 157.
  • the data structures 150 may include view objects 153.
  • Each view object 153 may correspond to one of the views 120. It is noted that in the following discussion, “view object 153" and “view 120” may be used interchangeably.
  • the data structures 150 may include one or multiple conflict generators 160.
  • a conflict generator 160 may construct conflicts referring to a particular location.
  • Each subtype of conflict may have its own generator subtype, in
  • the data structures 150 may include context states 157.
  • a context state 157 represents the current set of conflicts and "contingent conflicts" for an associated isolation context 110, as well as a count or event time, representing the last time the isolation context associated with the conflict successfully published.
  • the data structures 150 may include an event counter 164, which may represent a monotonically-increasing set of values that are incremented by particular events.
  • the event counter 164 may represent a global, shared event count, which is atomically incremented.
  • the constant representing the greatest possible value of an event counter may be used to represent the most recent event count.
  • values of the event counter 164 may be associated with snapshot creation times, in accordance with example implementations.
  • Values read from the event counter 164 may be considered to be timestamps denoting points in time of the execution of system 100, and the value stored in the event counter 164 may be considered to be the current time (or timestamp) of the system 100 with respect to the context and MSV object management resources 130. It is noted that, apart from the fact that their values monotonically increase over time (meaning that timestamps may be compared with one another), these timestamps have no necessary relationship with any other notion of time (e.g., wall-clock time or time since the start of an operating system or process in system 100). It is also noted that different reads of the event counter 164 may correctly read the same timestamp value, but consecutive reads of the event counter 164 may never read an earlier timestamp after reading a later timestamp.
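A global, atomically incremented counter of this kind maps naturally onto `std::atomic`. The sketch below is one possible shape, not the patent's implementation; the names `EventCounter`, `Timestamp`, `now`, `tick`, and `kMostRecent` are hypothetical, and a 64-bit counter is an assumption.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

using Timestamp = std::uint64_t;

// The greatest possible counter value doubles as the "most recent" sentinel.
constexpr Timestamp kMostRecent = std::numeric_limits<Timestamp>::max();

class EventCounter {
public:
    // Current timestamp. Repeated reads may return the same value, but a
    // later read never returns an earlier timestamp.
    Timestamp now() const { return count_.load(std::memory_order_acquire); }

    // Atomically advance the counter for an event and return the new value.
    Timestamp tick() {
        return count_.fetch_add(1, std::memory_order_acq_rel) + 1;
    }

private:
    std::atomic<Timestamp> count_{0};
};
```

Because the counter only ever increases, any two timestamps can be ordered by a plain integer comparison, which is all the rest of the scheme needs from them.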
  • the data structures 150 may represent value chains 156, value objects 158 and MSV objects 166.
  • Each value chain 156 may be associated with a particular view object 153 (representing a view 120) that may be used when accessing an MSV object 166; and the value chain 156 represents a history of value objects 158 (for a particular MSV object 166 noted in the given view 120), starting from the most recently asserted and extending in a time-ordered sequence back in time.
  • each value object 158 may contain a value (e.g., a primitive value or a reference to a data structure) for the MSV object 166 in the view 120 associated with the value chain 156 valid as of a particular timestamp, a timestamp representing that effective time and a link to a prior value object 158 (if any) for the same view 120.
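A value chain as described is a newest-first linked list of timestamped values. The following sketch assumes an `int` payload and shared ownership of nodes for simplicity; `ValueObject`, `ValueChain`, `push`, and `read_as_of` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

using Timestamp = std::uint64_t;

struct ValueObject {
    int value;                                 // payload (int for brevity)
    Timestamp effective;                       // time from which it is valid
    std::shared_ptr<const ValueObject> prior;  // older entry, or null
};

// A chain is identified by its head, which is the most recent value object.
using ValueChain = std::shared_ptr<const ValueObject>;

// Prepend a newer value; existing nodes are never mutated.
inline ValueChain push(ValueChain head, int value, Timestamp t) {
    return std::make_shared<ValueObject>(ValueObject{value, t, std::move(head)});
}

// Return the value whose timestamp is the latest one not later than `as_of`,
// or std::nullopt if the chain has no value that old.
inline std::optional<int> read_as_of(const ValueChain& head, Timestamp as_of) {
    for (const ValueObject* v = head.get(); v != nullptr; v = v->prior.get())
        if (v->effective <= as_of) return v->value;
    return std::nullopt;
}
```

Keeping nodes immutable means a reader can walk a chain it obtained earlier without coordinating with writers that prepend new heads.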
  • the data structures 150 may also include view relative pointers 154.
  • a view relative pointer 154 represents a reference, or pointer, to a structured object (a record, array or a map, as examples) and also specifies that read and write accesses to the structured object are to be seen through the perspective of an associated view 120.
  • the view relative pointer 154 may contain a pointer to an object and a pointer to a view object 153.
  • the value stored in a value object 158 may be a view relative pointer 154.
  • the view object 153 contained within a view relative pointer 154 stored in a value object 158 on a value chain 156 may be constrained to represent a view 120 associated with the same isolation context 110 that is associated with the value chain's 156 view 120.
  • a given MSV object 166 represents alternative values for a property of a data structure 106, as seen through different views 120.
  • the data structures 150 may also include MSV states 162.
  • Each MSV state 162 represents the current set of one or multiple value chains 156, which may be associated with an associated MSV object 166.
  • a given MSV object 166 may constrain the values associated with it to all be of a given type (e.g., all integers, all strings, or all lists of employee records).
  • a given MSV object 166 may permit values of different types to be associated with it.
  • Each MSV state 162 may represent a set of value chains 156 associated with an MSV object 166.
  • the data structures 150 may also include serializing tasks 170.
  • the tasks 170 may include write or publication tasks, used to effect either a modification of an MSV object 166 (for a write task) or an attempt to publish an isolation context object 151 (for a publication task). These tasks 170 may be used to ensure that operations that are supposed to be atomic are, in fact, atomic, as simultaneous execution might cause incorrect behavior. Rather than blocking, the serialization of the tasks 170 may allow threads to cooperate with each other, as further described herein.
  • the system 100 may use C++ atomic classes and associated functions, such as a C++ compare_exchange operation (also called a “compare and swap” operation or "CAS" operation).
  • versioned pointers may be used, which encapsulate a value along with a version number, which is incremented every time the value is modified.
  • the version number may be represented by the use of sequences of bits (e.g., high-order bits) within the versioned pointer value.
  • Pointers, including versioned pointers, may encapsulate flags representing Boolean values (e.g., represented as bits within the pointer value).
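One way to picture a versioned value updated by compare-and-swap is a single 64-bit word that carries the version in its high-order bits. The layout below (16 version bits, 48 payload bits) and the helper names `pack`, `payload_of`, `version_of`, and `update` are assumptions made for this sketch; the source does not fix a bit layout.

```cpp
#include <atomic>
#include <cstdint>

// Assumed layout: 16 high-order version bits, 48 low-order payload bits.
constexpr int kVersionShift = 48;
constexpr std::uint64_t kPayloadMask =
    (std::uint64_t{1} << kVersionShift) - 1;

inline std::uint64_t pack(std::uint64_t payload, std::uint16_t version) {
    return (std::uint64_t{version} << kVersionShift) | (payload & kPayloadMask);
}
inline std::uint64_t payload_of(std::uint64_t word) {
    return word & kPayloadMask;
}
inline std::uint16_t version_of(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kVersionShift);
}

// Install `new_payload` only if the slot still holds exactly `expected`
// (payload and version). The version is incremented on every successful
// update, so a stale `expected` fails even if the payload was restored.
inline bool update(std::atomic<std::uint64_t>& slot,
                   std::uint64_t expected,
                   std::uint64_t new_payload) {
    const std::uint64_t desired = pack(
        new_payload, static_cast<std::uint16_t>(version_of(expected) + 1));
    return slot.compare_exchange_strong(expected, desired);
}
```

Flag bits, as mentioned in the text, could be carved out of the same word in the same way, at the cost of fewer bits for the version or payload.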
  • Operations performed on a data structure 150 may be considered to be logically atomic even though the performance of the operations takes measurable time, during which other operations involving the same data structure 150 may be initiated in another thread.
  • a logically atomic operation that is associated with a finite processing time may be deemed to be correct when the result returned by the operation is one that would have been correct had the operation been instantaneous and executed at some arbitrary time in between the time the operation started and the time the operation finished and, further, when any modifications made in the course of performing the operation may be logically ordered as a group with respect to those of other logically atomic operations. It is noted that this standard of correctness may be sufficient to guarantee that no sequence of operations performed in other threads is able to determine that the operation was not performed atomically.
  • each modifiable location may be associated with an MSV object 166; each MSV object 166 may have an associated mapping from view objects 153 to value chains 156 (e.g., in an associated MSV state 162); and each value chain 156 may have an associated timestamped list of value objects 158.
  • the mapping may distinguish between open view objects 153 and closed view objects 153. In accordance with example implementations, the mapping may distinguish between open and closed value chains 156 or views 120 associated with such view objects 153.
  • An open view object 153 is a view object 153 whose associated value chain 156 was last modified (e.g., by the addition of a value object 158) subsequent to the last time that the isolation context 110 associated with the view object 153 (representing a view 120) was published prior to the creation of the MSV state 162 (although the isolation context 110 may have been subsequently published).
  • a closed view object 153 is a view object 153 whose associated isolation context 110 is known to have been published since the last time its associated value chain 156 was modified.
  • open view objects 153 associated with the MSV object 166 whose isolation contexts 110 (as represented by isolation context objects 151) have been published may be closed before the read or modification occurs. More specifically, in accordance with example implementations, an open view object 153 may be closed by a close_published_views process. In this process, a new value object 158 may be added to the value chain 156 associated with the view object 153 that is the parent of the view object being closed (e.g., as the head of the value chain 156).
  • the value in the new value object 158 may be the value from the most recent (e.g., first) value object 158 of the value chain 156 associated with the open view object 153, and the timestamp of the new value object 158 may be a timestamp associated with the successful publication of the view object's 153 isolation context 110.
  • any view objects 153 to be closed by the close published views operations e.g., if there are any view objects 153 whose associated isolation contexts 110 have been successfully published since the last time a value object 158 was added to the respective value chain 156
  • the close published views operation may be retried until a new MSV state 162 is successfully installed or a determination is made that there are no open view objects 153 in the MSV object's 166 current MSV state 162. In this manner, the value chains 156 are shared, and the threads cooperate with each other, as further described herein. All subsequent operations are done relative to the resulting MSV state 162, even if the MSV's 166 state 162 is changed again before the operation finishes.
  • read and modify operations provided by MSV objects 166 may take as a parameter a view object 153
  • the view object 153 may be that associated with the view relative reference 154.
  • the operations may additionally take as a parameter an isolation context object 151 representing a prevailing isolation context 110.
  • the isolation context object 151 may be requested to identify a shadow view object 153 for the provided view object 153, as described further below, and that shadow view object 153 may be used in place of the provided view object 153 in performing the operation.
  • Read operations find the value chain 156, if any, associated with the view object 153 in the MSV state 162 and return the value from the value object 158 in the value chain 156 where the timestamp of the value object 158 is no later than a specified timestamp and is the latest such timestamp among the value objects 158.
  • a "most recent" timestamp may be specified for the read operation, which indicates that the value object 158 with the most recent timestamp is to be used. If the timestamp is the "most recent" timestamp, the search for associated value chains 156 may be restricted to the open value chains 156 in the MSV state 162.
  • the process is repeated, with the view object 153 being replaced by that view object's 153 parent view object (if any).
  • if the isolation context 110 associated with the replaced view object 153 is a snapshot isolation context 110, the timestamp is replaced by the snapshot time for the isolation context 110 (e.g., the timestamp associated with the last successful publication of the isolation context 110 or the creation of the isolation context). This may continue until a value object 158 is found and its value is returned or until a view object 153 is determined to not have a parent. In this case, a default value of the appropriate type may be returned.
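The read procedure, including the fall-back to parent views, can be sketched as follows. This is a simplification under stated assumptions: the MSV state is modeled as a map from views to newest-first histories, the open/closed distinction is omitted, values are `int`, and the names `View`, `Entry`, `MsvState`, and `read` are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Timestamp = std::uint64_t;

struct View {
    const View* parent = nullptr;
    bool snapshot = false;        // true if the owning context is a snapshot
    Timestamp snapshot_time = 0;  // meaningful only when snapshot is true
};

struct Entry {
    Timestamp effective;
    int value;
};

// Simplified MSV state: one history per view, newest entry first.
using MsvState = std::map<const View*, std::vector<Entry>>;

// Read the value seen through `view` as of `as_of`, falling back to ancestor
// views; `default_value` is returned when no ancestor supplies a value.
inline int read(const MsvState& state, const View* view, Timestamp as_of,
                int default_value = 0) {
    while (view != nullptr) {
        auto it = state.find(view);
        if (it != state.end())
            for (const Entry& e : it->second)
                if (e.effective <= as_of) return e.value;
        // Nothing visible in this view. A snapshot sees its parent as of the
        // snapshot time, so the timestamp is replaced before moving up.
        if (view->snapshot) as_of = view->snapshot_time;
        view = view->parent;
    }
    return default_value;
}
```

In the example configuration a snapshot child taken at time 5 does not observe a parent write made at time 8, while a read through the parent view itself does.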
  • a lazy publication process may be used to effect propagating changes made to an MSV object 166 in a given view 120 due to the publishing of an isolation context 110 associated with the view.
  • operations that are directed to effecting the publication do not occur until a subsequent operation directed to the MSV object.
  • a technique 700 includes providing (block 704) an object that has a plurality of alternative values; associating (block 708) a plurality of views with the plurality of alternative values; and associating (block 708) a plurality of computational contexts with the views.
  • the views are isolated (block 712) such that a request to determine a value in a given view results in a value of the plurality of alternative values being returned, and a first value associated with a first view is independent from association of a second value with a second view.
  • the technique 700 includes publishing (block 716) a computational context of the plurality of computational contexts to allow a value of the plurality of alternative values associated with a view of the plurality of views associated with the published context to be read in at least one other of the views; and in response to an operation directed to the object after the publishing, processing an effect of the publishing on the at least one other view, pursuant to block 720.
  • operations that modify values associated with views 120 of an MSV object 166 may be serialized by means of write tasks associated with the MSV object 166. These write tasks are represented by corresponding write task objects 170 (also called "write tasks 170" herein). For example, such operations may include writing a value, incrementing or otherwise modifying a value, or resolving a conflict associated with a value. To effect this serialization, the MSV object 166 may have the capability to be associated with a single pending write task 170.
  • a thread of the system 100 may create a write task object 170 to store the data required to perform the requested modification and to discover any conflicts that result from the modification. This data may be stored in the write task object 170, such that values determined by one thread performing the write task for the MSV object 166 may be visible to other threads performing the write task for the same MSV object 166.
  • the write task object 170 may also contain information about steps in the process that have been completed by some thread. This may allow a given thread performing the write task to skip steps that have already been completed by another thread.
  • the thread performing the modification may attempt to change the pending write task associated with the MSV object 166 from a null value to the newly created write task object 170 (e.g., by means of a CAS operation). If this fails, it may indicate that another thread is processing a different write task directed to the MSV object 166 (i.e., may indicate that there is an ongoing modification of the MSV object 166 in association with another view object 153), and the present thread may perform that write task before repeating the attempt to install the newly created write task 170.
  • multiple threads may cooperate in processing a modification and a thread in one operating system process may complete a modification begun by a thread in a different operating system process that died before the modification was completed.
  • the thread may process the newly installed write task.
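The install-or-help protocol for pending write tasks can be illustrated with two atomics per MSV object. The sketch below is a reduced model under several assumptions: a state is a single `int`, conflict detection and pre-publication tasks are left out, task and state objects are assumed to outlive every thread that might help with them (memory reclamation is ignored), and the names `State`, `WriteTask`, `Msv`, `help`, and `write` are hypothetical.

```cpp
#include <atomic>

struct State {
    int value;
};

struct WriteTask {
    explicit WriteTask(int v) : next{v} {}
    State next;                                   // state this task installs
    std::atomic<const State*> observed{nullptr};  // state seen by first helper
};

struct Msv {
    explicit Msv(const State* initial) : state(initial) {}
    std::atomic<const State*> state;
    std::atomic<WriteTask*> pending{nullptr};
};

// Run, or help run, a write task. Each step is a CAS, so a step already
// completed by another thread is skipped rather than repeated.
inline void help(Msv& msv, WriteTask* task) {
    // Step 1: record the state the write is based on (first helper wins).
    const State* none = nullptr;
    task->observed.compare_exchange_strong(none, msv.state.load());
    // Step 2: install the new state only if the base state is still current.
    const State* base = task->observed.load();
    msv.state.compare_exchange_strong(base, &task->next);
    // Step 3: a single attempt to clear the pending slot; failure means
    // another helper has already removed the task.
    WriteTask* self = task;
    msv.pending.compare_exchange_strong(self, nullptr);
}

// Install `task` as the pending write. If another write is pending, finish
// that one first and then retry the installation.
inline void write(Msv& msv, WriteTask* task) {
    for (;;) {
        WriteTask* current = nullptr;
        if (msv.pending.compare_exchange_strong(current, task)) break;
        help(msv, current);  // cooperate instead of blocking
    }
    help(msv, task);
}
```

Recording the observed state inside the task is what lets a late helper do no harm: its attempt to install the new state fails once the MSV has moved past the state the task was based on.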
  • a thread of the system 100 may process a write task directed to an MSV object 166 as follows.
  • the "current write” refers equivalently to the write task object 170 being invoked, the performance of the steps below, which may take place concurrently in multiple threads, and the intended modification.
  • the write task may, in general, contain the following five steps:
  • the write task obtains the current MSV state 162 associated with the MSV object 166. It is noted that this MSV state 162 may be different from the one obtained as a result of calling a process called "close_published_values()," which is described further below. Due to the serialization of tasks, no further changes may be made by other tasks until the current write task finishes.
  • isolation contexts 110 that are associated with value chains 156 in the MSV state 162 are enumerated. For each isolation context 110 for which it is determined that it is possible for the current write to introduce a conflict or contingent conflict for the isolation context 110, the thread installs the current write task object 170 in the corresponding isolation context object 151 as a pre-publication task. This ensures that the isolation context 110 cannot be published until this write task 170 completes. Therefore publishing of the isolation context 110 does not occur until any conflicts 152 have been added to the isolation context object 151. In this manner, if another thread (not involved with the current write task) attempts to publish one of these isolation contexts 110, this other thread first assists (via processing the write task 170) in deciding whether to add conflicts 152.
  • a contingent conflict refers to an indication that should one isolation context 110 successfully publish its changes, doing so will, as part of the atomic publishing process, induce a conflict in another isolation context 110.
  • the thread processing the write task determines the value to be associated as the result of the modification, and the write task may create a value object 158 and add the created value object 158 to the correct value chain 156.
  • the thread processing the write task next identifies any conflicts 152 or contingent conflicts for each value chain 156 in the MSV state 162 and adds these conflicts 152 and contingent conflicts to the associated isolation contexts
  • the system 100 may determine whether the current write implies a conflict for the value chain's associated isolation context 110. For example, when the value chain 156 is open, the associated isolation context 110 is a publishable isolation context, and the associated view 120 is a descendant of the view being modified, the thread may determine that a conflict should be added to the isolation context object 151 associated with the value chain 156.
  • the isolation context 110 associated with the view 120 being modified is a publishable snapshot
  • the view 120 associated with the value chain 156 under consideration is an ancestor of the view being modified
  • the timestamp of the most recent value object 158 on the value chain 156 under consideration is more recent than the snapshot time of the snapshot
  • the thread processing the write task may determine that a conflict should be added to the isolation context object 151 associated with the view 120 being modified.
  • the thread processing the write task may determine that contingent conflicts should be added to one or both of the isolation context objects 151 associated with the views 120 such that when one of the corresponding isolation contexts 110 publishes, the publication adds a conflict at the MSV object 166 to the other isolation context object 151.
  • the thread processing the write task may remove the current write task as a pre-publication task from the isolation context objects 151 that it was added to, thereby allowing any publication(s) to proceed.
  • this step may be performed as part of the prior step following the determination of any conflicts on all of the isolation context's 110 associated value chains 156 or following the addition of a conflict to the isolation context.
  • the thread may attempt to remove the write task 170 as the pending write task 170 for the object, e.g., by using a CAS operation to replace the current write task 170 with a null value.
  • the thread may make a single attempt, as failure of the CAS operation may indicate that another thread was previously successful in this removal.
  • a thread may perform a check to determine whether there is already a value object 158 established on an open value chain 156 associated with the operation's view 120 since the last time the associated isolation context 110 was published (or at all if the isolation context 110 has yet to publish). If such a value object 158 exists, the value associated with the value object 158 may be returned. Otherwise, the thread may initiate a modify operation, where the value to be asserted by the modify operation may be the current value associated with the view, and this value may be returned by the frozen read operation.
  • the modify operation may indicate in the value chain 156 that the value added is the result of a frozen read and therefore is not propagated to a value chain 156 associated with the view's 120 parent view 120 following successful publishing of the isolation context 110.
  • the publication process may be serialized with respect to other publication tasks or other write tasks installed as pre-publication tasks (as described above) by concurrently executing threads of the system 100: a thread may install a publication task in the isolation context object 151 associated with the isolation context 110 to be published after cooperating in finishing any currently installed publication task or write tasks.
  • a conflict or contingent conflict may not be added to an isolation context object 151 by a write task 170 unless that write task 170 has been installed as a pre-publication task 170 on the isolation context object 151; and a write task 170 may not be installed as a pre-publication task 170 on an isolation context object 151 that has an installed publication task 170 before the publication task is finished with the cooperation of the thread attempting to install the write task 170.
  • no further conflicts may be added until after the publication attempt associated with the publication task finishes.
  • the system 100 installs any conflicts 152 for an associated isolation context 110 in the corresponding isolation context state 157. In this manner, if a given isolation context state 157 has actual conflicts 152 (i.e., non-contingent conflicts), then any attempted publication of the associated isolation context 110 fails, and the conflicts 152 are noted and may be addressed, as further described herein.
  • when an isolation context 110 attempts to publish and has no actual conflicts 152 installed, the isolation context 110 is allowed to publish; any contingent conflict(s) 152 are identified and added as actual conflict(s) 152 to the isolation context object(s) 151 associated with the contingent conflict(s) 152; and the isolation context object 151 being published is updated to be associated with a new isolation context state 157 having no conflicts (actual or contingent), an updated last publication time, and its prior isolation context state 157 as a prior state.
  • the determination that there are no actual conflicts 152 and the updating of the isolation context state 157 may be constrained to constitute a single logically atomic operation.
  • the isolation context object 151 may include one or multiple of the following features.
  • the isolation context object 151 may include an immutable reference to a parent isolation context object 151.
  • the isolation context object 151 may include an atomically-updated reference to a current isolation context state 157. This reference may be atomically changed to refer to a new isolation context state 157 whenever the associated isolation context 110 publishes or when a conflict or contingent conflict is created for the context 110.
  • the isolation context state 157 may contain one or multiple conflicts 152. More specifically, in accordance with example implementations, the isolation context state 157 may include a list of conflicts representing actual conflicts; a list of conflicts representing contingent conflicts; a timestamp representing the last publication time; and a reference to the prior state of the associated isolation context 110 as of the last successful publication of the isolation context 110 (if any).
  • an associated isolation context state 157 may be created.
  • the timestamp associated with a created isolation context state 157 may be the result of atomically incrementing the event counter 164.
  • the lists of conflicts and contingent conflicts may be linked lists, so that a new isolation context state 157 that differs from an existing isolation context state 157 only in the addition or removal of conflicts 152 on one of the conflict lists may share a common suffix of list nodes with the existing isolation context state 157.
  • the timestamps of the isolation context states 157 associated with an isolation context object 151 represent the creation time of the isolation context object 151 and its successful publications. These timestamps may be called "stable times" for the isolation context object 151 and its associated isolation context 110. The latest of these stable times (e.g., the timestamp of the isolation context state 157 directly associated with the isolation context object 151) may be called the "last stable time" for the isolation context object 151.
  • the isolation context object 151 may have an atomically-updated reference to a pre-publication task that may be a write task or a publication task that is to be completed before publication of the associated isolation context 110 may be attempted.
  • the pre-publication task may be a single publication task or a collection of write tasks, all of which are to be completed before publication of the associated isolation context 110 may be attempted.
  • the completion of the write tasks from the collection of write tasks may include basing whether to perform the write task on an indication of whether the task has been completed (e.g., by another thread).
  • the isolation context object 151 may have an associated map, also called a "shadow map," which maps view objects 153 to other view objects 153.
  • the shadow map may be a lock-free map.
  • the shadow map may be a lock-free cuckoo map.
  • the isolation context object 151 may have an associated immutable enumerated value that represents whether the associated isolation context 110 is a live or snapshot isolation context 110.
  • the isolation context object 151 may have an immutable, enumerated value that represents the associated isolation context's modification type, which may be a publishable isolation context 110, a detached isolation context 110 or a read-only isolation context 110.
  • a publishable isolation context 110 refers to an isolation context 110 that may be published.
  • a detached isolation context 110 refers to an isolation context 110 that is constrained to be one that is not published, but values in the detached isolation context 110 may be modified.
  • a read-only isolation context 110 refers to an isolation context 110 that is constrained to not be published, and values in the read-only context 110 are constrained to not be modified.
  • the system 100 may be constrained to not create a publishable isolation context 110 whose parent is a read-only isolation context 110, as publishing modifications from the child isolation context 110 would constitute an impermissible modification to the read-only parent isolation context 110.
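
The three modification types and the restriction on read-only parents might be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `IsolationContext`, the enum `ModType`, and the choice to raise `ValueError` are assumptions, and `new_child` is simplified to take only the modification type.

```python
from enum import Enum


class ModType(Enum):
    PUBLISHABLE = "publishable"
    DETACHED = "detached"
    READ_ONLY = "read-only"


class IsolationContext:
    """Hypothetical stand-in for an isolation context object."""

    def __init__(self, parent=None, mod_type=ModType.DETACHED):
        self.parent = parent        # fixed at creation, never reassigned
        self.mod_type = mod_type    # fixed at creation, never reassigned

    def new_child(self, mod_type):
        # Publishing from the child would modify the parent, which a
        # read-only parent does not permit, so refuse to create the child.
        if mod_type is ModType.PUBLISHABLE and self.mod_type is ModType.READ_ONLY:
            raise ValueError("publishable child of a read-only context")
        return IsolationContext(parent=self, mod_type=mod_type)
```

Detached and read-only children of a read-only parent remain allowed in this sketch, since neither of them ever publishes into the parent.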
  • the data structures 150 include a global isolation context object 151 representing the global isolation context 110 and having no parent isolation context object 151.
  • the isolation context object 151 may also be associated with one or multiple of the following operations.
  • the isolation context object 151 may provide a shadow(view object) operation to identify a view object 153 associated with the isolation context object 151 and related to the specified view object 153. If the shadow(view object) operation determines that the isolation context object 151 is associated with the view object 153, the view object 153 is returned as the value of the shadow() operation.
  • the shadow(view object) operation involves checking the shadow map associated with the isolation context object 151. If there is no entry in the shadow map corresponding to the view object 153, the shadow(view object) operation may include invoking the shadow(view object) operation on the parent isolation context object 151. If the shadow map contains a view object 153 corresponding to the resulting view object 153, the corresponding view object may be returned. Otherwise, the shadow(view object) operation may create a new view object 153, as a child of the parent's shadow view object 153. The shadow(view object) operation may associate the new view object 153 with the specified view object 153 and with the parent's shadow view object 153. In this way, a hierarchy of shadow view objects 153 may be created and efficiently retrieved.
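
The shadow lookup described above can be sketched in a few lines. The sketch below is single-threaded and uses a plain `dict` where the text suggests a lock-free map; the class names `View` and `Context` are hypothetical, and the behavior at the global context (returning its top-level view when the view is unknown) is an assumption, since the text does not spell out that case.

```python
class View:
    def __init__(self, context, parent=None):
        self.context = context   # context that created this view
        self.parent = parent     # parent view, possibly in another context


class Context:
    def __init__(self, parent=None):
        self.parent = parent
        self.shadow_map = {}     # view -> shadow view owned by this context
        # Assumption: only the global context owns a top-level view.
        self.top_view = View(self) if parent is None else None

    def shadow(self, view):
        if view.context is self:
            return view                      # already one of our views
        found = self.shadow_map.get(view)
        if found is not None:
            return found
        if self.parent is None:
            return self.top_view             # assumption for the global context
        parent_shadow = self.parent.shadow(view)
        found = self.shadow_map.get(parent_shadow)
        if found is None:
            # Create our shadow as a child of the parent's shadow view.
            found = View(self, parent=parent_shadow)
            self.shadow_map[parent_shadow] = found
        self.shadow_map[view] = found        # remember the shortcut as well
        return found
```

Recording the new view under both keys means a later lookup by either the original view or the parent's shadow view is answered from the map without recursing.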
  • the isolation context object 151 may provide a new_child(view type, modification type, timestamp) operation to create a child isolation context object 151 of the isolation context object 151, where the timestamp associated with the child's isolation context state 157 is the specified timestamp or, if the timestamp is omitted or "most recent" is specified, the result of incrementing the event counter 164, and where the child has the specified view type (e.g., either "live" or "snapshot") and the specified modification type (e.g., "publishable", "detached", or "read-only").
  • the isolation context object 151 may provide a publish() operation to install and run a publish task, as described below, after first collaborating in finishing any publish or write tasks already installed in the context's pre-publication task.
  • the isolation context object 151 may provide an add_conflict(conflict) operation to first check whether the conflict 152 has been marked as being
  • the operation atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the conflict list.
  • the replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
  • the operation attempts to install the conflict 152 in a location (e.g., a value chain 156) associated with the conflict 152 to assert that there is known to be a conflict 152 at that location.
  • the attempt to install may be made by a single invocation of a CAS operation attempting to replace a null value with the present conflict 152. Failure in this attempt implies that another conflict 152 has been installed there (e.g., by another thread), and accordingly, the present conflict 152 is marked as being resolved.
  • the isolation context object 151 may provide an add_contingent_conflict(conflict) operation, which atomically replaces the current isolation context state 157 with a new isolation context state 157 identical to the current one, save that the conflict 152 is prepended to the contingent conflict list.
  • the replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
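
The state-replacement step shared by add_conflict and add_contingent_conflict might look like the sketch below. Python has no hardware compare-and-swap, so the hypothetical `AtomicRef` class emulates one with a lock; `Node`, `State`, and the field names are likewise assumptions. The sketch covers only the CAS retry loop, not the resolved check or the installation of the conflict at its location.

```python
import threading
from dataclasses import dataclass
from typing import Optional


class AtomicRef:
    """Lock-based stand-in for an atomically updated reference."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:   # identity comparison, like a pointer CAS
                self._value = new
                return True
            return False


@dataclass(frozen=True)
class Node:
    conflict: object
    next: Optional["Node"]


@dataclass(frozen=True)
class State:
    conflicts: Optional[Node]       # actual conflicts (immutable linked list)
    contingent: Optional[Node]      # contingent conflicts
    last_publication: int
    prior: Optional["State"]


def add_conflict(state_ref, conflict):
    # Loop until the CAS succeeds, building a fresh state on every attempt.
    while True:
        cur = state_ref.get()
        new = State(Node(conflict, cur.conflicts), cur.contingent,
                    cur.last_publication, cur.prior)
        if state_ref.compare_and_set(cur, new):
            return new
```

Because the lists are immutable and the new node points at the old head, the old and new states share every node after the prepended one, so each attempt allocates only one node and one state.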
  • the isolation context object 151 may provide a conflict_resolved(conflict) operation to remove a conflict 152 from preventing publication of the isolation context 110 associated with the isolation context object 151. If the specified conflict 152 is the head of the current isolation context state's 157 conflict list, the operation replaces the associated isolation context state 157 with a new isolation context state 157 equivalent to the old one, save that all initial conflicts 152 in the conflict list that are marked as resolved are removed. In this way, in accordance with example implementations, resolved conflicts 152 may remain on the conflict list, but after all conflicts 152 have been marked as being resolved, the conflict list in the isolation context state 157 is empty.
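
The head-stripping behavior of conflict_resolved can be illustrated with a small sketch. It models the conflict list as a Python tuple rather than a linked list and omits the atomic state replacement; the `Conflict` class is hypothetical, and marking the conflict as resolved inside the function is an assumption about where that flag is set.

```python
class Conflict:
    def __init__(self, name):
        self.name = name
        self.resolved = False


def conflict_resolved(conflicts, conflict):
    """Return the conflict list that the new state would hold."""
    conflict.resolved = True          # assumption: the flag is set here
    if conflicts and conflicts[0] is conflict:
        # Drop every leading conflict that is already marked as resolved.
        i = 0
        while i < len(conflicts) and conflicts[i].resolved:
            i += 1
        return conflicts[i:]
    return conflicts                  # not the head: the list is left alone
```

A conflict resolved out of order stays on the list until the head is resolved, at which point the whole resolved prefix disappears in one replacement.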
  • the isolation context object 151 may provide a publication_after(timestamp) operation.
  • the operation traverses the list of the isolation context states 157, starting from the current isolation context state 157 and continuing by following the prior state pointer; and returns the earliest timestamp associated with a state 157, such that the timestamp of the state 157 is after the specified one. This operation may also indicate whether there is no such timestamp.
  • the isolation context object 151 may provide a publication_before(timestamp) operation. The operation traverses the list of the isolation context states 157, which may be similar to the publication_after(timestamp) operation.
  • the publication_before(timestamp) operation returns the latest publication timestamp before the given one, or a zero timestamp (or a similar timestamp that is considered to be before any other timestamp), if the given timestamp is before the associated isolation context's creation time.
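
Both timestamp queries reduce to a walk along the prior-state pointers. The sketch below assumes timestamps strictly decrease along that chain, uses `None` to indicate that no later publication exists, and uses `0` as the timestamp that precedes all others; the `State` class and the free-function form are hypothetical.

```python
class State:
    def __init__(self, timestamp, prior=None):
        self.timestamp = timestamp
        self.prior = prior            # state as of the previous publication


def publication_after(current, timestamp):
    """Earliest stable time strictly after `timestamp`, or None."""
    result = None
    state = current
    while state is not None and state.timestamp > timestamp:
        result = state.timestamp      # still after: keep the older candidate
        state = state.prior
    return result


def publication_before(current, timestamp):
    """Latest stable time strictly before `timestamp`, or 0."""
    state = current
    while state is not None:
        if state.timestamp < timestamp:
            return state.timestamp
        state = state.prior
    return 0
```

For a context created at time 3 and published at times 7 and 12, `publication_after(current, 5)` yields 7 and `publication_before(current, 10)` yields 7 as well.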
  • the contingent conflict list of the isolation context state 157 may have nodes, where each node indicates a contingent conflict and has an associated "handled?" flag.
  • the contingent conflict associated with a node may only be added as a conflict to its associated isolation context object 151 following a determination that the flag is not set; and the flag may be atomically set following the successful addition of the conflict 152.
  • the view object 153 may have one or multiple of the following properties.
  • the view object 153 may contain an unchangeable, or immutable, reference to the isolation context 110 (as represented by object 151) that created it; and the view object 153 may contain an immutable reference to its parent view object 153 (which may or may not be associated with the same isolation context 110).
  • the view object 153 may have an ancestor cache, which may be a map from view objects 153 to Boolean values, where an association in the map records whether a given other view object 153 has been determined to be, or to not be, an ancestor of the current view object 153.
  • a true Boolean value, a false Boolean value, and no entry indicate that the other view object 153 has been determined to be an ancestor, has been determined to not be an ancestor, or has yet to be determined, respectively.
  • the ancestor cache is not constructed until a determination is required and is atomically installed.
  • the ancestor cache is a lock-free map (e.g., a lock-free cuckoo map).
  • the data structures 150 include a top-level view object 153, which is associated with the global isolation context object 151 and which has no parent view object 153.
  • the view object 153 may provide an operation that is directed to retrieving a reference to the associated isolation context object 151.
  • the view object 153 may provide a has_ancestor(view object) operation to determine whether a given view object 153 is an ancestor of the view object 153.
  • the operation includes first determining whether the specified view object 153 is the same as the view object 153, the same as the parent of the view object 153, or the top-level view object 153. In any of these cases, the operation may return a true Boolean value. Otherwise, if the view object 153 is the top-level view object 153, then the operation may return a false Boolean value.
  • if the view object 153 does not contain an ancestor cache, the operation creates one and atomically associates it with the view object 153.
  • the ancestor cache may be examined to determine whether the answer is known. If a Boolean value is found to be associated in the ancestor cache with the specified view object 153, the Boolean value may be returned. Otherwise, the ancestry of the view object 153 may be walked, or traversed, starting with its parent view object 153 and continuing through successive parent view objects 153 until the specified view object 153 or the top-level view object 153 is encountered. The value of the operation depends on the view object 153 encountered during the traversal of the ancestry: a Boolean true value if the specified view object 153 is encountered and a Boolean false value if the top-level view object 153 is encountered. The value of the operation may be associated with the specified view object 153 in the ancestor cache and returned.
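
The has_ancestor steps above can be sketched as follows. The sketch is single-threaded, so the lazily created cache is a plain `dict` rather than an atomically installed lock-free map, and the top-level view is recognized simply as the view with no parent; the `View` class and the `_ancestor_cache` attribute name are hypothetical.

```python
class View:
    def __init__(self, parent=None):
        self.parent = parent
        self._ancestor_cache = None   # created only when first needed

    def has_ancestor(self, other):
        # Fast paths: self, the direct parent, or the top-level view.
        if other is self or other is self.parent or other.parent is None:
            return True
        if self.parent is None:       # the top-level view has no other ancestors
            return False
        if self._ancestor_cache is None:
            self._ancestor_cache = {}
        cached = self._ancestor_cache.get(other)
        if cached is not None:        # True or False: already determined
            return cached
        # Walk upward until the target or the top-level view is reached.
        node = self.parent
        while node is not other and node.parent is not None:
            node = node.parent
        result = node is other
        self._ancestor_cache[other] = result
        return result
```

Negative answers are cached as well as positive ones, which is what lets the three-way reading of the map (true, false, absent) avoid repeated walks for unrelated views.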
  • the conflict 152 may have one or multiple of the following properties.
  • a conflict 152 may contain an atomically-updated flag to indicate, or represent, whether or not the conflict 152 has been resolved.
  • a conflict 152 may contain an immutable reference to an atomically-updated reference to a conflict 152, which is the location in which this conflict is installed.
  • the conflict 152 may have one or multiple subclasses holding information identifying different types of locations at which conflicts may occur.
  • such subclasses may include: 1.) a field conflict, which holds a reference to a record object and an indication of a field within that record object; 2.) a bound name conflict, which holds a reference to a
  • the MSV object 166 may have one or multiple of the following properties.
  • An MSV object 166 may contain an atomically-updated reference to its current state 162.
  • An MSV object 166 may contain an atomically-updated reference to a pending write task, which is a write task that is in progress and must be completed before another modification can be performed.
  • An MSV object 166 may contain an immutable reference to a conflict generator 160.
  • the type of the conflict generator 160 referred to may depend on the type of location the MSV object 166 represents (e.g., field of record or slot of array).
  • Conflict generators 160 associated with different types of location may generate different instances of different subtypes of conflict 152.
  • the MSV object 166 may be an instance of a template (or parameterized or generic) class, where the template parameter may indicate the type of data contained (e.g., the values contained in value objects 158 and provided to and returned by operations).
  • This template parameter may also be used by the MSV state 162, value objects 158, and value chains 156, which may be nested within the class implementing the MSV object 166.
  • Such an arrangement may allow different representations of value objects 158 holding different types of values, which may permit more efficient representation when values are of a primitive type (e.g., numbers, Boolean values, or characters).
  • the MSV object 166 may provide one or multiple of the following operations.
  • the MSV object 166 may provide a read(view, timestamp) operation to determine and return the value of the MSV object 166 for a specified view object 153 as of (i.e., no later than) the given timestamp, which, if omitted, defaults to the "most recent" timestamp, indicating that the most recent value should be returned.
  • a process of reading from an MSV object 166 includes invoking the current_value operation on the MSV state 162 associated with the MSV object 166.
  • the MSV object 166 may provide a read_frozen(view) operation to determine and return the current (e.g., most recent) value of the MSV object 166 for the given view object 153.
  • the read_frozen(view) operation ensures that
  • the MSV object 166 may provide a has_value(view, timestamp) operation to indicate (e.g., via returning a Boolean value) if a read operation with the same parameters would return a value discovered in a value object 158.
  • the MSV object 166 may provide a modify(view, op, resolve?, argument) operation to modify the value in the given view object 153, according to the given operation applied to the current value object 158 and (when applicable) the given argument.
  • the "?" suffix denotes a Boolean value.
  • the operations may include one or multiple of the following: an operation to set the value to the argument; an operation to add, subtract, multiply, or divide the current value by the argument; an operation to clear the value (e.g., set the value to a default value); and an operation to set the value to a value present in the MSV object 166, where the present value may be one of: the current value in the given view 120; the value in the parent view 120 of the given view object 153; the last stable value (e.g., the value as of the last stable time for the associated isolation context 110); and the current value, with an indication that this should be noted as implementing a frozen read.
  • the "resolve?" argument controls whether this modification should be considered to resolve any conflict 152 for this view object 153 in this MSV object 166.
  • the MSV object 166 may provide a write(view, resolving, new value) operation, which is an alias for the above-described modify operation in which the operation is that of setting the value to the given argument.
  • operations that are provided by the MSV object 166 may be preceded by closing the published view objects 153 in a close_published_views() process, which is described below.
  • the value chain 156 may have one or multiple of the following properties.
  • the value chain 156 may represent the history of value objects 158, which are associated with a view object 153.
  • the value chain 156 may contain immutable references to the value chain's view object 153.
  • such a reference may be a pointer that contains flags that record whether the view object's isolation context 110 is mutable and/or a snapshot.
  • the value chain 156 may contain a pointer to the latest, or most recent, value object 158, and in accordance with example implementations, the pointer may contain a flag that indicates or represents whether the most recent value object 158 was due to a frozen read or equivalent operation (e.g., an operation that asserts a value equivalent to the current value, such as an operation of adding or subtracting zero or an operation of multiplying or dividing by one).
  • view-relative pointers may be installed on the value chain 156. In this manner, the view-relative pointers may be associated as values in the value objects 158 that are installed on the value chain 156.
  • the view-relative pointers may be constrained to be relative to some view object 153 that shares an isolation context 110 with the view object 153 of the value chain 156.
  • the value chain 156 may contain an atomically-updated reference to a conflict 152 that is associated with the value chain 156. This may be a null reference to indicate a lack of a conflict.
  • the value chain 156 may provide one or multiple of the following operations.
  • the value chain 156 may provide a value(timestamp) operation to return a reference to the value object 158 representing the latest value object 158 in the value chain 156 before the given timestamp, or provide a null reference, if no such value object 158 exists.
  • the operation may involve traversing the value chain 156 by starting at the latest value object 158 associated with the value chain 156 and following each value object's link to the prior value object 158.
  • the value chain 156 may provide an add_value(value) operation to update the most recent value object 158 reference to a new value object 158 with the provided value.
  • the timestamp of the new value object 158 may be the current value of the event counter 164, and the prior pointer of the new value object 158 may refer to the old most recent value object 158.
  • the add_value(value) operation may return the newly added value object 158.
  • the prior pointer for the new value object 158 may be set to the prior pointer of the old most recent value object 158, rather than to the old most recent value object 158 itself, for the case in which the timestamp of the old most recent value object 158 is the current value of the event counter 164.
  • This allows the old most recent value object 158 to be garbage collected (assuming there are no other references to it). That is, in accordance with example implementations, when the event counter 164 has not changed in between successive additions to a value chain 156, the new value object 158 replaces the old most recent value object 158, rather than extending the chain of value objects 158.
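
The value(timestamp) and add_value(value) operations, including the replace-instead-of-extend case, might be sketched as below. The sketch is single-threaded and leaves out the CAS update and the read-in-snapshot flag; the event counter is passed in as a hypothetical `now` parameter, and the lookup treats "before the given timestamp" as "no later than", following the wording used for the read operation.

```python
class ValueObject:
    def __init__(self, value, timestamp, prior):
        self.value = value
        self.timestamp = timestamp
        self.prior = prior            # previous value object, or None


class ValueChain:
    def __init__(self):
        self.latest = None            # most recent value object

    def value(self, timestamp):
        """Latest value object no later than `timestamp`, or None."""
        obj = self.latest
        while obj is not None and obj.timestamp > timestamp:
            obj = obj.prior
        return obj

    def add_value(self, value, now):
        old = self.latest
        if old is not None and old.timestamp == now:
            prior = old.prior         # same tick: replace the old head
        else:
            prior = old               # new tick: extend the chain
        self.latest = ValueObject(value, now, prior)
        return self.latest
```

Two writes at the same counter value leave a single value object on the chain, so the chain grows only when some event (such as a publication or snapshot) has advanced the counter in between.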
  • the most recent value object pointer contains a read-in-snapshot flag, indicating whether the value from the most recent value object 158 in the value chain 156 was read after following the parent link of a view object 153 associated with a snapshot isolation context 110 (e.g., when the timestamp of the read is a stable time for a snapshot rather than the "most recent" timestamp).
  • the reference to the most recent value object 158 (including its read-in-snapshot flag) is remembered. If the value object 158 found is the most recent value object 158, a single attempt is made to replace the remembered value object reference with an identical one that has the read-in-snapshot flag set to true.
  • if this attempt fails, it means that either some other thread succeeded (i.e., the failure occurred because the assumed false value for the read-in-snapshot flag was wrong) or another value object 158 was added to the value chain 156 (i.e., the failure occurred because the most recent value object 158 reference had changed). Either cause of failure removes the problem.
  • the old most recent value object reference may be read before the current timestamp is read.
  • because the current timestamp is incremented upon the creation of a snapshot, it may be inferred that either the read-in-snapshot flag of the read most recent value object reference is false or the current timestamp was incremented since the last value object 158 (e.g., the current most recent value object 158) was added to the value chain 156.
  • the new value for the most recent value object reference has the read-in-snapshot flag set to false. If it is the case that the read-in-snapshot flag was set in between the time the read of the most recent value object reference was made and the time of the attempted update, then the CAS operation to change the most recent value object reference from the read value to the new value fails. In this case, another attempt is made to add the value, which involves reading a new current timestamp and updating the timestamp and next pointer on the value object 158 being added.
  • the MSV state 162 indicates states for value chains 156 of an associated MSV.
  • the MSV state 162 contains an array of value chains 156 (by reference).
  • Each value chain 156 may be considered "publishable" or "unpublishable" based on its associated isolation context.
  • Each value chain 156 may further be considered to be "open" or "closed".
  • An open value chain 156 is one on which a value object 158 was added subsequent to the last stable time of the associated isolation context 1 10 as of the time the MSV state 162 was created.
  • a closed value chain 156 is one that is not an open value chain 156.
  • a closed value chain 156 is publishable, and an unpublishable value chain 156 is open.
  • the value chains 156 may be arranged in the array such that all open value chains 156 precede all closed value chains 156 and all unpublishable value chains 156 precede all publishable value chains 156.
  • the first entry in the array may be reserved for a value chain 156 that is associated with the top-level view object 153.
  • This value chain 156 is unpublishable (since the top-level view object 153, which lacks a parent, is unpublishable) and, as such, may be considered open even if it has no value objects 158.
  • the MSV state 162 may contain an indication of the number of contained unpublishable value chains 156, the number of open publishable value chains 156, and the number of closed value chains 156. These numbers allow the determination of the state of a value chain 156 based on its position within the array.
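
Given the ordering constraints above, the array falls into three consecutive regions: unpublishable chains (always open), open publishable chains, and closed chains (always publishable). A minimal sketch of recovering a chain's status from its index and the three counts follows; the function name `classify` and the returned string pairs are hypothetical.

```python
def classify(index, n_unpublishable, n_open_publishable, n_closed):
    """Return (publishability, openness) for the chain at `index`."""
    total = n_unpublishable + n_open_publishable + n_closed
    if not 0 <= index < total:
        raise IndexError("value chain index out of range")
    if index < n_unpublishable:
        return ("unpublishable", "open")      # index 0 is the top-level view
    if index < n_unpublishable + n_open_publishable:
        return ("publishable", "open")
    return ("publishable", "closed")
```

Storing only the counts keeps each value chain free of per-state flags, so the same chain object can be shared between successive states while its classification changes with its position.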
  • the MSV state 162 may provide one or multiple of the following operations.
  • the MSV state 162 may provide the close_published_views() operation, which is further described below.
  • the MSV state 162 may provide a values(view, open only?) operation to return the value chain 156 that is associated with the specified view object 153, if it exists. If the "open only?" parameter is a Boolean true value, then a value chain 156 is not returned unless the value chain 156 is found among the open (e.g., unpublishable and open publishable) value chains 156.
  • the MSV state 162 may provide a current_value(view, timestamp) operation to return the value associated with the view object 153 as of the specified timestamp.
  • this operation may traverse parent view objects 153 and the timestamp may be modified when traversing parent view objects 153 of view objects 153 associated with snapshot isolation contexts 110.
  • the operation may begin at the specified view object 153 and may call the values(view, open only?) operation to get the value chain 156 associated with the view object 153.
  • the "open only?" parameter to the operation may be the Boolean true value if and only if the specified timestamp is the "most recent" timestamp.
  • the current_value operation may call value(timestamp) on this result to get a value object 158. If none is found, the operation may replace the current view object 153 by its parent view object 153. If the current view object 153 is associated with a snapshot isolation context 110, the operation may replace the timestamp with the last stable time of the isolation context 110 prior to the timestamp by calling the
  • the search may be performed again, and this process may continue until a value object 158 is found or the parentage is exhausted.
  • if a value object 158 is found, it is returned; otherwise, a null pointer is returned.
  • a value chain 156 associated with a parent view object 153 that is associated with an isolation context 110 that is not read only
  • the timestamp is the "most recent" timestamp
  • the MSV state 162 has been replaced in the MSV object 166 with a new one that has a value chain 156 for that view object 153; and the modification that added the value chain 156 took place before the value object 158 found was added.
  • value chain 156 contains the value object 158 that should have been returned by the operation.
  • the current_value(view, timestamp) operation returns (as a secondary return value) an indication that the value object 158 returned should not be trusted if the MSV's state has changed.
  • the caller attempts to confirm that the MSV state 162 is still associated with the MSV object 166. If this is not the case, the caller may call current_value() on the new state. This process may repeat until the secondary return indication is not seen or the MSV object's 166 current state has not changed.
  • an MSV object 166 may provide a close_published_views() operation to update the MSV object 166 to be associated with an MSV state 162 that reflects all publications of isolation contexts 110 that have happened since the last time close_published_views() was called.
  • the close_published_views() operation may accept as a parameter a view object 153 (called the "write view" below) indicative of a view 120, if any, in which the caller intends to assert a value and which, therefore, is associated with an open value chain 156 in the resulting MSV state 162.
  • the operation may return the following values: 1. the resulting MSV state 162, which, if not null, has been installed in the MSV object 166; 2. the value chain 156 associated with the write view (if specified) in the resulting MSV state 162; and 3. an indication of whether the returned value chain 156 (if any) was an open value chain 156 prior to the operation.
  • a thread executing in the system 100 may perform the following actions when executing the
  • the thread reads the current state of the MSV state 162 (called the
  • if there was no remembered state (e.g., if the current state of the MSV object 166 was a null reference), then if there is no write view object 153, the thread returns a null MSV state 162 reference. Otherwise, the thread creates a new value chain 156 associated with the write view. If the write view is not the top-level view object 153, the thread may also create a value chain 156 associated with the top-level view object 153. The thread then creates a new MSV state 162 associated with an array containing the created value chains 156 and recording the number of unpublishable and open publishable value chains 156 this represents. The thread returns the created MSV state 162, the value chain 156 associated with the write view, and an indication that the returned value chain 156 was not open.
  • the thread calls a find_need_close() operation on the MSV state 162, which returns a priority queue indicating open publishable value chains 156 that are associated with isolation context objects 151 that have been published more recently than the timestamp of the most recent value object 158 on the value chain 156.
  • the priority queue may be ordered from least-recently published to most-recently published, and the elements may include the publication time following the most recent value object's 158 timestamp and an index in the value chain 156 array of the MSV state 162. It is noted that the timestamp stored in the priority queue may not reflect the most recent publication of the isolation context object 151. In searching for value chains 156 to add to the priority queue, if the thread discovers a value chain 156 associated with the write view 153, it may remember it for later use.
  • the thread may return from the operation: the current MSV state 162, the write value chain 156 (if any), and, if a write value chain 156 was found, an indication that the write value chain 156 was previously open.
  • If a write view object 153 was specified and no associated value chain 156 was discovered, it may be inferred that no write value chain exists among the open publishable value chains 156. If the write view object 153 is associated with an unpublishable isolation context 110, a search may be made through the unpublishable value chains 156 associated with the MSV state 162. As an optimization, if the write view object 153 is the top-level view object 153, an associated value chain 156 may be found as the first value chain 156 among the unpublishable value chains 156. If an associated value chain 156 is found, it may be returned, along with the current MSV state and an indication that the value chain 156 was previously open.
  • a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new unpublishable value chain 156 associated with the write view 153. If the thread succeeds in installing new MSV state 162 replacing the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and the values returned.
  • a search may be made through the closed value chains 156 associated with the MSV state 162. If an associated value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state, save that the found value chain 156 is moved to become an open publishable value chain 156. If the thread succeeds in installing new MSV state 162 replacing the remembered state, the new MSV state 162, the found value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and the values returned.
  • a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new open publishable value chain 156 associated with the write view 153. If the thread succeeds in installing new MSV state 162 replacing the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and the values returned.
  • process_close() may be called on the current MSV state 162, passing in the priority queue and the write view object 153. The operation returns the same three values, which are remembered.
  • the process_close() operation has access to the priority queue and the write view object 153 (if any) of its caller, and in accordance with example implementations, maintains the following structures: a vector of open publishable value chains 156; a vector of closed value chains 156; and a vector of new unpublishable value chains 156. It is noted that the designation of these value chains 156 as, e.g., open publishable, represents the point of view of an MSV state 162 being designed and the designation for a given value chain 156 may change during the operation.
  • the vector of open publishable value chains 156 may initially contain the open publishable value chains 156 in the MSV state 162. Whenever a new open publishable value chain 156 is created or a closed value chain 156 is reopened, the value chain 156 is added to the end of this vector. Whenever a value chain 156 is closed, the slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
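The bookkeeping described for the open publishable vector can be sketched as follows. The class and method names are hypothetical. The comment about keeping indices stable is an inference drawn from the priority queue storing array indices, not a statement made in the source.

```python
class OpenChainVector:
    """Sketch of the open publishable value chain vector (names assumed)."""

    def __init__(self, initial_chains):
        self.slots = list(initial_chains)
        # Side count of non-null entries, kept alongside the vector.
        self.live = len(self.slots)

    def add(self, chain):
        """Append a newly created or reopened chain; return its slot."""
        self.slots.append(chain)
        self.live += 1
        return len(self.slots) - 1

    def close(self, index):
        """Null out a slot instead of shifting, so other indices stay valid."""
        chain = self.slots[index]
        if chain is not None:
            self.slots[index] = None
            self.live -= 1
        return chain

    def compact(self):
        """Surviving chains, e.g. for building the new MSV state."""
        return [chain for chain in self.slots if chain is not None]
```

Because closed slots are nulled rather than removed, the side count lets the size of the resulting array be known without rescanning the vector.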
  • the vector of closed value chains 156 may initially contain the closed value chains 156 in the MSV state 162. In accordance with an implementation, the creation of this vector may be deferred until there is a need to alter its contents.
  • the vector of new unpublishable value chains 156 may initially be empty. When a new unpublishable value chain 156 is created, the value chain 156 is added to the end of the vector.
  • the unpublishable value chains 156 of the MSV state 162 being designed comprise the unpublishable value chains 156 of the current MSV state 162 and the contents of this vector.
  • the process_close() operation may repeatedly remove items from the priority queue and identify value chains 156 and publication timestamps, where the values removed reflect the earliest such timestamps in the priority queue, stopping when the priority queue is empty.
  • Each such value chain 156 (which may be called a child value chain) is in the open publishable value chain 156 vector.
  • the operation next identifies the parent value chain 156 associated with the parent view object 153 of the view object 153 associated with the child value chain 156. If the parent value chain 156 is identified in the open publishable value chain 156 vector, the unpublishable value chain 156 vector, or the unpublishable value chains 156 of the current MSV state 162, it is noted.
  • a new parent value chain 156 is created and added to the unpublishable value chain 156 vector. If the parent view object 153 is associated with a publishable isolation context 110, a search is made through the closed value chain 156 vector (or, if this has not yet been created, the closed value chains 156 associated with the current MSV state 162). If a parent value chain 156 is found, the parent value chain 156 is reopened by removing it from the closed value chain 156 vector and adding it to the open publishable value chain 156 vector. If a parent value chain 156 is not found, a new parent value chain 156 is created and added to the open publishable value chain 156 vector.
  • the child value chain 156 may then be closed, and modifications (if any) of the child value chain 156 are transferred to its parent.
  • the child value chain 156 may be removed from the open publishable value chain 156 vector. If there is an indication in the child value chain 156 that the most recent value object 158 was added as the result of a frozen read operation, there is no value that needs to be transferred to the parent value chain.
  • the most recent value object 158 associated with the child value chain 156 may be removed from the child value chain 156 along with the indication. If this results in a child value chain 156 with no associated most recent value object 158, the child value chain 156 may be discarded.
  • the most recent parent value object 158 (if any) is discovered on the parent value chain 156. If the most recent parent value object 158 exists and the associated timestamp is greater than or equal to one less than the timestamp retrieved from the priority queue, this may indicate that another thread has already processed the publication of the child view object 153, so this thread need not.
  • a new parent value object 158 is created based on the most recent child value object 158, having a timestamp equal to the retrieved timestamp minus one and the same value unless the value is a view-relative pointer 154, in which case the value is a copy where the view object 153 associated with the copy is the parent of the view object 153 associated with the original.
  • the prior value object 158 of the new parent value object 158 is the old most recent parent value object 158, if any.
  • the operation then makes a single attempt to atomically replace the old most recent parent value object 158 with the new parent value object 158 in the parent value chain 156. A failure in this attempt may indicate that another thread has already processed the publication of the child view object 153.
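The transfer of a child value to its parent chain can be sketched as below. `Value`, `Chain`, and `transfer_to_parent` are names assumed for this sketch. A lock stands in for a hardware compare-and-swap, and the re-homing of view-relative pointers 154 is omitted.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Value:
    timestamp: int
    value: Any
    prior: Optional["Value"] = None


class Chain:
    """Value chain whose head is replaced by compare-and-set (simulated)."""

    def __init__(self, head=None):
        self._head = head
        self._lock = threading.Lock()  # assumption: stands in for CAS

    @property
    def head(self):
        return self._head

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._head is expected:
                self._head = new
                return True
            return False


def transfer_to_parent(child, parent, publish_ts):
    """Copy the child's newest value to the parent at publish_ts - 1.

    Returns False when the work was already done or the single
    replacement attempt lost to another thread.
    """
    old = parent.head
    if old is not None and old.timestamp >= publish_ts - 1:
        return False  # another thread already processed this publication
    new = Value(publish_ts - 1, child.head.value, prior=old)
    return parent.compare_and_set(old, new)  # single attempt, no retry
```

No retry is needed on failure because, per the text above, a lost race may indicate that another thread has already installed an equivalent parent value.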
  • If the parent view object 153 is associated with a publishable isolation context 110, a determination is made as to whether the parent isolation context 110 was published following the retrieved timestamp. If it was, a new entry is added to the priority queue reflecting the parent value chain 156 and the earliest publication time of the parent isolation context 110 following the retrieved timestamp. It is noted that this step is performed even if an earlier determination was made that another thread installed a new parent value object 158 reflecting the current child value object 158.
  • the thread may proceed to identify the write value chain 156 associated with the write view object 153, if any. This may be performed by the same procedure as was used to identify each parent value chain 156.
  • the thread may create a new MSV state 162 associated with the unpublishable value chains 156 from the current MSV state 162, the unpublishable value chains 156 from the unpublishable value chain 156 vector, the open publishable value chains 156 from the open publishable value chain 156 vector, and the closed value chains 156 from the closed value chain 156 vector (or if this last has not been created, from the current MSV state 162).
  • the thread may then return this new MSV state 162, the write value chain 156 (if any) and an indication of whether the write value chain 156 was found among the open value chains 156.
  • Cooperative tasks are described herein for purposes of serializing operations, such as publication and adding conflicts to an isolation context or modifying a value.
  • Each point of serialization may be associated with a "task holder."
  • the task holder refers to a task and may be atomically updated.
  • the task is first installed in the holder. This installation may involve attempting to replace a null pointer in the task holder with a reference to the task by means of the CAS operation. If the task holder was not empty, then this attempt fails, and the current value (the blocking task) is obtained. If the blocking task is not the one being installed (i.e., if it was not the case that another thread was successful in installing it), then the blocking task is run in the current thread, and an attempt is made to replace the blocking task with null. If this fails, it means that another thread removed it first, so another iteration occurs to try the install again. When the task is successfully installed, the task is then run and then an attempt is made to remove it (replace it with null).
  • the work of the task may occur in the task's run() operation, which may be defined in a subclass.
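The install, help, and remove loop for a task holder can be sketched as follows. All class and function names here are assumptions. A lock simulates the CAS operation, and `AppendOnce` is a hypothetical subclass whose run() is idempotent, since several threads may run the same task.

```python
import threading


class TaskHolder:
    """Atomically updatable reference to at most one task (CAS simulated)."""

    def __init__(self):
        self._task = None
        self._lock = threading.Lock()  # assumption: stands in for CAS

    def get(self):
        return self._task

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._task is expected:
                self._task = new
                return True
            return False


class Task:
    def run(self):
        raise NotImplementedError  # defined in a subclass


class AppendOnce(Task):
    """Hypothetical task: appends its name to a log exactly once."""

    def __init__(self, log, name):
        self._log, self._name = log, name
        self._lock = threading.Lock()
        self._done = False

    def run(self):
        with self._lock:
            if not self._done:
                self._log.append(self._name)
                self._done = True


def install_and_run(holder, task):
    while True:
        if holder.compare_and_set(None, task):
            break  # our task is installed
        blocking = holder.get()
        if blocking is task:
            break  # another thread installed our task for us
        if blocking is not None:
            blocking.run()  # help the blocking task finish
            holder.compare_and_set(blocking, None)  # may lose; retry anyway
    task.run()
    holder.compare_and_set(task, None)  # remove it (replace with null)
```

The design lets a thread that finds the holder occupied make progress by finishing the blocking work itself rather than waiting, which is why run() has to tolerate repeated execution.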
  • Publish tasks may be installed in the isolation context objects 151 to (attempt to) perform a publication of the corresponding isolation context 110.
  • the publish tasks are in competition for this task holder with other publish tasks and with write tasks that may wish to add conflicts to this context.
  • the publish task may have one or multiple of the following properties.
  • the publish task may have an immutable reference to the isolation context object 151 being published.
  • the publish task may have an atomically-updated timestamp representing the publish time (initially zero).
  • the publish task may have an atomically-updated reference to a list of conflicts seen by the publish task. This atomically-updated reference may contain a "has value" flag, initially false, to be able to distinguish between the scenario in which it is unknown whether there are any conflicts 152 and the scenario in which it has been determined that there are no conflicts 152.
  • the publish task has finished, and the conflicts reference either contains a list of blocking conflicts or is null, indicating that the publish succeeded as of the publish time.
  • the publish task is complete.
  • isolation context object's 151 associated isolation context state 157 is requested to attempt to publish as of the determined publish time.
  • This request may return a list of conflicts 152, which may be an empty list, indicating successful publication. This list may be installed in the task via the conflicts reference with the has-value flag set. The publish task is complete.
  • an isolation context state 157 associated with an isolation context object 151 may attempt to publish as of a specified publication timestamp as follows:
  • the corresponding contingent conflict list is traversed, adding each contingent conflict 152 to its associated isolation context object 151 .
  • the nodes in the list may contain a "handled?" flag, which is checked before adding and set afterwards, to minimize duplicate work as threads run through the list at the same time.
  • a count of the contingent conflicts 152 handled is maintained, and a handling flag is used in addition to the "handled?" flag.
  • contingent conflicts 152 having states that may be changed from unhandled to handling are added, the state then being changed unconditionally to handled, and a counter may then be incremented.
  • If the count does not equal the number of items in the entire list, this means that some thread claimed an entry and then terminated or paused; when this occurs, a second pass may be performed, adding all contingent conflicts 152 that are in the handling state and incrementing the counter for each of these conflicts if its state may be changed from handling to handled.
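The two-pass scheme with unhandled, handling, and handled states can be sketched as below. The `Node`, `Counter`, and `add_contingent_conflicts` names are assumptions. Locks simulate atomic updates, and the `add` callback is assumed to be idempotent (for example, insertion into a set).

```python
import threading
from enum import Enum


class State(Enum):
    UNHANDLED = 0
    HANDLING = 1
    HANDLED = 2


class Node:
    def __init__(self, conflict):
        self.conflict = conflict
        self._state = State.UNHANDLED
        self._lock = threading.Lock()  # assumption: stands in for CAS

    @property
    def state(self):
        return self._state

    def cas(self, expected, new):
        with self._lock:
            if self._state is expected:
                self._state = new
                return True
            return False

    def set(self, new):
        with self._lock:
            self._state = new


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def add_contingent_conflicts(nodes, add, counter):
    # First pass: claim each unhandled node, add it, mark it handled.
    for node in nodes:
        if node.cas(State.UNHANDLED, State.HANDLING):
            add(node.conflict)
            node.set(State.HANDLED)
            counter.increment()
    # Second pass: some thread claimed a node and then stalled.
    if counter.value != len(nodes):
        for node in nodes:
            if node.state is State.HANDLING:
                add(node.conflict)
                if node.cas(State.HANDLING, State.HANDLED):
                    counter.increment()
```

In the common case each conflict is added once; the second pass only repeats work for entries left in the handling state by a stalled thread.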
  • a new isolation context state 157 is constructed with the given publication timestamp and whose prior state is the current isolation context state 157.
  • the failed attempt identifies the isolation context state 157 associated with the isolation context object 151 , and the operation may return the result of asking this isolation context state 157 to attempt to publish as of the specified timestamp.
  • a publish result object is created based on the publish time and conflict list stored in the task.
  • the creation of the publish result object may be performed outside of the publish task.
  • the publish result object contains an immutable reference to the isolation context object 151 that was published; an immutable timestamp representing the publish attempt time; and an immutable list of blocking conflicts 152, which is empty if the publish attempt succeeded.
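A publish result with the three immutable members listed above might look like the following sketch. The class name, field names, and the `succeeded` convenience property are assumptions rather than part of the described implementation.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class PublishResult:
    context: Any                      # isolation context object that was published
    publish_time: int                 # time of the publish attempt
    conflicts: Tuple[Any, ...] = ()   # blocking conflicts; empty on success

    @property
    def succeeded(self):
        return not self.conflicts
```

A tuple is used for the conflict list so that the result stays immutable after construction.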
  • a write task is installed to make a modification to an MSV object 166. It is noted that the modification may introduce a conflict in a publish task that might otherwise be occurring at the same time in a different thread.
  • a write task may also be installed in an isolation context object 151 (in competition with one or multiple publish tasks) when it is determined that there is a possibility that the modification may induce a conflict.
  • a write task forces a serialization of all modifications of a given field, regardless of isolation contexts.
  • some modifications may be made directly on the MSV state 162 associated with the MSV object 166 without installing corresponding write tasks.
  • such modifications may include those which cannot be affected by other modifications to the same MSV object 166 (e.g., operations to set a value in any view 120 and any modification operation in a view 120 associated with a snapshot isolation context 110) and cannot introduce conflicts 152 into isolation context objects 151 (e.g., operations in views 120 associated with non-snapshot isolation contexts 110 that have no child isolation contexts 110 and are either non-publishable or have no publishable sibling isolation contexts 110).
  • the performance of a write task may entail the performance of one or more phases: a cache state phase, an install context events phase, an add value to value chain phase, and an add conflicts phase.
  • a write task may contain an atomically-updated indication of the current phase being performed by some thread or an indication that all phases have been completed.
  • a write task may contain an immutable reference to the MSV object 166 and may contain an immutable reference to the write value chain 156 to be modified.
  • a write task may contain a Boolean value (initially false), which represents whether the write value chain 156 was modified.
  • a write task may contain an immutable modification operation (e.g., an operation to set or add).
  • a write task may contain an immutable argument to the operation.
  • a write task may contain an immutable indication of whether the modification resolves conflicts.
  • a write task may contain an atomically-updatable reference (initially null) to an MSV state 162.
  • a write task may contain an atomically-updatable timestamp (initially null) that represents the start time of the operation.
  • a write task may contain the value to be returned by the modification operation.
  • When a thread performs the write task, the thread reads the indication of the current phase, performs the associated action and attempts to update the current phase indication by replacing the indication of the phase performed with an indication of the next phase (or that the write task is complete if there is no next phase). If this fails, it may be inferred that another thread already recorded the completion of the phase and may have performed later phases. The thread continues processing based on the resulting phase indication (e.g., the phase indication set by the thread or by another thread) until the current phase indication indicates that the write task is complete.
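The phase-advancing loop of a write task can be sketched as follows. The phase names follow the four phases listed above; the `WriteTask` layout, the `actions` mapping, and the lock that simulates CAS are assumptions. Each action is assumed to be idempotent, because two threads may perform the same phase before either records its completion.

```python
import threading
from enum import IntEnum


class Phase(IntEnum):
    CACHE_STATE = 0
    INSTALL_CONTEXT_EVENTS = 1
    ADD_VALUE = 2
    ADD_CONFLICTS = 3
    COMPLETE = 4


class WriteTask:
    def __init__(self, actions):
        # actions: hypothetical mapping from Phase to a no-argument callable
        self._actions = actions
        self._phase = Phase.CACHE_STATE
        self._lock = threading.Lock()  # assumption: stands in for CAS

    @property
    def phase(self):
        return self._phase

    def _cas_phase(self, expected, new):
        with self._lock:
            if self._phase is expected:
                self._phase = new
                return True
            return False

    def run(self):
        while True:
            phase = self._phase
            if phase is Phase.COMPLETE:
                return
            self._actions[phase]()
            # On failure another thread already advanced the phase;
            # the next iteration re-reads whatever phase is now current.
            self._cas_phase(phase, Phase(phase + 1))
```

Any number of threads may call run() on the same task; each continues from the phase it observes until the task is complete.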
  • the thread performs the following in order. First, the thread attempts to change the start time from zero to a value read from the event counter 164. If this attempt fails, then the start time has been set by another thread. Next, in the cache state phase, the thread makes an attempt to change the cached state from null to the MSV state 162 currently associated with the MSV object 166. If the attempt fails, it may be inferred that another thread made the change. This sequence ensures that publication of an isolation context object 151 after the recorded start time cannot invalidate the recorded MSV state 162.
  • the thread adds the write task to every isolation context object 151 for which the present modification could possibly create a conflict.
  • the installed write task in a given isolation context 151 therefore causes a check for conflicts to be performed due to the installed write task before the given isolation context 151 may publish. If there is a publish task installed on such an isolation context object (indicating an ongoing publication), the thread assists in completing the publish task (i.e., the thread assists the ongoing
  • the thread adds the write task to every isolation context object 151 associated with an open publishable value chain 156 in the MSV state 162, including the write value chain 156, with the exception of those value chains 156 that are already associated with a conflict 152.
  • An atomically-updatable integer indicating the next value chain 156 associated with the MSV state 162 to process may be used to ratchet the next slot to consider, allowing threads to avoid trying to add the write task to an isolation context object 151 that has already received it from another thread.
  • After installing the task on each isolation context object 151, the thread checks to see whether the isolation context object 151 was published after the start time. If the thread determines that the isolation context object 151 was published after the start time, then the publish occurred after the creator of the task called the close_published_views() process. Therefore, in accordance with example implementations, the thread calls the close_published_views() process again on the cached MSV object 166 and updates the cached MSV state 162 to the result. In this call, an indication that the thread is already processing a write task may be passed, to prevent the close_published_views() process from clearing the pending write task on the MSV object 166. The task may also update the cached start time to the current timestamp (unless the cached start time was already greater due to the time being set by another thread after the current time was read).
  • a thread may perform the following operations:
  • the value is computed based on the operation and argument and stored in the task. This computation may not be performed in an atomic manner, as an assumption may be made that repeated evaluations will yield the same value.
  • isolation context 110 was published after the last value was added to the write value chain 156, the otherwise unnecessary modification may be deemed to be necessary.
  • the call to the add_value() operation may be specified to indicate in the write value chain 156 that the value object 158 added was the result of a frozen read.
  • the thread adds the actual and contingent conflicts and removes the write task from all isolation context objects 151 , as described below:
  • the thread removes the write task (if present) from the isolation context objects 151 associated with all open publishable value chains 156 in the MSV state 162, and the phase is complete.
  • the write value chain 156 does not already have an associated conflict 152, the write task is not noted to be resolving a conflict, the current view object
  • the write task may be removed from the current isolation context object 151 unless that is the write isolation context object 151 .
  • the system 100 may be a system 500 of one or multiple physical machines 510, as depicted in Fig. 5.
  • the physical machine 510 is a processor-based machine that is constructed from machine executable instructions, or "software" 560 and actual hardware 520.
  • the hardware 520 of the physical machine 510 may include, for example, one or multiple processing cores 522 (e.g., central processing unit (CPU) cores and/or graphics processing unit (GPU) cores), a memory 524, one or multiple network interfaces 526, one or multiple mass storage devices 527, a display, input/output (I/O) devices, and so forth.
  • the memory 524 may be a non-transitory memory, which includes non-transitory memory storage devices, such as semiconductor memory devices, phase change memory devices, random access memory (RAM) devices, dynamic RAM (DRAM) devices, resistive memory devices, flash memory devices, a combination of one or more of these devices, and so forth.
  • the machine executable instructions 560 may be stored in a non-transitory computer-readable storage medium, such as the memory 524, for example.
  • the instructions 560 when executed by one or multiple of the processing cores 522, may cause the processing core(s) 522 to execute one or multiple applications 566, i.e., execute the program code 1 12 as part of one or multiple operating system processes 564.
  • One or multiple processes 564 may share a given isolation context, as discussed herein.
  • the machine executable instructions 560 may include an operating system 565, a virtual machine monitor (VMM) 569, or hypervisor, as well as program instructions 568 that, when executed by the processing core(s) 522 cause the core(s) 522 to provide the context management resources 130 (see Fig. 1A).
  • the physical machine 510 may store, in accordance with example implementations, data 570, such as data 572 for the data structures 160 (see Fig. 1A), data 574 for objects (see Fig. 1A), and so forth.
  • the system 100 may include a high speed interconnect 580 (a server rack backplane, a server cabinet backplane, a bus, a serial link, and so forth) that interconnects multiple physical machines 510.
  • the system 100 may include three or more physical machines.
  • the system 100 may include a single physical machine 510.
  • the machines 510 may be disposed at a single physical location (or facility) or may be geographically distributed at multiple locations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A technique includes preventing a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context. In response to a first attempt to publish the first isolation context, a conflict is identified that prevents the first publication attempt; and a second modification is made within the first isolation context to resolve the conflict. The first isolation context is published in connection with a subsequent publication attempt, which includes allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.

Description

MANAGING AN ISOLATION CONTEXT
Background
[0001] A computer system may have a memory that is shared by multiple computing entities (multiple threads, for example). The computing entities may concurrently perform computations that change the values that are stored in the shared memory. One way to control the concurrent processing by the computing entities is to organize the changes by the entities into transactions and atomically commit the transactions to memory in a manner that maintains the memory in a consistent state.
Brief Description of the Drawings
[0002] Fig. 1A is a schematic diagram of a system according to an example implementation.
[0003] Fig. 1B is an illustration of data structures to manage multiple simultaneous value (MSV) objects according to an example implementation.
[0004] Fig. 2 is an illustration of a hierarchical ordering of isolation contexts according to an example implementation.
[0005] Fig. 3A illustrates the relationship of a parent isolation context and a live child isolation context created from the parent isolation context according to an example implementation.
[0006] Fig. 3B illustrates the relationship of a parent isolation context and a snapshot child isolation context created from the parent isolation context according to an example implementation.
[0007] Figs. 4A and 4B are flow diagrams illustrating techniques to manage publication of an isolation context according to example implementations.
[0008] Fig. 5 illustrates a schematic diagram of a system of physical machines according to a further example implementation.
Detailed Description
[0008] Multiple threads executing in one or more processes of a computer or multiple computers may perform operations that are directed to one or multiple shared data structures. One approach to maintain a consistent state of the data structure is to block other threads from making changes to the data structure while one of the threads makes changes. This may, however, result in inefficient processing.
Another approach to maintain a consistent state of the data structure is for the threads to process their changes as transactions, which are atomically committed to memory or rolled back (e.g., their modifications discarded) upon discovery that changes made by another thread conflict with the changes attempting to be committed. Such an approach may present challenges for relatively large transactions (e.g., transactions that read or modify a large number of locations associated with the data structure or transactions that run for a long time before attempting to commit), however. For such transactions, the probability that no other thread, during the course of the transaction's execution, made a modification that results in a conflict that prevents the commit attempt from being successful may be relatively small. Moreover, due to the complexity of the transaction, it is likely that a second attempt (and subsequent attempts) at redoing the changes following a rollback and retrying the commit of the transaction may once again fail.
[0009] In accordance with example implementations that are discussed herein, computations on a shared data structure may be isolated within corresponding isolation contexts. In general, an "isolation context" (also called a "computational context" herein) refers to an environment in which computations that are performed within the environment are contained within the environment so that the results of the computations are not, in general, visible to other isolation contexts. Due to the computational isolation, machine executable instructions, or program code, that is executing within the isolation contexts may concurrently make modifications to a data structure that is shared by the contexts. Here, a "data structure" refers to an organization of one or multiple units of data, which are stored in one or multiple storage locations. In accordance with some example implementations, an isolation context may be used to access (e.g., read or modify) multiple data structures. Each isolation context may present to program code a corresponding "view" of the data structure, where the "view" refers to the value(s) that the isolation context reads for corresponding properties of the data structure. As described herein, in accordance with example implementations, a single isolation context may be associated with multiple views. As examples, a "property" may be a location associated with the data structure (e.g., a field within a record or an index of an array), a structural property of the data structure (e.g., a number of elements in a list), or a relationship that is associated with the data structure (e.g., an association of a value with a key in a map).
[0010] Although the isolation contexts create computational isolation, there are mechanisms by which an isolation context may transfer information to another isolation context. One way a first isolation context may transfer information to a second isolation context is for the first isolation context to publish. "Publishing" an isolation context refers to combining or merging the view of the publishing isolation context with the view of another isolation context so that the views are the same at the time of publication. A given publication attempt, however, may not succeed due to one or multiple publication conflicts. In general, a publication conflict (also called a "conflict" herein) refers to a reason for the publication not to occur. An example of a conflict is the existence of a modification made to a data structure (e.g., a change to a field of a record) within the isolation context that is the target of the publication attempt when a similar modification (e.g., a change to the same field of the same record) was made within the isolation context being published or when a similar value (e.g., the value of the same field of the same record) was read within the isolation context being published and a modification to the same data structure or another data structure was made that may have been based on the value that was read.
[0011] Systems and techniques are described herein for purposes of managing isolation contexts. For example, the publication of an isolation context may be managed to resolve any conflicts that prevent the publication from occurring. In this manner, conflicts may occur in connection with the initial publication attempt or in one or multiple subsequent intermediate attempts before a successful, final publication attempt. "Resolving" a conflict refers to obviating a reason preventing publication from occurring. More specifically, as described herein, any conflict(s) that cause a publication attempt to fail are identified; one or multiple actions are taken to resolve the conflict(s), including action(s) that involve making one or multiple modifications within the first isolation context; and then, publication subsequently occurs by allowing the second isolation context to view the modifications made by the first isolation context prior to the first publication attempt as well as the modification(s) made to resolve the conflict(s), as of the time that is associated with the subsequent publication attempt.
[0012] Referring to Fig. 1A, as a more specific example, in accordance with some implementations, a processor-based system 100 includes context management resources 130, which the system 100 uses to create and manage isolation contexts 110 (N isolation contexts 110-1, 110-2...110-N being depicted in Fig. 1A), and manage publications by the isolation contexts 110.
[0013] In general, an isolation context is a mechanism by which executing machine executable instructions (called "program code 112," herein) may isolate itself from other executing program code 112. As a more specific example, the program code 112 may be an entire application or program or may be part of such an application or program, such as a thread. At any given point, program code 112 may be working in a prevailing (also referred to herein as the "current" or "working") isolation context 110. For example, program code 112-1 may be working in a prevailing isolation context 110-1. In a system that includes multiple threads, each thread may have its own prevailing isolation context 110, and different threads executing program code 112 at the same time may be working in different prevailing isolation contexts 110. In accordance with some implementations, when a parent thread creates a child thread, the prevailing isolation context 110 in the child thread when it begins executing is that of the parent thread (e.g., at the time the child thread was created). References herein to program code 112 "executing in" a given isolation context 110 refer to the program code 112 executing while the given isolation context 110 is the prevailing isolation context.
[0014] As a more specific example, a sequence of machine executable instructions (called "thread A" for this example) of program code 112 may be executing in one of the isolation contexts 110, and another sequence of machine executable instructions (called "thread B" for this example) of program code 112 may be executing in another one of the isolation contexts 110. Also, for this example, threads A and B may share a data structure 106. With a few exceptions that are described below, changes made by thread A to the data structure 106 may be invisible to thread B, and changes made to the data structure 106 by thread B may be invisible to thread A. That is, different computations, working at the same time and looking at the same fields of the same records, may correctly see different values. In general, as further described herein, each isolation context 110 has an associated view 120 (or multiple views 120) to a given data structure 106; and as such, multiple isolation contexts 110 may have different associated views 120 of the same data structure 106.
[0015] Multiple operating system threads may work in the same isolation context 110, in accordance with example implementations. In some examples, the multiple operating system threads may be associated with multiple operating system processes and the multiple operating system processes may be associated with multiple computers. This allows sharing of the same view 120 among multiple processes and computers. Moreover, this arrangement allows processes that share an isolation context 110 to be written in different programming languages.
[0016] As illustrated in Fig. 1A, the data structures 106 may be stored in a data store 104. In accordance with some implementations, the data store 104 may include a namespace that associates names with the data structures 106. An example of a namespace is a file system, in which data structures 106 are stored as files that are identified by corresponding file names. Another example of a namespace is a key-value store, which stores associated key-value pairs. In this manner, a key may be used to find a respective value that is stored in the key-value store.
[0017] The data store 104 may be stored in a physical storage device (a volatile or non-volatile memory device, a hard disk device, and so forth) or may be stored across a distributed arrangement of physical storage devices. In general, a "data structure 106" refers to any unit of data that may be stored. Examples of data structures 106 include files, records, lists, sets, maps, tables, arrays, strings, queues, stacks, graphs, directories, primitives (a number, a Boolean value, a character, as examples), and so forth. A data structure 106 may also include a data structure included by reference (e.g., a pointer).
[0018] As depicted in Fig. 1A, in accordance with example implementations, the context management resources 130 may include one or multiple libraries 134, with each library 134 including one or multiple functions 136 that may be called by the program code 112 for such purposes as creating isolation contexts, establishing views for isolation contexts, binding functions to isolation contexts, identifying conflicts, resolving conflicts, publishing contexts, and so forth, as further described herein. Moreover, the context management resources 130 may contain one or multiple objects, which may also be used for purposes of creating, maintaining, and managing the isolation contexts 110.
[0019] Referring to Fig. 2 in conjunction with Fig. 1A, in accordance with example implementations, the isolation contexts 110 are fully hierarchical, as illustrated by an example hierarchical tree 200. In this manner, multiple isolation contexts 110 may form a tree that is rooted at a top-level, global isolation context 110, and a given isolation context 110 may have any number of children. For the example of Fig. 2, isolation context 110-3 is a global context that is a parent of isolation contexts 110-4 and 110-5 and grandparent of isolation contexts 110-6 and 110-7; isolation contexts 110-4 and 110-5 are siblings; isolation contexts 110-6 and 110-7 are children of parent isolation context 110-4; and isolation contexts 110-6 and 110-7 are siblings. The isolation context hierarchy may extend to any depth.
[0020] One way for information to travel from one isolation context 110 to another isolation context 110 is for a child isolation context 110 to successfully publish, thereby making the changes visible in the parent isolation context 110. In general, if isolation context P is a parent of isolation context C, then changes made within isolation context P are generally visible in the isolation context C, but changes made in the isolation context C are not visible in the isolation context P until the isolation context C is successfully published.
[0021] In accordance with example implementations that are discussed herein, the child isolation context 110 may have two forms, which are specified when the child isolation context 110 is created: a transparent, or live, isolation context; and an opaque, or snapshot, isolation context.
[0022] Referring to Fig. 3A in conjunction with Fig. 1A, for a live child isolation context (such as example live child isolation context 110-9 of Fig. 3A), changes made to a location L (such as location 304 in Fig. 3A) in the isolation context's parent (such as isolation context 110-8 of Fig. 3A) are visible in the live child isolation context until a change is made to the location L within the child isolation context. It is noted that the changes made in the parent isolation context may actually be changes that are made to an ancestor of the parent isolation context but are visible in the parent isolation context; and the changes may be due to a different child isolation context of the parent isolation context publishing its changes to the parent isolation context.
[0023] Referring to Fig. 3B in conjunction with Fig. 1A, for a snapshot child isolation context (such as example snapshot child isolation context 110-11 of Fig. 3B), no changes that are made in the parent isolation context (such as parent isolation context 110-10 of Fig. 3B) after the snapshot child isolation context is created are visible within the child isolation context.
[0024] Within a live child isolation context C, reads made on locations that have not been modified in the context C return, as stated above, the value that the location would have had, had the read been made in the parent isolation context P. Either at the time of the read or by setting a default (at or after the time isolation context C was created or as a general policy in the process), the read may be specified as being "frozen." A frozen read has the property that, once it occurs, all subsequent reads of that location within isolation context C (frozen or not) return the same value until the location is modified within isolation context C or until isolation context C is successfully published.
[0025] When an attempt is made to read or otherwise determine a value associated with a property within a view 120 associated with an isolation context 110, the system may determine a value as of a current time. More generally, to determine a value as of a given time (e.g., a request time), the system may attempt to determine whether a value associated with the property had been established (i.e., previously determined) in the view 120 prior to the request time and subsequent to the later of the time that the isolation context 110 was created or the time the isolation context 110 was last successfully published prior to the request time. Such a value may have been established by modifications to the property by program code 112 executing in the isolation context 110, by frozen reads to the property by program code 112 executing in the isolation context 110, or by successful publication of a value for the property from a child isolation context 110 of the isolation context 110. If such a value was established, then it is the determined value. Otherwise, the determined value may be found by determining an inherited value for the property, where the inherited value is the value associated with the property within a parent view 120 of the view 120 associated with the parent isolation context 110 of the isolation context 110 as of an effective time based on the view 120 and the request time. As an example, when the isolation context 110 associated with the view 120 is a live (i.e., non-snapshot) isolation context 110, the effective time may be the request time. As another example, when the isolation context 110 associated with the view 120 is a snapshot isolation context 110, the effective time may be the later of the creation time of the snapshot isolation context 110 and a time associated with the latest successful publication of the snapshot isolation context 110 prior to the request time.
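As an illustration only, the value determination described above may be sketched in Java as follows. The class SketchView, its methods, the use of integer logical times, and the restriction to a single integer-valued property are all assumptions made for this sketch and are not part of the described system; the sketch also models publication only as a base time and does not implement publishing itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one property as seen through a chain of views. */
final class SketchView {
    /** A value together with the logical time at which it was established. */
    private static final class Entry {
        final long time;
        final int value;
        Entry(long time, int value) { this.time = time; this.value = value; }
    }

    private final SketchView parent;   // null for the global (root) view
    private final boolean snapshot;    // true: snapshot context, false: live context
    private final long baseTime;       // creation time (or last successful publication)
    private final List<Entry> established = new ArrayList<>(); // in time order

    SketchView(SketchView parent, boolean snapshot, long creationTime) {
        this.parent = parent;
        this.snapshot = snapshot;
        this.baseTime = creationTime;
    }

    /** Records a modification (or frozen read) made in this view at the given time. */
    void establish(long time, int value) {
        established.add(new Entry(time, value));
    }

    /** Determines the value as of requestTime: an established value wins, else inherit. */
    int valueAt(long requestTime) {
        Integer found = null;
        for (Entry e : established) {
            if (e.time >= baseTime && e.time <= requestTime) {
                found = e.value; // later entries overwrite earlier ones
            }
        }
        if (found != null) {
            return found;
        }
        if (parent == null) {
            throw new IllegalStateException("no value established in the root view");
        }
        // Live: inherit as of the request time. Snapshot: inherit as of the base time.
        long effectiveTime = snapshot ? baseTime : requestTime;
        return parent.valueAt(effectiveTime);
    }

    public static void main(String[] args) {
        SketchView root = new SketchView(null, false, 0);
        root.establish(1, 10);
        SketchView live = new SketchView(root, false, 2);
        SketchView snap = new SketchView(root, true, 2);
        root.establish(3, 20);                 // parent changes after the children exist
        System.out.println(live.valueAt(4));   // live child sees the parent's change
        System.out.println(snap.valueAt(4));   // snapshot child still sees the old value
    }
}
```

In this sketch the live child follows the parent until it establishes its own value, while the snapshot child keeps reading the parent as of its own base time.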
[0026] Publishing and the creation of live and snapshot contexts are some ways that information may travel from one isolation context 110 to another. Another way that information may travel from one isolation context 110 to another is by returning the result of a computation that is performed in another isolation context 110. For example, in accordance with some implementations, an isolation context 110 may invoke a call operation, which takes a function 136 (see Fig. 1A) as an argument (and perhaps takes other arguments to pass to the function), temporarily sets the prevailing isolation context to another context, runs the function 136, and returns the result. The result of the call operation remains the value as it appears in the invoked isolation context 110. For example, an isolation context C may call the following function: x = D.call(f, r) where "D" represents another isolation context 110, and "r" represents a parameter to function f, which is a record with a "count" field and where function f is to modify the count field of the provided record and return the record as its value. Continuing the example, within isolation context C, the count may have had a value of "10," but calling the function f sets the count to 20. After the call, the two variables r and x refer to the same record, but the view presented through the variable r is that of the isolation context C, while the view presented through the variable x is that of the isolation context D. Because of these differing views, r.getCount() continues to return "10," but x.getCount() returns "20." And if something elsewhere caused a computation in the isolation context D to set the count on the same record r to 30, further evaluation in the isolation context C causes x.getCount() to return "30." 
In accordance with example implementations, the view presented through the variable x is neither that of isolation context C nor isolation context D but a composite "C's view of D's view" associated with isolation context C. If, within isolation context C, the count of the record accessed via x is either modified or accessed via a frozen read, subsequent reads of the count in the isolation context C via variable x result in the value established within isolation context C and ignore any intervening modifications made within isolation context D. This distinction remains until isolation context C is successfully published, at which point D's view and C's view of D's view are again identical. In this way, an isolation context may have multiple associated views 120.
[0027] In accordance with example implementations, program code in an isolation context 110 may also bind a function to an isolation context (e.g., the prevailing isolation context 110 or a different isolation context 110), returning a new function 136 which, when invoked, runs the original function 136 in the bound isolation context 110 (e.g., by invoking the call operation on the bound context and passing in the original function 136 as a parameter). This binding operation may be used to create a function 136 that is bound to the current isolation context 110 and that can be invoked in a child isolation context 110 (or another isolation context 110 via the call operation). The binding operation may be used to create several new functions by binding the same function 136 to a number of different isolation contexts 110. These multiple isolation contexts 110 may be snapshot contexts 110 representing the state of the world at various times in the past (e.g., snapshots of a company taken at daily intervals). Alternatively, these multiple isolation contexts 110 may be child isolation contexts 110 used to explore and select among different alternative approaches to solving a problem. A function bound to the current isolation context 110 may be provided to a different context (e.g., by the call operation) to allow the different context to observe and store data in the bound isolation context 110.
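A minimal Java sketch of the call and binding operations is given below, for illustration only. The class BindableContext and its methods current(), call(), and bind() are hypothetical names chosen for this sketch; only the prevailing-context bookkeeping is modeled, not views or data structures.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an isolation context; only its name is modeled. */
final class BindableContext {
    // Each thread tracks its own prevailing context.
    private static final ThreadLocal<BindableContext> PREVAILING = new ThreadLocal<>();

    final String name;

    BindableContext(String name) { this.name = name; }

    /** Returns the prevailing context of the calling thread, or null if none is set. */
    static BindableContext current() { return PREVAILING.get(); }

    /** Call operation: temporarily makes this context prevailing, runs f, then reverts. */
    <T> T call(Supplier<T> f) {
        BindableContext saved = PREVAILING.get();
        PREVAILING.set(this);
        try {
            return f.get();
        } finally {
            PREVAILING.set(saved); // revert even if f throws
        }
    }

    /** Binding operation: returns a new function that always runs f in this context. */
    <T> Supplier<T> bind(Supplier<T> f) {
        return () -> call(f);
    }

    public static void main(String[] args) {
        BindableContext a = new BindableContext("A");
        BindableContext b = new BindableContext("B");
        Supplier<String> whoAmI = () -> BindableContext.current().name;
        Supplier<String> boundToA = a.bind(whoAmI);
        System.out.println(b.call(whoAmI));   // runs in B
        System.out.println(b.call(boundToA)); // invoked from B, but runs in A
    }
}
```

Under these assumptions, the same function may be bound to several contexts simply by calling bind() on each of them, which mirrors the daily-snapshot use described above.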
[0028] Child isolation contexts 110 can obtain references to their parent isolation contexts 110, so even if there has been a modification or frozen read in a given child isolation context 110, the program code 112 of a child isolation context 110 may, in accordance with example implementations, invoke the call operation on its parent isolation context 110 to determine what the value of a particular location is in the view 120 of the parent isolation context 110. For example, to obtain the count field of a record "r" as it would be seen in the parent isolation context 110 of the current isolation context 110, a program written in Java might call: IsolationContext.current().parent().call(() -> r.getCount())
In a system in which values as seen in a global isolation context 110 (i.e., a root of an isolation context hierarchy 200) are considered to be "committed" and those in any nested isolation context 110 are considered to be "uncommitted", a function may be called on the global isolation context 110, for example: IsolationContext.global().call(() -> revenue) to obtain the committed value of a variable. Similarly, such a call may be used to set a default value that is seen by code working in unpublished isolation contexts 110. Similar manipulations via the global isolation context 110 may be used to manipulate the namespace in a committed manner from within an unpublished isolation context.
[0029] Because the isolation contexts 110 may pass out values without publishing modifications, such values may be accumulated. For example, program code 112 executing in an isolation context 110 may take a snapshot of the state of a company database periodically (e.g., every day, hour, minute or second) and make each of these snapshots available by binding the snapshots to names within a namespace. Alternatively, the executing program code 112 may append the snapshots to a single list, facilitating, for example, identifying the states of the database for the ten days that scored highest on some metric (e.g., the days during which the revenue of the company was highest).
[0030] As another example, program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of potential modifications according to models with different parameters, and place the results (indicating predicted consequences) computed in each child isolation context 110 (along with parameters used with the respective models) in a single structure. Then, all of the results may be compared with one another, and the child isolation context 110 that resulted in the best outcome (and only that child isolation context 110) may be allowed to publish its results.
[0031] As another example, program code 112 executing in an isolation context 110 may run, in parallel child isolation contexts 110, a series of non-deterministic simulations and collect the results produced within each simulation into a structure, and analyze the structure to determine, based on the results, an action to take. In such an example, the child isolation contexts 110 may be allowed to terminate without ever publishing their modifications to their parent isolation contexts 110.
[0032] As an alternative to invoking a call operation on an arbitrary isolation context 110 or calling a function bound to an isolation context 110, program code 112 may execute code 112 in an arbitrary isolation context 110 by explicitly changing the prevailing isolation context 110 to be the arbitrary isolation context 110. Such a change may be temporary and bounded to a particular region of code 112 (e.g., a particular function or program block) by ensuring that at the end of the region of code 112 the prevailing isolation context is reverted to what it had been before the change. One way to facilitate such a temporary setting of the prevailing isolation context 110 is for program code 112 written in the Java programming language to use a "try-with-resources" block, creating a resource object that, upon construction, remembers the prevailing isolation context 110 and sets the prevailing isolation context 110 to the arbitrary isolation context, and that, upon invocation of the close() method on the object, sets the prevailing isolation context 110 to the remembered isolation context. For program code 112 written in the C++ programming language, the "resource acquisition is initialization" (RAII) paradigm can be used similarly. In accordance with example implementations, isolation contexts control some, but not all, variables or other memory locations available to the program. For such implementations, program code 112 executing in the temporary prevailing isolation context 110 may obtain a value, such as a reference to a data structure, where the reference is associated with a view 120 associated with the temporary prevailing isolation context 110, and may store that value in a location that is not under control of the isolation contexts. 
After the temporary prevailing isolation context 110 is reverted to the former prevailing isolation context 110, program code executing in the former prevailing isolation context 110 may obtain the value associated with the view 120 associated with the temporary prevailing isolation context 110.
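The "try-with-resources" approach described above may be sketched in Java as follows. This is an illustrative sketch under stated assumptions: the class ScopedContext, its nested Scope resource, and the methods current(), setCurrent(), and enter() are hypothetical names and do not represent an actual library interface.

```java
/** Hypothetical stand-in for an isolation context; only its name is modeled. */
final class ScopedContext {
    // Each thread tracks its own prevailing context.
    private static final ThreadLocal<ScopedContext> PREVAILING = new ThreadLocal<>();

    final String name;

    ScopedContext(String name) { this.name = name; }

    static ScopedContext current() { return PREVAILING.get(); }

    static void setCurrent(ScopedContext context) { PREVAILING.set(context); }

    /** Makes this context prevailing until the returned resource is closed. */
    Scope enter() { return new Scope(this); }

    /** Resource object that remembers the former prevailing context. */
    static final class Scope implements AutoCloseable {
        private final ScopedContext remembered;

        private Scope(ScopedContext target) {
            remembered = PREVAILING.get(); // remember on construction
            PREVAILING.set(target);        // switch to the target context
        }

        @Override
        public void close() {
            PREVAILING.set(remembered);    // revert on close()
        }
    }

    public static void main(String[] args) {
        ScopedContext outer = new ScopedContext("outer");
        ScopedContext inner = new ScopedContext("inner");
        ScopedContext.setCurrent(outer);
        try (Scope scope = inner.enter()) {
            System.out.println(ScopedContext.current().name); // prints "inner"
        }
        System.out.println(ScopedContext.current().name);     // prints "outer"
    }
}
```

Because close() runs when the block exits, whether normally or through an exception, the prevailing context is reverted at the end of the bounded region of code.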
[0033] As mentioned above, another way that information may move between isolation contexts 110 is when a child isolation context 110 publishes its modifications. It is noted that in accordance with some implementations, the program may specify, when an isolation context 110 is created, that the isolation context 110 is "detached," which means that the isolation context 110 may not be published. An error may be signaled (e.g., an exception may be thrown) if an attempt is made to publish a detached isolation context 110.
[0034] In response to a given isolation context 110 requesting publication, the system 100 determines whether there are any conflicts that would inhibit (prevent, for example) the publication from proceeding. A conflict may be associated with a property associated with a data structure, where a property may be a location (e.g., a particular field of a particular record or a particular index of a particular array), a relationship managed by the data structure (e.g., the value associated with a particular key in a particular map), or a structural property of the data structure (e.g., the order of elements of a particular list or the number of elements contained in a particular set). In accordance with further example implementations, other types of conflict may be detected. A conflict may arise due to the existence of an unpublished value associated with the view associated with the publishing isolation context, and a conflict may arise due to the existence of an unpublished value associated with an ancestor view of the view associated with the publishing isolation context. An unpublished value may be a value asserted to be associated with a view that has not yet been processed in response to a publication of an isolation context associated with the view. More specifically, a conflict may arise when the value associated with such a property is changed within the parent isolation context 110 of the isolation context 110 requesting publication. A conflict may arise when a new value otherwise becomes visible in the parent isolation context 110, when such a value change occurs after a particular effective time and when the child (e.g., publishing) isolation context either modified the same property (where such modification includes receiving a value due to the successful publication by a further child isolation context 110 of the publishing isolation context 110) or when the publishing isolation context performs a frozen read operation to obtain a value associated with the property. The effective time associated with a property relative to the publishing isolation context 110 may be established and updated upon creation of the publishing isolation context 110, each time the publishing isolation context 110 is successfully published, and upon an explicit indication (e.g., during an attempt to resolve conflicts due to an unsuccessful publication attempt) by program code 112 executing in the publishing isolation context 110 that any conflicts associated with the property have been resolved. In accordance with an example implementation, the effective time may be updated when the publishing isolation context 110 is established as the prevailing isolation context 110. When the publishing isolation context 110 is a live (i.e., non-snapshot) isolation context 110, the effective time is also updated the first time a modification to or frozen read of the property is performed following such a creation, successful publication, or explicit indication. The result of these rules is that for a live (i.e., non-snapshot) isolation context 110, a conflict can only arise due to a modification in the parent isolation context 110 following a modification or frozen read to the same property in the child isolation context 110, while for a snapshot isolation context 110 the order of the operations in the parent and child isolation context is immaterial.
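The effect of these rules for a single property may be sketched in Java as follows. This is a simplified illustration, not the described implementation: the class ConflictRuleSketch, its parameter names, and the use of integer logical times (with null meaning "no such event") are assumptions made for this sketch.

```java
/** Hypothetical per-property conflict check following the rules described above. */
final class ConflictRuleSketch {
    private ConflictRuleSketch() {}

    /**
     * @param snapshot           true for a snapshot child context, false for a live one
     * @param baseTime           creation, last successful publication, or resolution time
     * @param childEstablishedAt time of the child's first modification or frozen read
     *                           after baseTime, or null if the child never touched it
     * @param parentChangedAt    time of the latest value change visible in the parent,
     *                           or null if the parent value never changed
     */
    static boolean conflicts(boolean snapshot, long baseTime,
                             Long childEstablishedAt, Long parentChangedAt) {
        if (childEstablishedAt == null || parentChangedAt == null) {
            return false; // nothing to reconcile for this property
        }
        // Live: the effective time moves up to the child's first modification or
        // frozen read. Snapshot: the effective time stays at the base time.
        long effectiveTime = snapshot ? baseTime : childEstablishedAt;
        return parentChangedAt > effectiveTime;
    }

    public static void main(String[] args) {
        // Parent changed at time 3, child modified at time 5.
        System.out.println(conflicts(false, 0, 5L, 3L)); // live: false
        System.out.println(conflicts(true, 0, 5L, 3L));  // snapshot: true
    }
}
```

With the parent change preceding the child modification, the live case reports no conflict because the child already saw the parent's value, while the snapshot case reports a conflict regardless of order.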
[0035] In accordance with example implementations, if there are no conflicts, the publication succeeds, and all of the changes made in the publishing context atomically become visible in the parent context. In accordance with example implementations, the entire publication process is atomic. In general, an "atomic" process means that the actions that form the process are treated as being indivisible, i.e., the actions are viewed or treated by the isolation contexts as occurring at the same time. In accordance with example implementations, for an atomic publication process, a thread cannot see some, but not all, of the published modifications in the parent isolation context, and a modification cannot be made in any isolation context that would change the determination that there are no conflicts, between the determination that there are no conflicts and the making available of the modifications in the parent isolation context. As described further herein, in accordance with example implementations, the conflict(s) may be determined proactively so that the conflicts are known at the time the isolation context 110 requests publication.
[0036] When an isolation context 110 successfully publishes, the isolation context 110 and its parent isolation context 110 present the same values for the data structure 106, and the child isolation context 110 is no longer considered to have established any current values for the data structure 106. As described above, attempts to determine values for properties of the data structure 106 within the child isolation context 110 result in determining inherited values from the parent isolation context 110. In a live isolation context 110, this has the effect that locations that had been "frozen" by modifications or by frozen reads are no longer considered frozen and the inherited values may vary as modifications are made to the parent context; and for a snapshot isolation context 110, the "snapshot time" (i.e., the effective time used to find inherited values from the parent isolation context 110) is updated to the publication time. In some implementations, a last-common-snapshot for a given isolation context 110 may be obtained by calling the appropriate function 136. The last-common-snapshot is a read-only snapshot child of the context's parent as of the last time the isolation context 110 was published (or its creation time if it has not been published). This allows an isolation context 110 to compare its changes to a data structure 106 with the values of the data structure 106 when the isolation context 110 started or was last successfully published. In accordance with example implementations, an as-created-snapshot for a given isolation context 110 (a snapshot going back to the beginning, or creation, of the snapshot) may be obtained by calling the appropriate function 136.
[0037] If there are one or multiple conflicts, then publication of the child isolation context 110 does not occur, and all of the changes, or modifications, that are made by the child isolation context 110 remain invisible to the parent isolation context 110.
[0038] In accordance with example implementations, to resolve conflicts, a conflict resolution phase is entered for purposes of taking one or multiple actions to resolve the conflicts. The conflict resolution phase may be handled in a number of different ways, depending on the particular implementation.
[0039] In accordance with some implementations, the program code 112 associated with the isolation context 110 attempting publication decides how to handle the conflicts and may decide to re-attempt the publication. In accordance with example implementations, before such an attempt may be successful, the program code 112 marks each conflict as being resolved so that the conflict does not show up again for a subsequent publication attempt.
[0040] In accordance with example implementations, a given conflict may be resolved by modifying a value of the property associated with the conflict and specifying that such a modification resolves any conflict for that property. In accordance with example implementations, the modification may be made by computing and setting a specific value. In accordance with further example implementations, the modification may be one of the following. The modification may be a resolve-to-current modification in which the value in the publishing context is the one that is used. This is often the correct answer when it can be determined that the value asserted by the child isolation context 110 is not dependent on any conflicted value (i.e., if the program code 112 of the publishing isolation context 110 were executed again, the same result would occur). The modification may be a resolve-to-parent modification in which the current value in the parent isolation context 110 is the one that is used. The modification may be a roll-back modification in which the value in the last common snapshot is the one that is used.
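The three named modifications may be summarized by the following Java sketch, offered for illustration only. The enum ResolutionSketch and its pick() method are hypothetical; the sketch only selects which of three already-known values would be written, and does not model marking the conflict as resolved.

```java
/** Hypothetical enumeration of the resolution choices named above. */
enum ResolutionSketch {
    RESOLVE_TO_CURRENT, // keep the value in the publishing (child) context
    RESOLVE_TO_PARENT,  // adopt the current value in the parent context
    ROLL_BACK;          // restore the value from the last common snapshot

    /** Selects the value that the resolving modification would set. */
    <T> T pick(T current, T parent, T lastCommonSnapshot) {
        switch (this) {
            case RESOLVE_TO_CURRENT:
                return current;
            case RESOLVE_TO_PARENT:
                return parent;
            case ROLL_BACK:
                return lastCommonSnapshot;
            default:
                throw new AssertionError(this);
        }
    }
}
```

For example, with a child value of 20, a parent value of 30, and a last-common-snapshot value of 10, the three choices select 20, 30, and 10, respectively.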
[0041] In accordance with example implementations, determining a value for a property associated with a conflict includes determining a lack of an established value associated with the property in the view of the child isolation context 110; determining an inherited value for the property, where the inherited value is the value for the property in the view of the parent isolation context as of an effective time associated with the child's view and the request time; and considering the inherited value to be the current value. For a live child isolation context 110, the effective time may be the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the establishment of a value associated with the property in the child's view. For a snapshot child isolation context 110, the effective time may be the later of a creation time of the first isolation context and a publication time of the first isolation context prior to the request time; and identifying the conflict includes identifying a conflict associated with the property due to the establishment of a value associated with the property in the parent's view subsequent to the later of the creation time of the first isolation context and the latest publication time of the first isolation context.
[0042] It is noted that a given location may be subsequently modified (after having been marked as resolved) before the publication is reattempted. Moreover, marking a conflict as being resolved may cause the system to disregard any known conflict associated with the location and the isolation context 110, but one or multiple subsequent modifications in the parent or child isolation contexts 110 may introduce one or multiple new conflicts, which arise when the isolation context 110 tries again to publish.
[0043] In an example implementation, program code 112 performing conflict resolution may make use of the current (e.g., publishing) isolation context 110, the parent isolation context 110, and the last-common-snapshot isolation context 110 of the current isolation context 110. In addition, the program code 112 may also make use of a current-at-publish isolation context 110 and a parent-at-publish isolation context 110, which are read-only snapshots of the current (publishing) context and its parent at the time the publication was attempted (and failed).
[0044] To simplify the process of running code in isolation, in accordance with example implementations, one or multiple mechanisms may be used to encapsulate the process of creating an isolation context 1 10 as a child of the prevailing isolation context 1 10; running the code in it, and (when the created isolation context 1 10 is publishable) attempting to publish the created isolation context 1 10 at the end. In accordance with example implementations, the mechanism may involve keywords, annotations, or other syntactic additions to the source code of the program and the program code may be specified directly. In accordance with further example implementations, the mechanism may involve calling a function (e.g., one of the functions 136 of Fig. 1A) and passing as an argument an indication of a function to be called within the newly created isolation context 1 10. In accordance with some example implementations, the program code 1 12 may specify (e.g., by choice of keyword, annotation, or function or by passing in a parameter) the type of child isolation context 1 10 to create (e.g., live or snapshot, detached or not, read-only or not). In accordance with some example implementations, the program code 1 12 may further specify information used to control behavior upon failure of an attempt to publish. Such information may cause the system to attempt conflict resolution and may control the manner in which conflict resolution is performed. The information may alternatively or in addition direct the system to react to a failure to publish or a failure to resolve conflicts by creating a new child isolation context 1 10 and rerunning the code within it. The information may include one or more termination conditions, which the system may use to determine that further attempts to perform conflict resolution or to rerun the code should not occur. 
Examples of such termination conditions may include a given number of attempts having been performed, a given time (e.g., a wall clock time or a time duration) having passed, a given amount of a resource (e.g., disk space or memory) having been consumed, a value having been asserted by another program thread (e.g., an indication that a solution to a problem has been found by another thread or an indication that a program has gone on to a different phase), a given function returning a true value, a Boolean expression evaluating to a true value, or another termination condition.
[0045] In accordance with example implementations, the conflict resolution may be handled in a number of different ways. For example, in a first conflict resolution approach, the program code 112 may decide that resolving the conflict is not worth the effort, and as a result, the isolation context 110 does not attempt to republish. For example, the program code 112 may be relatively small (online transaction processing (OLTP) code, for example); and as such, the approach may be to simply throw away the associated isolation context 110, create a new isolation context 110 and execute the program code 112 again from the beginning.
[0046] In accordance with example implementations, the conflict resolution may be handled by an object or by a function 136 (see Fig. 1A) that is designated as a conflict resolver at the time the code 112 modified the location that was conflicted, such designation indicating that should a conflict be detected associated with this location and value, the object or function should be invoked to resolve the conflict.
[0047] In accordance with example implementations, the program code 112 may specify a default resolver (e.g., an object or function) that is associated with a particular location (e.g., a particular field in a particular record), with a particular field (regardless of what record a conflict occurs in), or with a type (e.g., applying to conflicts associated with any property of any data structure as long as the property is associated with values of the given type or applying to conflicts associated with any field of any record as long as the record is of the given type). This type of conflict resolution may be appropriate when a field is, for example, known to be used as a counter. For example, if the value in the last common snapshot is "x," the value in the parent is "y," and the value in the child is "z," then a reasonable resolution is to have the value be set to y+z-x (the parent's value plus the change made in the child since the last common snapshot), and a resolver may be attached to the field to perform this computation and resolve the conflict by specifying the resulting value.
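A field-level resolver for a counter might be sketched as below. The sketch assumes the field is only ever incremented or decremented by each side, so that keeping both sides' changes amounts to the parent's value plus the child's delta since the last common snapshot, y + (z - x). The types `CounterConflict`, `Resolver`, and `ResolverRegistry` are hypothetical and are not part of the described system.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical conflict description: the three values a resolver may consult.
struct CounterConflict {
    std::int64_t last_common;  // x: value in the last common snapshot
    std::int64_t parent;       // y: value now visible in the parent
    std::int64_t child;        // z: value written in the publishing child
};

using Resolver = std::function<std::int64_t(const CounterConflict&)>;

// Keeps both sides' changes: parent value plus the child's delta.
inline std::int64_t resolve_counter(const CounterConflict& c) {
    return c.parent + (c.child - c.last_common);
}

// Hypothetical registry of default resolvers keyed by field name.
class ResolverRegistry {
  public:
    void attach(const std::string& field, Resolver r) {
        by_field_[field] = std::move(r);
    }

    // Returns true and stores the resolved value if a resolver is attached.
    bool resolve(const std::string& field, const CounterConflict& c,
                 std::int64_t& out) const {
        auto it = by_field_.find(field);
        if (it == by_field_.end()) return false;
        out = it->second(c);
        return true;
    }

  private:
    std::map<std::string, Resolver> by_field_;
};
```

With x = 10, y = 12 and z = 13, the parent added 2 and the child added 3, so the resolver yields 15.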
[0048] In accordance with some implementations, as further described herein, the program code 112 may use task-based conflict resolution. With task-based conflict resolution, the program code 112 specifies that some or all of its computation is made up of re-runnable tasks that are executed by the program code 112. In this manner, as the program code 112 executes, the system 100 may keep track of the set of locations read and written while working in each task and the dependencies between tasks (e.g., a dependency created when one task reads a location that was written by another task). In accordance with example implementations, to resolve conflicts with task-based conflict resolution, all tasks that read a conflicting location are selected to rerun, as are all tasks dependent on them (and so on, recursively). Conflicted locations may be resolved to the parent value, the current value, or a determined value prior to, during, or after the selected tasks are re-run. The selected tasks may then be re-run in dependency order and the publication may then be retried.
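The selection of tasks to re-run can be sketched as a single forward pass. The sketch assumes that tasks are recorded in the order in which they originally ran, so that a task can only depend on tasks earlier in the list; under that assumption the original order is also a valid dependency order for the re-run. `Task`, `Location`, and `select_tasks_to_rerun` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Location = std::string;

// Hypothetical record of what one re-runnable task read and wrote.
struct Task {
    std::set<Location> reads;
    std::set<Location> writes;
};

// Returns the indices of tasks to re-run, in original execution order.
// A task is selected if it read a conflicted location or a location written
// by an already selected task (the recursive dependency rule).
std::vector<std::size_t> select_tasks_to_rerun(
        const std::vector<Task>& tasks,
        const std::set<Location>& conflicted) {
    std::set<Location> dirty = conflicted;  // locations whose values may change
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        bool reads_dirty = false;
        for (const Location& loc : tasks[i].reads) {
            if (dirty.count(loc) != 0) {
                reads_dirty = true;
                break;
            }
        }
        if (reads_dirty) {
            selected.push_back(i);
            // Everything this task wrote may change when it is re-run.
            dirty.insert(tasks[i].writes.begin(), tasks[i].writes.end());
        }
    }
    return selected;
}
```

A task that reads none of the affected locations is left alone, which is the point of tracking read and write sets per task rather than re-running the whole program.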
[0049] In accordance with some implementations, the program code 112 may provide a function that takes a collection of conflict objects and runs arbitrary code to determine how to resolve the conflicts explicitly.
[0050] In accordance with example implementations, the program code 112 may use a combination of the above-described approaches, even for a single publication attempt. For example, the program code 112 may use field-based rules to resolve (and eliminate) some conflicts and then invoke one or multiple functions 136, objects and/or tasks to resolve the remaining conflicts.
[0051 ] Thus, referring to Fig. 4A in conjunction with Fig. 1A, in accordance with example implementations, a technique 400 includes preventing (block 404) a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context. In response to a first attempt to publish the first isolation context, the technique 400 includes managing (block 406) the publication of the first isolation context, which includes identifying (block 408) a conflict that prevents the first publication attempt; making (block 412) a second modification within the first isolation context to resolve the conflict; and publishing (block 416) the first isolation context in connection with a subsequent publication attempt, including allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.
[0052] In accordance with example implementations, several attempts may be made to publish a given isolation context. In this manner, prior to making the subsequent publication attempt, the following may occur: an intermediate attempt may be made to publish the first isolation context; at least one new conflict may be identified that prevents the intermediate publication attempt, wherein the new conflict(s) arise from modifications to the data structure performed in the second isolation context subsequent to the first attempt; and action may be taken to resolve the new conflict(s). More specifically, referring to Fig. 4B, in accordance with example implementations, a technique 430 includes executing (block 434) machine executable instructions in a processor-based machine in a first isolation context and in a second isolation context. The first isolation context presents an associated first view of a data structure, and the second isolation context presents an associated second view of the data structure. Executing the machine executable instructions includes inhibiting (preventing, for example) modifications that are made to the first object in the first isolation context from being reflected in the second view. Here, a modification made in the first view "being reflected" in the second view refers to the modification being reproduced or shown in the second view.
[0053] Pursuant to the technique 430, an attempt is made (block 438) to publish the first isolation context to reflect modifications made to the data structure in the first isolation context in the second view of the data structure. If a determination is made (decision block 442) that one or more conflicts exist, then a decision is made (decision block 444) whether a termination condition has been satisfied (i.e., a decision to abandon the publication has been reached). If not, one or more actions are taken to resolve the conflict(s) pursuant to block 446 and control returns to block 438. When no conflicts remain, the technique 430 includes publishing (block 450) the first isolation context, including causing the second view to reflect modifications made to the data structure within the first isolation context. These modifications include modifications made during action(s) taken to resolve conflict(s), as of a time associated with the last publication attempt.
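The control flow of technique 430 (blocks 438 through 450) can be sketched as a loop over three caller-supplied callbacks. The `PublishHooks` structure and its member names are hypothetical; the sketch only assumes that a publication attempt reports the conflicts that blocked it, and that an empty list means the publication succeeded.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical conflict record; only the location matters for this sketch.
struct Conflict {
    int location;
};

enum class PublishOutcome { Published, Abandoned };

// Hypothetical callbacks standing in for the system's operations.
struct PublishHooks {
    // Blocks 438/442: attempt to publish; an empty result means success.
    std::function<std::vector<Conflict>()> attempt_publish;
    // Block 444: termination condition, given the attempts made so far.
    std::function<bool(int)> should_terminate;
    // Block 446: take action to resolve the reported conflicts.
    std::function<void(const std::vector<Conflict>&)> resolve;
};

PublishOutcome publish_with_resolution(const PublishHooks& hooks) {
    int attempts = 0;
    for (;;) {
        const std::vector<Conflict> conflicts = hooks.attempt_publish();
        ++attempts;
        if (conflicts.empty()) {
            return PublishOutcome::Published;  // block 450
        }
        if (hooks.should_terminate(attempts)) {
            return PublishOutcome::Abandoned;  // publication is given up
        }
        hooks.resolve(conflicts);  // then control returns to block 438
    }
}
```

The termination callback is where conditions such as an attempt limit, a deadline, or a flag set by another thread would be checked.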
[0054] In accordance with example implementations that are described herein, properties of a data structure may be represented by multiple simultaneous value (MSV) objects. In general, an MSV object represents a property of a unit of data and may concurrently, or simultaneously, have multiple, alternative values. For example, a particular logical location, such as a field of a record or a slot in an array, may be represented by an MSV object, such that the field or slot has different, alternative values, when seen through different views.
[0055] A given isolation context may be associated with one or multiple views that may be used when accessing one or multiple MSV objects. Moreover, multiple isolation contexts may be associated with multiple views of a given MSV object. The values associated with different views in a given MSV object may be isolated from each other. In this manner, as part of this isolation, a request to determine a value for the MSV object in a given view may result in the return of one of the MSV object's alternative values; and assigning the value of the MSV object in a given view may not affect the value of the MSV object in another view.
[0056] Referring to Fig. 1B in conjunction with Fig. 1A, in accordance with example implementations, the system 100 uses various data structures 150 to manage the MSV objects. In accordance with example implementations, a given data structure 150 may be an object represented by a C++ class.
[0057] The data structures 150 may include isolation context objects 151. Each isolation context object 151 corresponds to one of the isolation contexts 110. It is noted that in the following discussion, "isolation context object 151" and "isolation context 110" may be used interchangeably. Conflicts 152 that prevent publication of the isolation context 110 may be installed on a corresponding isolation context state 157.
[0058] The data structures 150 may include view objects 153. Each view object 153 may correspond to one of the views 120. It is noted that in the following discussion, "view object 153" and "view 120" may be used interchangeably.
[0059] The data structures 150 may include one or multiple conflict generators 160. In general, a conflict generator 160 may construct conflicts referring to a particular location. Each subtype of conflict may have its own generator subtype, in accordance with example implementations.
[0060] The data structures 150 may include context states 157. In general, a context state 157 represents the current set of conflicts and "contingent conflicts" for an associated isolation context 110, as well as a count or event time, representing the last time the isolation context associated with the conflict successfully published.
[0061] The data structures 150 may include an event counter 164, which may represent a monotonically-increasing set of values that are incremented by particular events. In accordance with some implementations, the event counter 164 may represent a global, shared event count, which is atomically incremented. In accordance with example implementations, the constant representing the greatest possible value of an event counter may be used to represent the most recent event count. Logically, values of the event counter 164 may be associated with snapshot creation times, in accordance with example implementations. Values read from the event counter 164 may be considered to be timestamps denoting points in time of the execution of the system 100, and the value stored in the event counter 164 may be considered to be the current time (or timestamp) of the system 100 with respect to the context and MSV object management resources 130. It is noted that, apart from the fact that their values monotonically increase over time (meaning that timestamps may be compared with one another), these timestamps have no necessary relationship with any other notion of time (e.g., wall-clock time or time since the start of an operating system or process in the system 100). It is also noted that different reads of the event counter 164 may correctly read the same timestamp value, but consecutive reads of the event counter 164 may never read an earlier timestamp after reading a later timestamp.
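A shared, atomically incremented event counter of this kind might look like the following sketch. The class name `EventCounter`, the method names, and the 64-bit width are assumptions made for illustration; `kMostRecent` plays the role of the "greatest possible value" sentinel described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

using Timestamp = std::uint64_t;

// Sentinel meaning "the most recent event count", per the convention of
// using the greatest representable value.
constexpr Timestamp kMostRecent = std::numeric_limits<Timestamp>::max();

// Hypothetical global event counter; values are logical timestamps only and
// bear no relationship to wall-clock time.
class EventCounter {
  public:
    // Reads the current timestamp. Repeated reads may return the same value
    // but never an earlier one.
    Timestamp now() const { return count_.load(std::memory_order_acquire); }

    // Atomically advances the counter and returns the new timestamp.
    Timestamp advance() {
        return count_.fetch_add(1, std::memory_order_acq_rel) + 1;
    }

  private:
    std::atomic<Timestamp> count_{0};
};
```

Because `fetch_add` is a single atomic read-modify-write, two threads that advance the counter concurrently always obtain distinct timestamps.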
[0062] The data structures 150 may represent value chains 156, value objects 158 and MSV objects 166. Each value chain 156 may be associated with a particular view object 153 (representing a view 120) that may be used when accessing an MSV object 166; and the value chain 156 represents a history of value objects 158 (for a particular MSV object 166 in the given view 120), starting from the most recently asserted and extending in a time-ordered sequence back in time. In accordance with example implementations, each value object 158 may contain a value (e.g., a primitive value or a reference to a data structure) for the MSV object 166 in the view 120 associated with the value chain 156 valid as of a particular timestamp, a timestamp representing that effective time, and a link to a prior value object 158 (if any) for the same view 120.
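A value chain can be sketched as a singly linked list whose head is the most recently asserted value object. The sketch below assumes integer values and integer timestamps, and the names `ValueObject`, `ValueChain`, `assert_value`, and `find_as_of` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

using Timestamp = std::uint64_t;

// One entry of the history: a value, its effective time, and the prior entry.
struct ValueObject {
    int value;
    Timestamp effective;
    std::shared_ptr<const ValueObject> prior;  // null for the oldest entry
};

// Hypothetical per-view history for one MSV object, newest entry first.
class ValueChain {
  public:
    // Adds a new head; timestamps are assumed to be non-decreasing.
    void assert_value(int value, Timestamp effective) {
        assert(!head_ || head_->effective <= effective);
        head_ = std::make_shared<ValueObject>(
            ValueObject{value, effective, head_});
    }

    // Returns the latest entry whose timestamp is no later than as_of,
    // or nullptr if the chain holds no such entry.
    const ValueObject* find_as_of(Timestamp as_of) const {
        for (const ValueObject* v = head_.get(); v != nullptr;
             v = v->prior.get()) {
            if (v->effective <= as_of) return v;
        }
        return nullptr;
    }

  private:
    std::shared_ptr<const ValueObject> head_;
};
```

Because entries are immutable once linked, two chain heads can share a common tail, which is the property that lets successive states share value chains.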
[0063] As also depicted in Fig. 1B, the data structures 150 may also include view relative pointers 154. A view relative pointer 154 represents a reference, or pointer, to a structured object (a record, array or a map, as examples) and also specifies that read and write accesses to the structured object are to be seen through the perspective of an associated view 120. The view relative pointer 154 may contain a pointer to an object and a pointer to a view object 153. The value stored in a value object 158 may be a view relative pointer 154. In accordance with an example implementation, the view object 153 contained within a view relative pointer 154 stored in a value object 158 on a value chain 156 may be constrained to represent a view 120 associated with the same isolation context 110 that is associated with the value chain's 156 view 120.
[0064] A given MSV object 166 represents alternative values for a property of a data structure 106, as seen through different views 120. The data structures 150 may also include MSV states 162. Each MSV state 162 represents the current set of one or multiple value chains 156 that are associated with a corresponding MSV object 166. A given MSV object 166 may constrain the values associated with it to all be of a given type (e.g., all integers, all strings, or all lists of employee records). Alternatively, a given MSV object 166 may permit values of different types to be associated with it.
[0065] The data structures 150 may also include MSV states 162. Each MSV state 162 may represent a set of value chains 156 associated with an MSV object 166.
[0066] The data structures 150 may also include serializing tasks 170. In this manner, the tasks 170 may include write or publication tasks, used to effect either a modification of an MSV object 166 (for a write task) or an attempt to publish an isolation context object 151 (for a publication task). These tasks 170 may be used to ensure that operations that are supposed to be atomic are, in fact, atomic, as simultaneous execution might cause incorrect behavior. Rather than blocking, the serialization of the tasks 170 may allow threads to cooperate with each other, as further described herein.
[0067] In accordance with example implementations, the system 100 may use C++ atomic classes and associated functions, such as a C++ compare_exchange operation (also called a "compare and swap" operation or "CAS" operation).
Moreover, in accordance with example implementations, versioned pointers may be used, which encapsulate a value along with a version number, which is incremented every time the value is modified. The version number may be represented by the use of sequences of bits (e.g., high-order bits) within the versioned pointer value. Pointers, including versioned pointers, may encapsulate flags representing Boolean values (e.g., represented as bits within the pointer value).
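A versioned value updated by compare-and-swap might be sketched as follows. The layout is an assumption made for illustration: a 16-bit version in the high-order bits and a 48-bit payload (standing in for pointer bits) in the low-order bits of one 64-bit word. The names `VersionedWord`, `pack`, `version_of`, and `payload_of` are hypothetical; `std::atomic` and `compare_exchange_strong` are the standard C++ facilities.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: [ 16-bit version | 48-bit payload ].
constexpr int kVersionBits = 16;
constexpr int kPayloadBits = 64 - kVersionBits;
constexpr std::uint64_t kPayloadMask =
    (std::uint64_t{1} << kPayloadBits) - 1;

inline std::uint64_t pack(std::uint64_t version, std::uint64_t payload) {
    // Shifting discards version bits above 16, so the version wraps around.
    return (version << kPayloadBits) | (payload & kPayloadMask);
}
inline std::uint64_t version_of(std::uint64_t word) {
    return word >> kPayloadBits;
}
inline std::uint64_t payload_of(std::uint64_t word) {
    return word & kPayloadMask;
}

// Hypothetical versioned cell: every successful update bumps the version.
class VersionedWord {
  public:
    std::uint64_t load() const { return word_.load(); }

    // Installs new_payload only if the cell still holds `expected`
    // (both payload and version must match).
    bool compare_and_set(std::uint64_t expected, std::uint64_t new_payload) {
        const std::uint64_t desired =
            pack(version_of(expected) + 1, new_payload);
        return word_.compare_exchange_strong(expected, desired);
    }

  private:
    std::atomic<std::uint64_t> word_{0};
};
```

The version guards against a stale compare succeeding after the payload has been changed and then changed back, since the version differs even when the payload bits are equal again.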
[0068] Operations performed on a data structure 150 may be considered to be logically atomic even though the performance of the operations takes measurable time, during which other operations involving the same data structure 150 may be initiated in another thread. In accordance with example implementations, such a logically atomic operation that is associated with a finite processing time may be deemed to be correct when the result returned by the operation is one that would have been correct had the operation been instantaneous and executed at some arbitrary time in between the time the operation started and the time the operation finished and, further, when any modifications made in the course of performing the operation may be logically ordered as a group with respect to those of other logically atomic operations. It is noted that this standard of correctness may be sufficient to guarantee that no sequence of operations performed in other threads is able to determine that the operation was not performed atomically.
[0069] In accordance with example implementations, each modifiable location may be associated with an MSV object 166; each MSV object 166 may have an associated mapping from view objects 153 to value chains 156 (e.g., in an associated MSV state 162); and each value chain 156 may have an associated timestamped list of value objects 158. The mapping may distinguish between open view objects 153 and closed view objects 153. In accordance with example implementations, the mapping may distinguish between open and closed value chains 156 or views 120 associated with such view objects 153. An open view object 153 is a view object 153 whose associated value chain 156 was last modified (e.g., by the addition of a value object 158) subsequent to the last time that the isolation context 1 10 associated with the view object 153 (representing a view 120) was published prior to the creation of the MSV state 162 (although the isolation context 1 10 may have been subsequently published). A closed view object 153 is a view object 153 whose associated isolation context 1 10 is known to have been published since the last time its associated value chain 156 was modified.
[0070] In accordance with example implementations, whenever a read or modify operation is attempted on an MSV object 166, open view objects 153 associated with the MSV object 166 whose isolation contexts 110 (as represented by isolation context objects 151) have been published may be closed before the read or modification occurs. More specifically, in accordance with example implementations, an open view object 153 may be closed by a close_published_views process. In this process, a new value object 158 may be added to the value chain 156 associated with the view object 153 that is the parent of the view object being closed (e.g., as the head of the value chain 156). The value in the new value object 158 may be the value from the most recent (e.g., first) value object 158 of the value chain 156 associated with the open view object 153, and the timestamp of the new value object 158 may be a timestamp associated with the successful publication of the view object's 153 isolation context 110. If there are any view objects 153 to be closed by the close_published_views operation (e.g., if there are any view objects 153 whose associated isolation contexts 110 have been successfully published since the last time a value object 158 was added to the respective value chain 156), this is accomplished by creating a new MSV state 162 for the MSV object 166 that shares value chains 156 with the old MSV state 162, and an attempt is made to atomically replace the old MSV state 162 in the MSV object 166 with the new MSV state 162. This attempt may fail due to another thread successfully replacing the MSV state 162 with a different new MSV state 162. If the attempt fails, the close_published_views operation may be retried until a new MSV state 162 is successfully installed or a determination is made that there are no open view objects 153 in the MSV object's 166 current MSV state 162. In this manner, the value chains 156 are shared, and the threads cooperate with each other, as further described herein. All subsequent operations are done relative to the resulting MSV state 162, even if the MSV object's 166 state 162 is changed again before the operation finishes.
[0071] In accordance with example implementations, read and modify operations provided by MSV objects 166 may take as a parameter a view object 153 representing the view 120 to use in performing the operation. When the MSV object 166 implements a location referred to by a view relative reference 154, the view object 153 may be that associated with the view relative reference 154. In accordance with an example implementation, the operations may additionally take as a parameter an isolation context object 151 representing a prevailing isolation context 110. In this implementation, the isolation context object 151 may be requested to identify a shadow view object 153 for the provided view object 153, as described further below, and that shadow view object 153 may be used in place of the provided view object 153 in performing the operation.
[0072] Read operations find the value chain 156, if any, associated with the view object 153 in the MSV state 162 and return the value from the value object 158 in the value chain 156 where the timestamp of the value object 158 is no later than a specified timestamp and is the latest such timestamp among the value objects 158. In accordance with example implementations, a "most recent" timestamp may be specified for the read operation, which indicates that the value object 158 with the most recent timestamp is to be used. If the timestamp is the "most recent" timestamp, the search for associated value chains 156 may be restricted to the open value chains 156 in the MSV state 162. If there is no associated value chain 156 or if there is no value object 158 on the value chain 156 at or prior to the timestamp, then the process is repeated, with the view object 153 being replaced by that view object's 153 parent view object (if any). If the isolation context 110 associated with the replaced view object 153 is a snapshot isolation context 110, the timestamp is replaced by the snapshot time for the isolation context 110 (e.g., the timestamp associated with the last successful publication of the isolation context 110 or the creation of the isolation context). This may continue until a value object 158 is found and its value is returned or until a view object 153 is determined to not have a parent. In the latter case, a default value of the appropriate type may be returned.
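The read procedure, including the fallback to the parent view, can be sketched as below. This is a simplification under stated assumptions: values are integers, each per-view chain is modeled as an ordered map from timestamp to value rather than a linked list, the distinction between open and closed chains is omitted, and the names `View`, `Msv`, and `read` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>

using Timestamp = std::uint64_t;
constexpr Timestamp kMostRecent = std::numeric_limits<Timestamp>::max();

// Hypothetical view: a link to its parent and, for snapshots, a snapshot time.
struct View {
    const View* parent = nullptr;
    bool is_snapshot = false;
    Timestamp snapshot_time = 0;  // last publication or creation time
};

// Hypothetical MSV: one timestamp-ordered history per view.
struct Msv {
    std::map<const View*, std::map<Timestamp, int>> chains;
    int default_value = 0;
};

int read(const Msv& msv, const View* view, Timestamp as_of = kMostRecent) {
    while (view != nullptr) {
        auto chain = msv.chains.find(view);
        if (chain != msv.chains.end()) {
            // First entry strictly later than as_of; its predecessor, if
            // any, is the latest entry no later than as_of.
            auto it = chain->second.upper_bound(as_of);
            if (it != chain->second.begin()) {
                return std::prev(it)->second;
            }
        }
        // Nothing found in this view: continue in the parent. A snapshot
        // sees its parent only as of the snapshot time.
        if (view->is_snapshot) {
            as_of = view->snapshot_time;
        }
        view = view->parent;
    }
    return msv.default_value;  // no view in the ancestry holds a value
}
```

In this sketch a snapshot child with snapshot time 5 that has written nothing reads the parent's value as of time 5, while a live child reads the parent's latest value.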
[0073] In accordance with example implementations, a lazy publication process may be used to effect propagating changes made to an MSV object 166 in a given view 120 due to the publishing of an isolation context 110 associated with the view. In this manner, in accordance with example implementations, operations that are directed to effecting the publication do not occur until a subsequent operation is directed to the MSV object. More specifically, referring to Fig. 7, a technique 700 includes providing (block 704) an object that has a plurality of alternative values; associating (block 708) a plurality of views with the plurality of alternative values; and associating (block 708) a plurality of computational contexts with the views. The views are isolated (block 712) such that a request to determine a value in a given view results in a value of the plurality of alternative values being returned, and a first value associated with a first view is independent from association of a second value with a second view. The technique 700 includes publishing (block 716) a computational context of the plurality of computational contexts to allow a value of the plurality of alternative values associated with a view of the plurality of views associated with the published context to be read in at least one other of the views; and in response to an operation directed to the object after the publishing, processing an effect of the publishing on the at least one other view, pursuant to block 720.
[0074] Referring to Fig. 1A in conjunction with Fig. 1B, as a more specific example, in accordance with example implementations, operations that modify values associated with views 120 of an MSV object 166 may be serialized by means of write tasks associated with the MSV object 166. These write tasks are represented by corresponding write task objects 170 (also called "write tasks 170" herein). For example, such operations may include writing a value, incrementing or otherwise modifying a value, or resolving a conflict associated with a value. To effect this serialization, the MSV object 166 may have the capability to be associated with a single pending write task 170. A thread of the system 100 may create a write task object 170 to store the data required to perform the requested modification and to discover any conflicts that result from the modification. This data may be stored in the write task object 170, such that values determined by one thread performing the write task for the MSV object 166 may be visible to other threads performing the write task for the same MSV object 166.
[0075] The write task object 170 may also contain information about steps in the process that have been completed by some thread. This may allow a given thread performing the write task to skip steps that have already been completed by another thread. The thread performing the modification may attempt to change the pending write task associated with the MSV object 166 from a null value to the newly created write task object 170 (e.g., by means of a CAS operation). If this fails, it may indicate that another thread is processing a different write task directed to the MSV object 166 (i.e., may indicate that there is an ongoing modification of the MSV object 166 in association with another view object 153), and the present thread may perform that write task before repeating the attempt to install the newly created write task 170. In this way, multiple threads may cooperate in processing a modification and a thread in one operating system process may complete a modification begun by a thread in a different operating system process that died before the modification was completed. When an installation attempt is successful, the thread may process the newly installed write task.
[0076] A thread of the system 100 may process a write task directed to an MSV object 166 as follows. Within the description below the "current write" refers equivalently to the write task object 170 being invoked, the performance of the steps below, which may take place concurrently in multiple threads, and the intended modification. The write task may, in general, contain the following five steps:
1. First, the write task obtains the current MSV state 162 associated with the MSV object 166. It is noted that this MSV state 162 may be different from the one obtained as a result of calling a process called "close_published_values()," which is described further below. Due to the serialization of tasks, no further changes may be made by other tasks until the current write task finishes.
2. The isolation contexts 110 that are associated with value chains 156 in the MSV state 162 are enumerated. For each isolation context 110 for which it is determined that it is possible for the current write to introduce a conflict or contingent conflict for the isolation context 110, the thread installs the current write task object 170 in the corresponding isolation context object 151 as a pre-publication task. This ensures that the isolation context 110 cannot be published until this write task 170 completes. Therefore, publishing of the isolation context 110 does not occur until any conflicts 152 have been added to the isolation context object 151. In this manner, if another thread (not involved with the current write task) attempts to publish one of these isolation contexts 110, this other thread first assists (via processing the write task 170) in deciding whether to add conflicts 152. A contingent conflict refers to an indication that should one isolation context 110 successfully publish its changes, doing so will, as part of the atomic publishing process, induce a conflict in another isolation context 110.
3. Next, the thread processing the write task determines the value to be associated as the result of the modification, and the write task may create a value object 158 and add the created value object 158 to the correct value chain 156.
4. The thread processing the write task next identifies any conflicts 152 or contingent conflicts for each value chain 156 in the MSV state 162 and adds these conflicts 152 and contingent conflicts to the associated isolation contexts 110. For example, the system 100 may determine whether the current write implies a conflict for the value chain's associated isolation context 110. For example, when the value chain 156 is open, the associated isolation context 110 is a publishable isolation context, and the associated view 120 is a descendant of the view being modified, the thread may determine that a conflict should be added to the isolation context object 151 associated with the value chain 156. As another example, when the isolation context 110 associated with the view 120 being modified is a publishable snapshot, the view 120 associated with the value chain 156 under consideration is an ancestor of the view being modified, and the timestamp of the most recent value object 158 on the value chain 156 under consideration is more recent than the snapshot time of the snapshot, the thread processing the write task may determine that a conflict should be added to the isolation context object 151 associated with the view 120 being modified. As another example, when the value chain 156 is open and its associated view 120 is a sibling of the view 120 being modified, the thread processing the write task may determine that contingent conflicts should be added to one or both of the isolation context objects 151 associated with the views 120 such that when one of the corresponding isolation contexts 110 publishes, the publication adds a conflict at the MSV object 166 to the other isolation context object 151.
5. As a last step, the thread processing the write task may remove the current write task as a pre-publication task from the isolation context objects 151 that it was added to, thereby allowing any publication(s) to proceed. In an example implementation, this step may be performed as part of the prior step following the determination of any conflicts on all of the isolation context's 1 10 associated value chains 156 or following the addition of a conflict to the isolation context.
Table 1
[0076] Following the performance of the write task, the thread may attempt to remove the write task 170 as the pending write task 170 for the object, e.g., by using a CAS operation to replace the current write task 170 with a null value. The thread may make a single attempt, as failure of the CAS operation may indicate that another thread was previously successful in this removal.
[0077] For a frozen read, a thread may perform a check to determine whether there is already a value object 158 established on an open value chain 156 associated with the operation's view 120 since the last time the associated isolation context 110 was published (or at all if the isolation context 110 has yet to publish). If such a value object 158 exists, the value associated with the value object 158 may be returned. Otherwise, the thread may initiate a modify operation, where the value to be asserted by the modify operation may be the current value associated with the view, and this value may be returned by the frozen read operation. The modify operation may indicate in the value chain 156 that the value added is the result of a frozen read and, therefore, is not propagated to a value chain 156 associated with the view's 120 parent view 120 following successful publishing of the isolation context 110.
[0078] When an isolation context 110 is published, the publication process may be serialized with respect to other publication tasks or other write tasks installed as pre-publication tasks (as described above) by concurrently executing threads of the system 100: a thread may install a publication task in the isolation context object 151 associated with the isolation context 110 to be published after cooperating in finishing any currently installed publication task or write tasks. In accordance with example implementations, a conflict or contingent conflict may not be added to an isolation context object 151 by a write task 170 unless that write task 170 has been installed as a pre-publication task 170 on the isolation context object 151; and a write task 170 may not be installed as a pre-publication task 170 on an isolation context object 151 that has an installed publication task 170 before the publication task is finished with the cooperation of the thread attempting to install the write task 170. In this manner, after a publication task is installed, no further conflicts may be added until after the publication attempt associated with the publication task finishes.
[0079] The system 100 installs any conflicts 152 for an associated isolation context 110 in the corresponding isolation context state 157. In this manner, if a given isolation context state 157 has actual conflicts 152 (i.e., non-contingent conflicts), then any attempted publication of the associated isolation context 110 fails, and the conflicts 152 are noted and may be addressed, as further described herein. Otherwise, if a given isolation context 110 attempts to publish and has no actual conflicts 152 installed, the isolation context 110 is allowed to publish; any contingent conflict(s) 152 are identified and added as actual conflict(s) 152 to the isolation context object(s) 151 associated with the contingent conflict(s) 152; and the isolation context object 151 being published is updated to be associated with a new isolation context state 157 having no conflicts (actual or contingent), an updated last publication time, and its prior isolation context state 157 as a prior state. In accordance with example implementations, the determination that there are no actual conflicts 152 and the updating of the isolation context state 157 may be constrained to constitute a single logically atomic operation.
[0080] In accordance with example implementations, the isolation context object 151 may include one or multiple of the following features. The isolation context object 151 may include an immutable reference to a parent isolation context object 151.
[0081] The isolation context object 151 may include an atomically-updated reference to a current isolation context state 157. This reference may be atomically changed to refer to a new isolation context state 157 whenever the associated isolation context 110 publishes or when a conflict or contingent conflict is created for the context 110. As noted above, in accordance with example implementations, the isolation context state 157 may contain one or multiple conflicts 152. More specifically, in accordance with example implementations, the isolation context state 157 may include a list of conflicts representing actual conflicts; a list of conflicts representing contingent conflicts; a timestamp representing the last publication time; and a reference to the prior state of the associated isolation context 110 as of the last successful publication of the isolation context 110 (if any). When an isolation context object 151 is created, an associated isolation context state 157 may be created. The timestamp associated with a created isolation context state 157 may be the result of atomically incrementing the event counter 164. In accordance with example implementations, the lists of conflicts and contingent conflicts may be linked lists, such that creating a new isolation context state 157 based on an existing isolation context state 157, differing only in the addition or removal of conflicts 152 on one of the conflict lists, may involve isolation context states 157 whose lists share state in a common suffix.
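The suffix sharing described above may be illustrated with a minimal Python sketch. The names `Node`, `ContextState`, and `with_conflict` are hypothetical and do not appear in the description, and conflicts are represented by plain strings for brevity. The sketch assumes immutable states, so that a new state can reuse the list nodes of the state it is derived from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked conflict list (hypothetical)."""
    conflict: str
    next: Optional["Node"] = None


@dataclass(frozen=True)
class ContextState:
    """Immutable snapshot of an isolation context state (hypothetical layout)."""
    conflicts: Optional[Node]          # actual conflicts, newest first
    contingent: Optional[Node]         # contingent conflicts, newest first
    last_publication_time: int
    prior: Optional["ContextState"]    # state as of the last successful publication


def with_conflict(state: ContextState, conflict: str) -> ContextState:
    """Build a new state whose conflict list is the old list with one cell prepended."""
    return ContextState(
        Node(conflict, state.conflicts),
        state.contingent,
        state.last_publication_time,
        state.prior,
    )


s0 = ContextState(None, None, 1, None)
s1 = with_conflict(s0, "conflict-a")
s2 = with_conflict(s1, "conflict-b")
```

Here the list in `s2` is the list in `s1` with one extra cell in front, so no existing cell is copied when a conflict is added.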
[0082] The timestamps of the isolation context states 157 associated with an isolation context object 151 (either directly or via prior state references) represent the creation time of the isolation context object 151 and its successful publications. These timestamps may be called "stable times" for the isolation context object 151 and its associated isolation context 110. The latest of these stable times (e.g., the timestamp of the isolation context state 157 directly associated with the isolation context object 151) may be called the "last stable time" for the isolation context object 151.
[0083] The isolation context object 151 may have an atomically-updated reference to a pre-publication task that may be a write task or a publication task that is to be completed before publication of the associated isolation context 110 may be attempted. In accordance with example implementations, the pre-publication task may be a single publication task or a collection of write tasks, all of which are to be completed before publication of the associated isolation context 110 may be attempted. The completion of the write tasks from the collection of write tasks may include basing whether to perform the write task on an indication of whether the task has been completed (e.g., by another thread).
[0084] The isolation context object 151 may have an associated map, also called a "shadow map," which maps view objects 153 to other view objects 153. In accordance with example implementations, the shadow map may be a lock-free map. Moreover, in accordance with example implementations, the shadow map may be a lock-free cuckoo map.
[0085] The isolation context object 151 may have an associated immutable enumerated value that represents whether the associated isolation context 110 is a live or snapshot isolation context 110.
[0086] The isolation context object 151 may have an immutable, enumerated value that represents the associated isolation context's modification type, which may be a publishable isolation context 110, a detached isolation context 110 or a read-only isolation context 110. A publishable isolation context 110 refers to an isolation context 110 that may be published. A detached isolation context 110 refers to an isolation context 110 that is constrained to be one that is not published, but values in the detached isolation context 110 may be modified. A read-only isolation context 110 refers to an isolation context 110 that is constrained to not be published, and values in the read-only context 110 are constrained to not be modified. The system 100 may be constrained to not create a publishable isolation context 110 whose parent is a read-only isolation context 110, as publishing modifications from the child isolation context 110 would constitute an impermissible modification to the read-only parent isolation context 110.
[0087] In accordance with example implementations, the data structures 150 include a global isolation context object 151 representing the global isolation context 110 and having no parent isolation context object 151.
[0088] The isolation context object 151 may also be associated with one or multiple of the following operations.
[0089] The isolation context object 151 may provide a shadow(view object) operation to identify a view object 153 associated with the isolation context object 151 and related to the specified view object 153. If the shadow(view object) operation determines that the isolation context object 151 is associated with the view object 153, the view object 153 is returned as the value of the shadow() operation. Otherwise, in accordance with example implementations, the shadow(view object) operation involves checking the shadow map associated with the isolation context object 151. If there is no entry in the shadow map corresponding to the view object 153, the shadow(view object) operation may include invoking the shadow(view object) operation on the parent isolation context object 151. If the shadow map contains a view object 153 corresponding to the resulting view object 153, the corresponding view object may be returned. Otherwise, the shadow(view object) operation may create a new view object 153, as a child of the parent's shadow view object 153. The shadow(view object) operation may associate the new view object 153 with the specified view object 153 and with the parent's shadow view object 153. In this way, a hierarchy of shadow view objects 153 may be created and efficiently retrieved.
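The shadow(view object) lookup described above may be sketched in Python as follows. This is a single-threaded illustration only: the class names `View` and `IsolationContext` are hypothetical, an ordinary dictionary stands in for the lock-free shadow map, and the sketch assumes that the specified view belongs to the context itself or to one of its ancestors.

```python
class View:
    """Hypothetical view object: knows its creating context and parent view."""
    def __init__(self, context, parent=None):
        self.context = context
        self.parent = parent


class IsolationContext:
    """Hypothetical isolation context object with a dict-based shadow map."""
    def __init__(self, parent=None):
        self.parent = parent
        self.shadow_map = {}   # maps view objects to view objects of this context

    def shadow(self, view):
        # A view that already belongs to this context is its own shadow.
        if view.context is self:
            return view
        found = self.shadow_map.get(view)
        if found is not None:
            return found
        if self.parent is None:
            raise ValueError("view is not in this context or an ancestor context")
        # Ask the parent context for its shadow of the view, then look that up.
        parent_shadow = self.parent.shadow(view)
        found = self.shadow_map.get(parent_shadow)
        if found is None:
            # Create the shadow as a child of the parent's shadow view.
            found = View(self, parent_shadow)
            self.shadow_map[parent_shadow] = found
        # Associate the result with the originally specified view as well.
        self.shadow_map[view] = found
        return found


global_ctx = IsolationContext()
top = View(global_ctx)
child_ctx = IsolationContext(global_ctx)
grandchild_ctx = IsolationContext(child_ctx)
v2 = grandchild_ctx.shadow(top)
v1 = child_ctx.shadow(top)
```

A single call on the grandchild context creates the whole chain of shadow views, and later calls with either the top-level view or the intermediate shadow return the same object.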
[0090] The isolation context object 151 may provide a new_child(view type, modification type, timestamp) operation to create a child isolation context object 151 of the isolation context object 151, where the timestamp associated with the child's isolation context state 157 is the specified timestamp or, if the timestamp is omitted or "most recent" is specified, the result of incrementing the event counter 164, and where the child has the specified view type (e.g., either "live" or "snapshot") and the specified modification type (e.g., "publishable", "detached", or "read-only").
[0091] The isolation context object 151 may provide a publish() operation to install and run a publish task, as described below, after first collaborating in finishing any publish or write tasks already installed in the context's pre-publication task.
[0092] The isolation context object 151 may provide an add_conflict(conflict) operation to first check whether the conflict 152 has been marked as being "resolved". If not, the operation atomically replaces the current isolation context state 157 with a new isolation context state 157 that is identical to the current one, except that the conflict 152 is prepended to the conflict list. The replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time. The operation then attempts to install the conflict 152 in a location (e.g., a value chain 156) associated with the conflict 152 to assert that there is known to be a conflict 152 at that location. The attempt to install may be made by a single invocation of a CAS operation attempting to replace a null value with the present conflict 152. Failure in this attempt implies that another conflict 152 has been installed there (e.g., by another thread), and accordingly, the present conflict 152 is marked as being resolved.
[0093] The isolation context object 151 may provide an add_contingent_conflict(conflict) operation, which atomically replaces the current isolation context state 157 with a new isolation context state 157 that is identical to the current one, except that the conflict 152 is prepended to the contingent conflict list. The replacement may be performed atomically by a CAS operation, looping until an attempt succeeds and creating a new isolation context state 157 each time.
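The add_conflict(conflict) and add_contingent_conflict(conflict) operations described above may be sketched in Python as follows. Python has no hardware CAS primitive, so the hypothetical `AtomicRef` class emulates one with a lock; this is an assumption made for illustration, not the mechanism of the description. The state is modeled as a pair of tuples (actual conflicts, contingent conflicts), newest first, and all class and method names other than the two operation names are hypothetical.

```python
import threading


class AtomicRef:
    """Lock-based stand-in for an atomically updated reference with CAS."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if the reference still holds the expected object.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Conflict:
    def __init__(self, location):
        self.location = location   # AtomicRef slot where the conflict is installed
        self.resolved = False


class IsolationContextObject:
    def __init__(self):
        # State: (actual conflicts, contingent conflicts), newest first.
        self.state = AtomicRef(((), ()))

    def _prepend(self, conflict, index):
        # CAS loop: build a fresh state on every attempt until one is installed.
        while True:
            current = self.state.get()
            lists = list(current)
            lists[index] = (conflict,) + current[index]
            if self.state.compare_and_set(current, tuple(lists)):
                return

    def add_conflict(self, conflict):
        if conflict.resolved:
            return
        self._prepend(conflict, 0)
        # Single attempt to install at the location; failure means another
        # conflict is already installed there, so this one is marked resolved.
        if not conflict.location.compare_and_set(None, conflict):
            conflict.resolved = True

    def add_contingent_conflict(self, conflict):
        self._prepend(conflict, 1)


slot = AtomicRef()
ctx = IsolationContextObject()
a, b = Conflict(slot), Conflict(slot)
ctx.add_conflict(a)
ctx.add_conflict(b)
c = Conflict(AtomicRef())
ctx.add_contingent_conflict(c)
```

In this usage, `a` and `b` target the same location, so `a` is installed there and `b` is marked resolved, although both appear on the conflict list.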
[0094] The isolation context object 151 may provide a conflict_resolved(conflict) operation to remove a conflict 152 from preventing publication of the isolation context 110 associated with the isolation context object 151. If the specified conflict 152 is the head of the current isolation context state's 157 conflict list, the operation replaces the associated isolation context state 157 with a new isolation context state 157 equivalent to the old one, except that all initial conflicts 152 in the conflict list that are marked as resolved are removed. In this way, in accordance with example implementations, resolved conflicts 152 may remain on the conflict list, but after all conflicts 152 have been marked as being resolved, the conflict list in the isolation context state 157 is empty.
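The trimming behavior of the conflict_resolved(conflict) operation may be sketched in Python as follows. The sketch is single-threaded and models the conflict list as a tuple, newest first. It assumes that the operation itself sets the resolved flag, which the description does not state, and the `Conflict` class is hypothetical.

```python
class Conflict:
    def __init__(self, name):
        self.name = name
        self.resolved = False


def conflict_resolved(conflicts, conflict):
    """Return the conflict list after `conflict` has been resolved.

    Assumption: the flag is set here. The list is only trimmed when the
    resolved conflict is at the head; all leading resolved entries are dropped.
    """
    conflict.resolved = True
    if not conflicts or conflicts[0] is not conflict:
        return conflicts            # resolved entry stays on the list for now
    i = 0
    while i < len(conflicts) and conflicts[i].resolved:
        i += 1
    return conflicts[i:]


a, b, c = Conflict("a"), Conflict("b"), Conflict("c")
after_b = conflict_resolved((a, b, c), b)   # b is not the head: list unchanged
after_a = conflict_resolved(after_b, a)     # head resolved: a and b are dropped
after_c = conflict_resolved(after_a, c)     # last conflict resolved: list empty
```

The intermediate results show a resolved conflict remaining on the list until the entries ahead of it are resolved, after which the list empties.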
[0095] The isolation context object 151 may provide a was_published_after(timestamp) operation to return a true value if the isolation context object 151 was published after the timestamp and to return a false value otherwise, where the determination may be made based on the timestamp associated with the current isolation context state 157.
[0096] The isolation context object 151 may provide a publication_after(timestamp) operation. The operation traverses the list of the isolation context states 157, starting from the current isolation context state 157 and continuing by following the prior state pointer; and returns the earliest timestamp associated with a state 157, such that the timestamp of the state 157 is after the specified one. This operation may also indicate whether there is no such timestamp.
[0097] The isolation context object 151 may provide a publication_before(timestamp) operation. The operation traverses the list of the isolation context states 157, which may be similar to the publication_after(timestamp) operation. However, the publication_before(timestamp) operation returns the latest publication timestamp before the given one, or a zero timestamp (or a similar timestamp that is considered to be before any other timestamp), if the given timestamp is before the associated isolation context's creation time.
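The two traversals described above may be sketched in Python as follows. The `State` class is hypothetical, timestamps are modeled as integers, and `None` is used here to indicate that no later publication exists; the description does not specify how that indication is conveyed.

```python
class State:
    """Hypothetical isolation context state: a stable time plus a prior link."""
    def __init__(self, timestamp, prior=None):
        self.timestamp = timestamp
        self.prior = prior


def publication_after(current, timestamp):
    """Earliest stable time strictly after `timestamp`, or None if there is none."""
    result = None
    state = current
    # Stable times decrease along the prior links, so stop at the first
    # state that is not after the given timestamp.
    while state is not None and state.timestamp > timestamp:
        result = state.timestamp
        state = state.prior
    return result


def publication_before(current, timestamp):
    """Latest stable time strictly before `timestamp`, or 0 if there is none."""
    state = current
    while state is not None:
        if state.timestamp < timestamp:
            return state.timestamp
        state = state.prior
    return 0


created = State(2)                 # creation time
first_pub = State(5, created)      # first successful publication
current = State(9, first_pub)      # second successful publication
```

With stable times 2, 5, and 9, `publication_after(current, 3)` yields 5 and `publication_before(current, 6)` also yields 5.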
[0098] In accordance with example implementations, during processing of a successful publication of an isolation context 110, when contingent conflicts 152 are being processed, multiple threads may cooperate in efforts to effect the publication. Due to this cooperation, it is possible that multiple threads may attempt to add the same conflict 152 to the associated isolation context object 151. For purposes of preventing multiple threads from adding the same conflict 152, in accordance with example implementations, the contingent conflict list of the isolation context state 157 may have nodes, where each node indicates a contingent conflict and has an associated "handled?" flag. The contingent conflict associated with a node may only be added as a conflict to its associated isolation context object 151 following a determination that the flag is not set; and the flag may be atomically set following the successful addition of the conflict 152.
[0099] In accordance with example implementations, the view object 153 may have one or multiple of the following properties.
[00100] The view object 153 may contain an unchangeable, or immutable, reference to the isolation context 110 (as represented by object 151) that created it; and the view object 153 may contain an immutable reference to its parent view object 153 (which may or may not be associated with the same isolation context 110).
[00101] The view object 153 may have an ancestor cache, which may be a map from view objects 153 to Boolean values, where an association in the map indicates that a given other view object 153 has been determined to be or to not be an ancestor of the current view object 153. In this manner, in accordance with example implementations, a true Boolean value, a false Boolean value and no entry represent that the other view object 153 has been determined to be an ancestor, has been determined to not be an ancestor, or has yet to be checked, respectively. In accordance with some implementations, the ancestor cache is not constructed until a determination is required and is atomically installed. In accordance with some implementations, the ancestor cache is a lock-free map (e.g., a lock-free cuckoo map).
[00102] In accordance with example implementations, the data structures 150 include a top-level view object 153, which is associated with the global isolation context object 151 and which has no parent view object 153.
[00103] The view object 153, in accordance with example implementations, may provide an operation that is directed to retrieving a reference to the associated isolation context object 151.
[00104] The view object 153 may provide a has_ancestor(view object) operation to determine whether a given view object 153 is an ancestor of the view object 153. The operation includes first determining whether the specified view object 153 is the same as the view object 153, the same as the parent of the view object 153, or the top-level view object 153. In any of these cases, the operation may return a true Boolean value. Otherwise, if the view object 153 is the top-level view object 153, then the operation may return a false Boolean value. In accordance with example implementations, if the view object 153 does not contain an ancestor cache, the operation creates one and atomically associates it with the view object 153. The ancestor cache may be examined to determine whether the answer is known. If a Boolean value is found to be associated in the ancestor cache with the specified view object 153, the Boolean value may be returned. Otherwise, the ancestry of the view object 153 may be walked, or traversed, starting with its parent view object 153 and continuing through successive parent view objects 153 until the specified view object 153 or the top-level view object 153 is encountered. The value of the operation depends on the view object 153 encountered during the traversal of the ancestry: a Boolean true value if the specified view object 153 is encountered and a Boolean false value if the top-level view object 153 is encountered. The value of the operation may be associated with the specified view object 153 in the ancestor cache and returned.
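The has_ancestor(view object) operation described above may be sketched in Python as follows. The sketch is single-threaded: a plain dictionary stands in for the lock-free ancestor cache, the cache is installed without any atomic operation, and the sketch assumes a single top-level view identified by having no parent. The `View` class layout is hypothetical.

```python
class View:
    """Hypothetical view object with a lazily created ancestor cache."""
    def __init__(self, parent=None):
        self.parent = parent
        self.ancestor_cache = None   # created on first use

    def is_top_level(self):
        return self.parent is None

    def has_ancestor(self, other):
        # Fast paths: self, direct parent, or the top-level view.
        if other is self or other is self.parent or other.is_top_level():
            return True
        if self.is_top_level():
            return False
        if self.ancestor_cache is None:
            self.ancestor_cache = {}
        cached = self.ancestor_cache.get(other)
        if cached is not None:
            return cached
        # Walk the parent links until the specified view or the top is reached.
        view = self.parent
        while view is not other and not view.is_top_level():
            view = view.parent
        result = view is other
        self.ancestor_cache[other] = result
        return result


top = View()
a = View(top)
b = View(a)
c = View(b)
d = View(top)   # sibling branch, not an ancestor of c
```

After `c.has_ancestor(a)` and `c.has_ancestor(d)` have been evaluated once, both answers are held in the cache of `c` and later calls avoid the walk.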
[00105] In accordance with example implementations, the conflict 152 may have one or multiple of the following properties. A conflict 152 may contain an atomically-updated flag to indicate, or represent, whether or not the conflict 152 has been resolved. A conflict 152 may contain an immutable reference to an atomically-updated reference to a conflict 152, which is the location in which this conflict is installed.
[00106] The conflict 152 may have one or multiple subclasses holding information identifying different types of locations at which conflicts may occur. In accordance with example implementations, such subclasses may include: 1.) a field conflict, which holds a reference to a record object and an indication of a field within that record object; 2.) a bound name conflict, which holds a reference to a namespace object and an indication of a name within that namespace object; and 3.) an array conflict, which holds a reference to an array object and an index identifying a position within the array object.
[00107] The MSV object 166 may have one or multiple of the following properties.
[00108] An MSV object 166 may contain an atomically-updated reference to its current state 162.
[00109] An MSV object 166 may contain an atomically-updated reference to a pending write task, which is a write task that is in progress and must be completed before another modification can be performed.
[00110] An MSV object 166 may contain an immutable reference to a conflict generator 160. The type of the conflict generator 160 referred to may depend on the type of location the MSV object 166 represents (e.g., field of record or slot of array). Conflict generators 160 associated with different types of location may generate different instances of different subtypes of conflict 152.
[00111] In accordance with example implementations, the MSV object 166 may be an instance of a template (or parameterized or generic) class, where the template parameter may indicate the type of data contained (e.g., the values contained in value objects 158 and provided to and returned by operations). This template parameter may also be used by the MSV state 162, value objects 158, and value chains 156, which may be nested within the class implementing the MSV object 166. Such an arrangement may allow different representations of value objects 158 holding different types of values, which may permit more efficient representation when values are of a primitive type (e.g., numbers, Boolean values, or characters).
[00112] The MSV object 166 may provide one or multiple of the following operations.
[00113] The MSV object 166 may provide a read(view, timestamp) operation to determine and return the value of the MSV object 166 for a specified view object 153 as of (i.e., no later than) the given timestamp, which, if omitted, defaults to the "most recent" timestamp, indicating that the most recent value should be returned. In accordance with example implementations, a process of reading from an MSV object 166 includes invoking the current_value operation on the MSV state 162 associated with the MSV object 166.
[00114] The MSV object 166 may provide a read_frozen(view) operation to determine and return the current (e.g., most recent) value of the MSV object 166 for the given view object 153. The read_frozen(view) operation ensures that subsequent reads (frozen or otherwise, including reads involved in atomic modifications) in the same view 120 retrieve the same value until the MSV object 166 is modified in the view 120 or until the isolation context object 151 that is associated with the view object 153 is successfully published.
[00115] The MSV object 166 may provide a has_value(view, timestamp) operation to indicate (e.g., via returning a Boolean value) if a read operation with the same parameters would return a value discovered in a value object 158.
[00116] The MSV object 166 may provide a modify(view, op, resolve?, argument) operation to modify the value in the given view object 153, according to the given operation applied to the current value object 158 and (when applicable) the given argument. In this notation, the "?" suffix denotes a Boolean value. The operations may include one or multiple of the following: an operation to set the value to the argument; an operation to add, subtract, multiply, or divide the current value by the argument; an operation to clear the value (e.g., set the value to a default value); and an operation to set the value to a value present in the MSV object 166, where the present value may be one of: the current value in the given view 120; the value in the parent view 120 of the given view object 153; the last stable value (e.g., the value as of the last stable time for the associated isolation context 110); and the current value, with an indication that this should be noted as implementing a frozen read. In accordance with example implementations, the "resolve?" argument controls whether this modification should be considered to resolve any conflict 152 for this view object 153 in this MSV object 166.
[00117] The MSV object 166 may provide a write(view, resolving, new value) operation, which is an alias for the above-described modify operation in which the operation is that of setting the value to the given argument.
[00118] In accordance with example implementations, operations that are provided by the MSV object 166, such as one or multiple of the above-described operations, may be preceded by closing the published view objects 153 in a close_published_views() process, which is described below.
[00119] The value chain 156 may have one or multiple of the following properties. The value chain 156 may represent the history of value objects 158, which are associated with a view object 153.
[00120] The value chain 156 may contain an immutable reference to the value chain's view object 153. In accordance with example implementations, the reference may be a pointer that contains flags that record whether the view object's isolation context 110 is mutable and/or a snapshot.
[00121] The value chain 156 may contain a pointer to the latest, or most recent, value object 158, and in accordance with example implementations, the pointer may contain a flag that indicates or represents whether the most recent value object 158 was due to a frozen read or equivalent operation (e.g., an operation that asserts a value equivalent to the current value, such as an operation of adding or subtracting zero or an operation of multiplying or dividing by one).
[00122] In accordance with some implementations, view-relative pointers may be installed on the value chain 156. In this manner, the view-relative pointers may be associated as values in the value objects 158 that are installed on the value chain 156. Moreover, in accordance with example implementations, the view-relative pointers may be constrained to be relative to some view object 153 that shares an isolation context 110 with the view object 153 of the value chain 156.
[00123] The value chain 156 may contain an atomically-updated reference to a conflict 152 that is associated with the value chain 156. This may be a null reference to indicate a lack of a conflict.
[00124] In accordance with example implementations, the value chain 156 may provide one or multiple of the following operations.
[00125] The value chain 156 may provide a value(timestamp) operation to return a reference to the value object 158 representing the latest value object 158 in the value chain 156 before the given timestamp, or provide a null reference, if no such value object 158 exists. The operation may involve traversing the value chain 156 by starting at the latest value object 158 associated with the value chain 156 and following each value object's link to the prior value objects 158.
[00126] The value chain 156 may provide an add_value(value) operation to update the most recent value object 158 reference to a new value object 158 with the provided value. The timestamp of the new value object 158 may be the current value of the event counter 164, and the prior pointer of the new value object 158 may refer to the old most recent value object 158. The add_value(value) operation may return the newly added value object 158. In accordance with an example implementation in which the event counter is incremented upon the creation of a snapshot isolation context 110, the prior pointer for the new value object 158 may be set to the prior pointer of the old most recent value object 158, rather than to the old most recent value object 158 itself, for the case in which the timestamp of the old most recent value object 158 is the current value of the event counter 164. This allows the old most recent value object 158 to be garbage collected (assuming there are no other references to it). That is, in accordance with example implementations, when the event counter 164 has not changed in between successive additions to a value chain 156, the new value object 158 replaces the old most recent value object 158, rather than extending the chain of value objects 158. It is noted that this is permissible because if the event counter 164 has not been incremented, this means that no snapshot has been created since the last time a value was added to the value chain 156. Therefore, with no snapshot being created after the last time a value was added to the value chain 156, a future read of the MSV object 166 would not have otherwise correctly returned the old most recent value object 158.
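The value(timestamp) and add_value(value) operations described above may be sketched in Python as follows. The sketch is single-threaded and deliberately omits any safeguard against a snapshot read racing with the replacement. The class names are hypothetical, the current event counter value is passed in as the `now` argument rather than read from a shared counter, and `None` stands for the "most recent" timestamp.

```python
class ValueObject:
    def __init__(self, value, timestamp, prior):
        self.value = value
        self.timestamp = timestamp
        self.prior = prior          # link to the previous value object, if any


class ValueChain:
    def __init__(self):
        self.latest = None          # most recent value object

    def value(self, timestamp=None):
        """Latest value object strictly before `timestamp` (None = most recent)."""
        obj = self.latest
        while obj is not None and timestamp is not None and obj.timestamp >= timestamp:
            obj = obj.prior
        return obj

    def add_value(self, value, now):
        """Add a value stamped with `now`, the current event counter value."""
        old = self.latest
        if old is not None and old.timestamp == now:
            # Counter unchanged since the last addition: no snapshot can have
            # been created in between, so patch around the old value object.
            prior = old.prior
        else:
            prior = old
        self.latest = ValueObject(value, now, prior)
        return self.latest


chain = ValueChain()
chain.add_value("a", now=1)
second = chain.add_value("b", now=1)   # replaces "a" instead of extending
third = chain.add_value("c", now=2)    # counter advanced: chain is extended
```

After these calls the chain holds "c" followed by "b"; the value object for "a" is no longer reachable from the chain.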
[00127] The following scenario may occur. In between reading the current event counter 164 and asserting the value, it is possible that a snapshot isolation context 110 was created, incrementing the event counter 164; and a read was performed in this snapshot (or a descendent), resulting in a request for the value at a time preceding this later counter, returning the old most recent value object 158. If the add_value() call is allowed to patch around the old most recent value object 158, the result would be that a second read in this snapshot would result in a different value, violating the rules of what it means to be a snapshot.
[00128] For purposes of preventing the above-described scenario from occurring, the following measures may be employed, in accordance with example implementations. The most recent value object pointer contains a read-in-snapshot flag, indicating whether the value from the most recent value object 158 in the value chain 156 was read after following the parent link of a view object 153 associated with a snapshot isolation context 110 (e.g., when the timestamp of the read is a stable time for a snapshot rather than the "most recent" timestamp). When finding a value (e.g., in the value(timestamp) operation), when the timestamp is other than the "most recent" timestamp and the read-in-snapshot flag is not already true, before walking the chain of value objects 158, the reference to the most recent value object 158 (including its read-in-snapshot flag) is remembered. If the value object 158 found is the most recent value object 158, a single attempt is made to replace the remembered value object reference with an identical one that has the read-in-snapshot flag set to true. If this attempt fails, it means that either some other thread succeeded (i.e., a failure occurred due to the assumption that the false value for the read-in-snapshot flag was wrong) or another value object 158 was added to the value chain 156 (i.e., a failure occurred because the most recent value object 158 reference had changed). Either cause of failure removes the problem. When adding a value object 158, the old most recent value object reference may be read before the current timestamp is read. As, in accordance with an example implementation, the current timestamp is incremented upon the creation of a snapshot, it may be inferred that either the read-in-snapshot flag of the read most recent value object reference is false or the current timestamp was incremented since the last value object 158 (e.g., the current most recent value object 158) was added to the value chain 156. When the update occurs, the new value for the most recent value object reference has the read-in-snapshot flag set to false. If it is the case that the read-in-snapshot flag was set in between the time the read of the most recent value object reference was made and the time of the attempted update, then the CAS operation to change the most recent value object reference from the read value to the new value fails. In this case, another attempt is made to add the value, which involves reading a new current timestamp and updating the timestamp and next pointer on the value object 158 being added.
[00129] The MSV state 162 indicates states for value chains 156 of an associated MSV. In this manner, in accordance with some implementations, the MSV state 162 contains an array of value chains 156 (by reference). Each value chain 156 may be considered "publishable" or "unpublishable" based on its associated isolation context. Each value chain 156 may further be considered to be "open" or "closed". An open value chain 156 is one on which a value object 158 was added subsequent to the last stable time of the associated isolation context 110 as of the time the MSV state 162 was created. A closed value chain 156 is one that is not an open value chain 156. In accordance with example implementations, a closed value chain 156 is publishable, and an unpublishable value chain 156 is open. The value chains 156 may be arranged in the array such that all open value chains 156 precede all closed value chains 156 and all unpublishable value chains 156 precede all publishable value chains 156.
[00130] In accordance with some implementations, the first entry in the array may be reserved for a value chain 156 that is associated with the top-level view object 153. This value chain 156 is unpublishable (since the top-level view object 153, which lacks a parent, is unpublishable) and, as such, may be considered open even if it has no value objects 158.
[00131] The MSV state 162 may contain an indication of the number of contained unpublishable value chains 156, the number of open publishable value chains 156, and the number of closed value chains 156. These numbers allow the determination of the state of a value chain 156 based on its position within the array.
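As an illustration of how the three counts determine a value chain's state from its array position, the following sketch assumes the array order described above (unpublishable chains first, then open publishable chains, then closed chains). The function name chain_state and its return format are hypothetical.

```python
def chain_state(index, n_unpublishable, n_open_publishable, n_closed):
    """Classify the value chain at a given array position from the three counts."""
    total = n_unpublishable + n_open_publishable + n_closed
    if not 0 <= index < total:
        raise IndexError(index)
    if index < n_unpublishable:
        # Unpublishable chains are always considered open.
        return ("unpublishable", "open")
    if index < n_unpublishable + n_open_publishable:
        return ("publishable", "open")
    # Closed chains are always publishable.
    return ("publishable", "closed")
```

With two unpublishable, three open publishable, and one closed chain, for example, positions 0 to 1 are unpublishable, positions 2 to 4 are open publishable, and position 5 is closed.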
[00132] In accordance with example implementations, the MSV state 162 may provide one or multiple of the following operations.
[00133] The MSV state 162 may provide the close_published_views() operation, which is further described below.
[00134] The MSV state 162 may provide a values(view, open only?) operation to return the value chain 156 that is associated with the specified view object 153, if it exists. If the "open only?" parameter is a Boolean true value, then a value chain 156 is not returned unless the value chain 156 is found among the open (e.g., unpublishable and open publishable) value chains 156.
[00135] With continued reference to Fig. 1B, the MSV state 162 may provide a current_value(view, timestamp) operation to return the value associated with the view object 153 as of the specified timestamp. In accordance with example implementations, this operation may traverse parent view objects 153, and the timestamp may be modified when traversing parent view objects 153 of view objects 153 associated with snapshot isolation contexts 110. The operation may begin at the specified view object 153 and may call the values(view, open only?) operation to get the value chain 156 associated with the view object 153. The "open only?" parameter to the operation may be the Boolean true value if and only if the specified timestamp is the "most recent" timestamp. If a value chain 156 was found, the current_value operation may call value(timestamp) on this result to get a value object 158. If none is found, the operation may replace the current view object 153 by its parent view object 153. If the current view object 153 is associated with a snapshot isolation context 110, the operation may replace the timestamp with the last stable time of the isolation context 110 prior to the timestamp by calling the publication_before(timestamp) operation on the isolation context object 151. Following these replacements, the search may be performed again, and this process may continue until a value object 158 is found or the parentage is exhausted.
[00136] If a value object 158 is found, it is returned; otherwise a null pointer is returned. During this process, after failing to find a value chain 156 associated with a parent view object 153 that is associated with an isolation context 110 that is not read only, when the timestamp is the "most recent" timestamp, it is possible that a value object 158 is found on another value chain 156. In this circumstance, there is a possibility that by the time the operation has finished, the MSV state 162 has been replaced in the MSV object 166 with a new one that has a value chain 156 for that view object 153, and that the modification that added the value chain 156 took place before the value object 158 found was added. In that case, that value chain 156 contains the value object 158 that should have been returned by the operation. In this situation, the current_value(view, timestamp) operation returns (as a secondary return value) an indication that the value object 158 returned should not be trusted if the MSV's state has changed. When this secondary return indication is seen, the caller attempts to confirm that the MSV state 162 is still associated with the MSV object 166. If this is not the case, the caller may call current_value() on the new state. This process may repeat until the secondary return indication is not seen or the MSV object's 166 current state has not changed.
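The parent traversal performed by current_value(view, timestamp) can be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: value chains are modeled as ascending lists of (timestamp, value) pairs held in a dictionary, the names Context, View, value_at and MOST_RECENT are hypothetical, and both the "open only?" filtering and the secondary return indication are omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

MOST_RECENT = math.inf  # hypothetical sentinel for the "most recent" timestamp


@dataclass
class Context:
    snapshot: bool = False
    stable_times: list = field(default_factory=list)  # ascending stable times

    def publication_before(self, timestamp):
        """Last stable time prior to the given timestamp (0 if there is none)."""
        earlier = [t for t in self.stable_times if t < timestamp]
        return earlier[-1] if earlier else 0


@dataclass(eq=False)  # identity-based hashing so views can be dictionary keys
class View:
    context: Context
    parent: Optional["View"] = None


def value_at(chain, timestamp):
    """Newest (timestamp, value) entry at or before the timestamp, else None."""
    hits = [entry for entry in chain if entry[0] <= timestamp]
    return hits[-1] if hits else None


def current_value(chains, view, timestamp):
    """Walk from the view toward the root until some value chain yields a value."""
    while view is not None:
        chain = chains.get(view)
        if chain:
            found = value_at(chain, timestamp)
            if found is not None:
                return found[1]
        if view.context.snapshot:
            # Leaving a snapshot view: continue as of its last stable time.
            timestamp = view.context.publication_before(timestamp)
        view = view.parent
    return None
```

A view in a snapshot context with a stable time of 5 therefore sees only parent values written at or before time 5, even when it is read with the "most recent" timestamp.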
[00137] As discussed above, an MSV object 166 may provide a close_published_views() operation to update the MSV object 166 to be associated with an MSV state 162 that reflects all publications of isolation contexts 110 that have happened since the last time close_published_views() was called. The close_published_views() operation may accept as a parameter a view object 153 (called the "write view" below) indicative of a view 120, if any, that the caller intends to assert a value in and which, therefore, is associated with an open value chain 156 in the resulting MSV state 162. The operation may return the following values: 1. the resulting MSV state 162, which, if not null, has been installed in the MSV object 166; 2. the value chain 156 associated with the write view (if specified) in the resulting MSV state 162; and 3. an indication of whether the returned value chain 156 (if any) was an open value chain 156 prior to the operation.
[00138] In accordance with example implementations, a thread executing in the system 100 may perform the following actions when executing the close_published_views() operation:
1. The thread reads the current MSV state 162 (called the "remembered state" below).
2. If the invocation of the close_published_views() operation is not associated with processing a write task 170 associated with the MSV object 166, then the thread helps with any pending write tasks associated with the MSV object 166, as described above.
3. If there was no remembered state (e.g., if the current state of the MSV object 166 was a null reference), then if there is no write view object 153, the thread returns a null MSV state 162 reference. Otherwise, the thread creates a new value chain 156 associated with the write view. If the write view is not the top-level view object 153, the thread may also create a value chain 156 associated with the top-level view object 153. The thread then creates a new MSV state 162 associated with an array containing the created value chains 156 and recording the number of unpublishable and open publishable value chains 156 this represents. The thread returns the created MSV state 162, the value chain 156 associated with the write view, and an indication that the returned value chain 156 was not open.
4. Otherwise, the thread calls a find_need_close() operation on the MSV state 162, which returns a priority queue indicating open publishable value chains 156 whose associated isolation context objects 151 have been published more recently than the timestamp of the most recent value object 158 on the value chain 156. The priority queue may be ordered from least-recently published to most-recently published, and the elements may include the publication time following the most recent value object's 158 timestamp and an index in the value chain 156 array of the MSV state 162. It is noted that the timestamp stored in the priority queue may not reflect the most recent publication of the isolation context object 151. In searching for value chains 156 to add to the priority queue, if the thread discovers a value chain 156 associated with the write view 153, it may remember it for later use.
5. If the returned priority queue is empty, then no work is needed to process the effects of publishing any isolation context 110. The thread may proceed as follows:
a. If no write view object 153 was specified or if a value chain 156 associated with the write view object 153 was previously discovered and remembered, the thread may return from the operation: the current MSV state 162, the write value chain 156 (if any), and, if a write value chain 156 was found, an indication that the write value chain 156 was previously open.
b. If a write view object 153 was specified and no associated value chain 156 was discovered, it may be inferred that no write value chain exists among the open publishable value chains 156. If the write view object 153 is associated with an unpublishable isolation context 110, a search may be made through the unpublishable value chains 156 associated with the MSV state 162. As an optimization, if the write view object 153 is the top-level view object 153, an associated value chain 156 may be found as the first value chain 156 among the unpublishable value chains 156. If an associated value chain 156 is found, it may be returned, along with the current MSV state 162 and an indication that the value chain 156 was previously open. If no unpublishable value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new unpublishable value chain 156 associated with the write view 153. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning, and the values from that retry are returned.
c. If the write view object 153 is associated with a publishable isolation context 110, a search may be made through the closed value chains 156 associated with the MSV state 162. If an associated value chain 156 is found, a new MSV state 162 may be made that is a copy of the remembered state, save that the found value chain 156 is moved to become an open publishable value chain 156. If the thread succeeds in installing the new MSV state 162 in place of the remembered state, the new MSV state 162, the found value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning, and the values from that retry are returned.
d. Otherwise, a new MSV state 162 may be made that is a copy of the remembered state with the addition of a new open publishable value chain 156 associated with the write view 153. If the thread succeeds in installing new MSV state 162 replacing the remembered state, the new MSV state 162, the new value chain 156, and an indication that the value chain 156 was not open may be returned. Otherwise, the close_published_views() operation is retried from the beginning and the values returned.
6. If the priority queue is not empty, an operation called process_close(), described below, may be called on the current MSV state 162, passing in the priority queue and the write view object 153. The operation returns the same three values, which are remembered.
7. An attempt is then made to install the new MSV state 162, replacing the remembered MSV state 162. If the installation of the new MSV state 162 succeeds, then the remembered three values are returned. If the installation of the new MSV state 162 is unsuccessful, then it may be inferred that another thread changed the state 162 while the current thread was working on the state 162. Therefore, the close_published_views() operation is retried, and the results of that invocation are returned.
Table 2
[00138] The process_close() operation, referenced above, has access to the priority queue and the write view object 153 (if any) of its caller, and in accordance with example implementations, maintains the following structures: a vector of open publishable value chains 156; a vector of closed value chains 156; and a vector of new unpublishable value chains 156. It is noted that the designation of these value chains 156 as, e.g., open publishable, represents the point of view of an MSV state 162 being designed, and the designation for a given value chain 156 may change during the operation.
[00139] The vector of open publishable value chains 156 may initially contain the open publishable value chains 156 in the MSV state 162. Whenever a new open publishable value chain 156 is created or a closed value chain 156 is reopened, the value chain 156 is added to the end of this vector. Whenever a value chain 156 is closed, the slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
[00140] The vector of closed value chains 156 may initially contain the closed value chains 156 in the MSV state 162. In accordance with an implementation, the creation of this vector may be deferred until there is a need to alter its contents.
Whenever an open publishable value chain 156 is closed, the value chain 156 is added to the end of the vector. Whenever a closed value chain 156 is reopened, its slot in the vector is replaced by a null pointer. A side count may be maintained, representing the number of non-null entries in the vector.
[00141] The vector of new unpublishable value chains 156 may initially be empty. When a new unpublishable value chain 156 is created, the value chain 156 is added to the end of the vector. The unpublishable value chains 156 of the MSV state 162 being designed comprise the unpublishable value chains 156 of the current MSV state 162 and the contents of this vector.
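The open and closed vectors described above share one pattern: entries are appended at the end, a removed entry leaves a null slot behind, and a side count tracks the non-null entries. A minimal sketch of that pattern follows; the names SlotVector, close_chain and reopen_chain are hypothetical, and the deferred creation of the closed vector is not modeled.

```python
class SlotVector:
    """Append-only vector whose removed entries become None, with a live count."""

    def __init__(self, items=()):
        self._slots = list(items)
        self.live = len(self._slots)  # side count of non-null entries

    def append(self, chain):
        self._slots.append(chain)
        self.live += 1
        return len(self._slots) - 1  # index of the new slot

    def clear_slot(self, index):
        """Null out a slot and return what it held (None if already cleared)."""
        chain = self._slots[index]
        if chain is not None:
            self._slots[index] = None
            self.live -= 1
        return chain

    def compact(self):
        """Surviving entries in order, e.g. for building the new state's array."""
        return [chain for chain in self._slots if chain is not None]


def close_chain(open_vec, closed_vec, index):
    """Move an open publishable chain to the end of the closed vector."""
    return closed_vec.append(open_vec.clear_slot(index))


def reopen_chain(open_vec, closed_vec, index):
    """Move a closed chain to the end of the open publishable vector."""
    return open_vec.append(closed_vec.clear_slot(index))
```

Leaving null slots in place keeps the indices stored in the priority queue valid while chains are closed and reopened, which is one plausible reason for the design.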
[00142] In accordance with example implementations, the process_close() operation may repeatedly remove items from the priority queue and identify value chains 156 and publication timestamps, where the values removed reflect the earliest such timestamps in the priority queue, stopping when the priority queue is empty. Each such value chain 156 (which may be called a child value chain) is in the open publishable value chain 156 vector. The operation next identifies the parent value chain 156 associated with the parent view object 153 of the view object 153 associated with the child value chain 156. If the parent value chain 156 is identified in the open publishable value chain 156 vector, the unpublishable value chain 156 vector, or the unpublishable value chains 156 of the current MSV state 162, it is noted. Otherwise, if the parent view object 153 is associated with an unpublishable isolation context 110, a new parent value chain 156 is created and added to the unpublishable value chain 156 vector. If the parent view object 153 is associated with a publishable isolation context 110, a search is made through the closed value chain 156 vector (or, if this has not yet been created, the closed value chains 156 associated with the current MSV state 162). If a parent value chain 156 is found, the parent value chain 156 is reopened by removing it from the closed value chain 156 vector and adding it to the open publishable value chain 156 vector. If a parent value chain 156 is not found, a new parent value chain 156 is created and added to the open publishable value chain 156 vector.
[00143] The child value chain 156 may then be closed, and modifications (if any) of the child value chain 156 are transferred to its parent. The child value chain 156 may be removed from the open publishable value chain 156 vector. If there is an indication in the child value chain 156 that the most recent value object 158 was added as the result of a frozen read operation, there is no value that needs to be transferred to the parent value chain 156. In addition, the most recent value object 158 associated with the child value chain 156 may be removed from the child value chain 156 along with the indication. If this results in a child value chain 156 with no associated most recent value object 158, the child value chain 156 may be discarded.
[00144] If there is no indication that the most recent child value object 158 was the result of a frozen read, the most recent parent value object 158 (if any) is discovered on the parent value chain 156. If the most recent parent value object 158 exists and the associated timestamp is greater than or equal to one less than the timestamp retrieved from the priority queue, this may indicate that another thread has already processed the publication of the child view object 153, so this thread need not. Otherwise, a new parent value object 158 is created based on the most recent child value object 158, having a timestamp equal to the retrieved timestamp minus one and the same value unless the value is a view-relative pointer 154, in which case the value is a copy where the view object 153 associated with the copy is the parent of the view object 153 associated with the original. The prior value object 158 of the new parent value object 158 is the old most recent parent value object 158, if any. The operation then makes a single attempt to atomically replace the old most recent parent value object 158 with the new parent value object 158 in the parent value chain 156. A failure in this attempt may indicate that another thread has already processed the publication of the child view object 153.
[00145] If the parent view object 153 is associated with a publishable isolation context 110, a determination is made as to whether the parent isolation context 110 was published following the retrieved timestamp. If it was, a new entry is added to the priority queue reflecting the parent value chain 156 and the earliest publication time of the parent isolation context 110 following the retrieved timestamp. It is noted that this step is performed even if an earlier determination was made that another thread installed a new parent value object 158 reflecting the current child value object 158.
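The timestamp rule used when transferring a child value to its parent can be illustrated with a short sketch. The names ValueObject and publish_to_parent are hypothetical; the sketch assumes immutable value objects, and it omits both the rebasing of view-relative pointers 154 and the single atomic replacement attempt on the parent value chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueObject:
    value: object
    timestamp: int
    prev: Optional["ValueObject"] = None


def publish_to_parent(child_head, parent_head, publication_time):
    """Build the parent value object for a publication, or return None when the
    parent chain already reflects it (another thread did the work)."""
    target = publication_time - 1
    if parent_head is not None and parent_head.timestamp >= target:
        return None
    # Same value as the child, stamped just before the publication time and
    # linked to the old most recent parent value object (if any).
    return ValueObject(child_head.value, target, parent_head)
```

For a publication time of 20, for instance, the new parent value object carries timestamp 19, and a second thread that later finds a parent head stamped 19 or greater skips the transfer.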
[00146] If the child value chain 156 was not discarded, it is now added to the closed value chain 156 vector.
[00147] When the priority queue is empty, the thread may proceed to identify the write value chain 156 associated with the write view object 153, if any. This may be performed by the same procedure as was used to identify each parent value chain 156.
[00148] Next, the thread may create a new MSV state 162 associated with the unpublishable value chains 156 from the current MSV state 162, the unpublishable value chains 156 from the unpublishable value chain 156 vector, the open publishable value chains 156 from the open publishable value chain 156 vector, and the closed value chains 156 from the closed value chain 156 vector (or, if this last has not been created, from the current MSV state 162). The thread may then return this new MSV state 162, the write value chain 156 (if any), and an indication of whether the write value chain 156 was found among the open value chains 156.
[00149] Cooperative tasks are described herein for the purpose of serializing operations, such as publication and adding conflicts to an isolation context or modifying a value. Each point of serialization may be associated with a "task holder." The task holder refers to a task and may be atomically updated.
[00150] To perform the task, the task is first installed in the holder. This installation may involve attempting to replace a null pointer in the task holder with a reference to the task by means of the CAS operation. If the task holder was not empty, then this attempt fails, and the current value (the "blocking task") is obtained. If the blocking task is not the one being installed (i.e., if it was not the case that another thread was successful in installing it), then the blocking task is run in the current thread, and an attempt is made to replace the blocking task with null. If this fails, it means that another thread removed it first, so another iteration occurs to try the install again. When the task is successfully installed, the task is then run, and then an attempt is made to remove it (replace it with null).
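The install, help, and remove loop for a task holder might look as follows. This is a hedged sketch rather than the disclosed implementation: TaskHolder, perform and CountingTask are hypothetical names, the CAS operation is emulated with a lock, and tasks are assumed to tolerate being run by more than one thread.

```python
import threading


class TaskHolder:
    """Holds at most one task reference; updates go through compare-and-set."""

    def __init__(self):
        self._task = None
        self._lock = threading.Lock()  # assumption: stands in for a hardware CAS

    def get(self):
        with self._lock:
            return self._task

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._task is expected:
                self._task = new
                return True
            return False


def perform(holder, task):
    """Install the task, helping any blocking task to completion first."""
    while True:
        if holder.compare_and_set(None, task):
            break                      # installed by this thread
        blocking = holder.get()
        if blocking is task:
            break                      # another thread installed it on our behalf
        if blocking is not None:
            blocking.run()             # help the blocking task finish
            holder.compare_and_set(blocking, None)
    task.run()
    holder.compare_and_set(task, None)  # may fail if a helper removed it first


class CountingTask:
    """Example task that records how many times it was run."""

    def __init__(self):
        self.runs = 0

    def run(self):
        self.runs += 1
```

Because a helping thread and the installing thread may both call run() on the same task, the work inside run() has to be written so that repeated execution is harmless, as the publish and write tasks described below are.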
[00151] In accordance with example implementations, the work of the task may occur in the task's run() operation, which may be defined in a subclass.
[00152] Publish tasks may be installed in the isolation context objects 151 to (attempt to) perform a publication of the corresponding isolation context 110. The publish tasks are in competition for this task holder with other publish tasks and with write tasks that may wish to add conflicts to this context.
[00153] The data within a publish task is accessible to all threads attempting to complete it.
[00154] In accordance with example implementations, the publish task may have one or multiple of the following properties. The publish task may have an immutable reference to the isolation context object 151 being published. The publish task may have an atomically-updated timestamp representing the publish time (initially zero). The publish task may have an atomically-updated reference to a list of conflicts seen by the publish task. This atomically-updated reference may contain a "has value" flag, initially false, to be able to distinguish between the scenario in which it is unknown whether there are any conflicts 152 and the scenario in which it has been determined that there are no conflicts 152.
[00155] In accordance with example implementations, to perform a publish task via the publish() call for an isolation context object 151, the following operations may be performed by a thread of the system 100:
1. If the publish time is zero, this indicates that the publish time has yet to be established. The current timestamp counter 164 is incremented, and a single attempt is made to change the publish time from zero to the resulting value. If this attempt fails, it means that another thread has already established the publish time.
2. If the has-value flag on the conflicts reference is set, the publish task has finished, and the conflicts reference either contains a list of blocking conflicts or is null, indicating that the publish succeeded as of the publish time. The publish task is complete.
3. Otherwise, the isolation context object's 151 associated isolation context state 157 is requested to attempt to publish as of the determined publish time. This request may return a list of conflicts 152, which may be an empty list, indicating successful publication. This list may be installed in the task via the conflicts reference with the has-value flag set. The publish task is complete.
Table 3
[00156] In accordance with example implementations, an isolation context state 157 associated with an isolation context object 151 may attempt to publish as of a specified publication timestamp as follows:
1. If there are conflicts 152 noted in the current isolation context state 157, they are returned.
2. If this is not the first time this step is processed and the publication time associated with the current isolation context state 157 is later than the specified publication timestamp, this may indicate that the requested publication has been completed in another thread. An empty list of conflicts 152 may be returned.
3. If there are contingent conflicts 152 associated with the current isolation context state 157, then the corresponding contingent conflict list is traversed, adding each contingent conflict 152 to its associated isolation context object 151. As described above, the nodes in the list may contain a "handled?" flag, which is checked before adding and set afterwards, to minimize duplicate work as threads run through the list at the same time. In accordance with further example implementations, a count of the contingent conflicts 152 handled is maintained, and a handling flag is used in addition to the "handled?" flag. In accordance with this further example implementation, in a first pass, contingent conflicts 152 whose states can be changed from unhandled to handling are added, the state then being changed unconditionally to handled, and a counter may then be incremented. After the first pass, if the count does not equal the number of items in the entire list, this means that some thread claimed an entry and then terminated or paused; when this occurs, a second pass may be performed, adding all contingent conflicts 152 that are in the handling state and incrementing the counter for each of these conflicts 152 if its state can be changed from handling to handled.
4. If the current isolation context state 157 is no longer the one associated with the isolation context object 151, the preceding steps are repeated with the current isolation context state 157 being that now associated with the isolation context object 151.
5. It may now be determined that there are no conflicts 152 preventing the publication. A new isolation context state 157 is constructed with the given publication timestamp and whose prior state is the current isolation context state 157.
6. An attempt may then be made to replace the current isolation context state 157 with the new one in the isolation context object 151. If this attempt succeeds, an empty conflict list is returned.
7. Otherwise, the failed attempt identifies the isolation context state 157 associated with the isolation context object 151, and the operation may return the result of asking this isolation context state 157 to attempt to publish as of the specified timestamp.
Table 4
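The two-pass handling of contingent conflicts in step 3 above can be sketched as follows. This is one possible reading, not the disclosed implementation: the names ConflictNode, SharedCounter and deliver_contingent are hypothetical, atomic state changes are emulated with locks, and add_conflict stands for adding a conflict to its isolation context object and is assumed to tolerate duplicates.

```python
import threading

UNHANDLED, HANDLING, HANDLED = "unhandled", "handling", "handled"


class ConflictNode:
    def __init__(self, conflict, state=UNHANDLED):
        self.conflict = conflict
        self.state = state
        self._lock = threading.Lock()  # assumption: emulates an atomic state word

    def cas_state(self, expected, new):
        with self._lock:
            if self.state == expected:
                self.state = new
                return True
            return False


class SharedCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def deliver_contingent(nodes, add_conflict, counter):
    """Add every contingent conflict, tolerating threads that stall mid-entry."""
    # First pass: claim unhandled entries, add them, then mark them handled.
    for node in nodes:
        if node.cas_state(UNHANDLED, HANDLING):
            add_conflict(node.conflict)
            node.state = HANDLED  # unconditional, as in the description
            counter.increment()
    # Second pass: only needed when some claimed entry was never counted.
    if counter.value != len(nodes):
        for node in nodes:
            if node.state == HANDLING:
                add_conflict(node.conflict)
                if node.cas_state(HANDLING, HANDLED):
                    counter.increment()
```

If a thread claims an entry and then pauses, the entry stays in the handling state, the count falls short of the list length, and a later thread's second pass adds that conflict again on the stalled thread's behalf.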
[00157] In accordance with example implementations, after the publish task is finished (within the initial publish() call that installed the task), a publish result object is created based on the publish time and conflict list stored in the task. The creation of the publish result object may be performed outside of the publish task. In accordance with example implementations, the publish result object contains an immutable reference to the isolation context object 151 that was published; an immutable timestamp representing the publish attempt time; and an immutable list of blocking conflicts 152, which is empty if the publish attempt succeeded.
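The publish task steps of Table 3 and the publish result object may be pictured with the following sketch. It is illustrative only: PublishTask, PublishResult, try_publish and next_timestamp are hypothetical names, try_publish stands for the request to the isolation context state 157 described in Table 4, and atomic updates are emulated with a lock.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishResult:
    context: object
    publish_time: int
    conflicts: tuple  # empty if the publish attempt succeeded

    @property
    def succeeded(self):
        return not self.conflicts


class PublishTask:
    def __init__(self, context, next_timestamp):
        self.context = context                 # immutable reference
        self._next_timestamp = next_timestamp  # increments and returns the counter
        self.publish_time = 0                  # zero means "not yet established"
        self.conflicts = None
        self.has_value = False
        self._lock = threading.Lock()          # assumption: stands in for CAS

    def run(self):
        # Step 1: establish the publish time once; later attempts lose the race.
        if self.publish_time == 0:
            candidate = self._next_timestamp()
            with self._lock:
                if self.publish_time == 0:
                    self.publish_time = candidate
        # Step 2: another thread may already have finished the task.
        if self.has_value:
            return
        # Step 3: ask the context to publish and record the conflicts it reports.
        found = tuple(self.context.try_publish(self.publish_time))
        with self._lock:
            if not self.has_value:
                self.conflicts, self.has_value = found, True

    def result(self):
        """Build the publish result object outside the task itself."""
        return PublishResult(self.context, self.publish_time, self.conflicts or ())
```

Running the task a second time, whether by the installing thread or a helper, leaves the publish time and the recorded conflicts unchanged, so every thread that builds a result from the task sees the same outcome.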
[00158] A write task is installed to make a modification to an MSV object 166. It is noted that the modification may introduce a conflict in a publish task that might otherwise be occurring at the same time in a different thread. A write task may also be installed in an isolation context object 151 (in competition with one or multiple publish tasks) when it is determined that there is a possibility that the modification may induce a conflict.
[00159] In accordance with example implementations, a write task forces a serialization of all modifications of a given field, regardless of isolation contexts. In accordance with some implementations, some modifications may be made directly on the MSV state 162 associated with the MSV object 166 without installing corresponding write tasks. In this manner, such modifications may include those which cannot be affected by other modifications to the same MSV object 166 (e.g., operations to set a value in any view 120 and any modification operation in a view 120 associated with a snapshot isolation context 110) and cannot introduce conflicts 152 into isolation context objects 151 (e.g., operations in views 120 associated with non-snapshot isolation contexts 110 that have no child isolation contexts 110 and are either non-publishable or have no publishable sibling isolation contexts 110).
[00160] The performance of a write task may entail the performance of one or more phases: a cache state phase, an install context events phase, an add value to value chain phase, and an add conflicts phase. A write task may contain an atomically-updated indication of the current phase being performed by some thread or an indication that all phases have been completed. A write task may contain an immutable reference to the MSV object 166 and may contain an immutable reference to the write value chain 156 to be modified. A write task may contain a Boolean value (initially false), which represents whether the write value chain 156 was modified. A write task may contain an immutable modification operation (e.g., an operation to set or add). A write task may contain an immutable argument to the operation. A write task may contain an immutable indication of whether the modification resolves conflicts. A write task may contain an atomically-updatable reference (initially null) to an MSV state 162. A write task may contain an atomically-updatable timestamp (initially null) that represents the start time of the operation. A write task may contain the value to be returned by the modification operation.
[00161] When a thread performs the write task, the thread reads the indication of the current phase, performs the associated action and attempts to update the current phase indication by replacing the indication of the phase performed with an indication of the next phase (or that the write task is complete if there is no next phase). If this fails, it may be inferred that another thread already recorded the completion of the phase and may have performed later phases. The thread continues processing based on the resulting phase indication (e.g., the phase indication set by the thread or by another thread) until the current phase indication indicates that the write task is complete.
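The phase-advancing loop of a write task may be sketched as shown below. The class name WriteTask, the phase labels, and the actions dictionary are hypothetical; the atomic phase update is emulated with a lock, and each phase action is assumed to be safe to execute more than once.

```python
import threading

PHASES = ("cache_state", "install_context_events", "add_value", "add_conflicts")
DONE = "done"


class WriteTask:
    def __init__(self, actions):
        self._actions = actions        # maps each phase name to a callable
        self._phase = PHASES[0]
        self._lock = threading.Lock()  # assumption: stands in for a hardware CAS

    def phase(self):
        with self._lock:
            return self._phase

    def _advance(self, finished):
        """Replace the finished phase with its successor, unless another thread
        has already recorded the completion of that phase."""
        position = PHASES.index(finished) + 1
        following = PHASES[position] if position < len(PHASES) else DONE
        with self._lock:
            if self._phase == finished:
                self._phase = following
                return True
            return False

    def run(self):
        # Keep working from whatever phase is current, whoever set it.
        while (current := self.phase()) != DONE:
            self._actions[current]()
            self._advance(current)
```

A thread that loses the race to record a phase simply rereads the indication and continues from there, so several helpers can drive one task to completion without coordinating beyond the phase indication itself.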
[00162] In the cache state phase, the thread performs the following in order. First, the thread attempts to change the start time from zero to a value read from the event counter 164. If this attempt fails, then the start time has been set by another thread. Next, in the cache state phase, the thread makes an attempt to change the cached state from null to the MSV state 162 currently associated with the MSV object 166. If the attempt fails, it may be inferred that another thread made the change. This sequence ensures that publication of an isolation context object 151 after the recorded start time cannot invalidate the recorded MSV state 162.
[00163] In the install context events phase, the thread adds the write task to every isolation context object 151 for which the present modification could possibly create a conflict. The installed write task in a given isolation context 151 therefore causes a check for conflicts to be performed due to the installed write task before the given isolation context 151 may publish. If there is a publish task installed on such an isolation context object 151 (indicating an ongoing publication), the thread assists in completing the publish task (i.e., the thread assists the ongoing publication) before adding the write task, as described above. In accordance with example implementations, the thread adds the write task to every isolation context object 151 associated with an open publishable value chain 156 in the MSV state 162, including the write value chain 156, with the exception of those value chains 156 that are already associated with a conflict 152. An atomically-updatable integer indicating the next value chain 156 associated with the MSV state 162 to process may be used to ratchet the next slot to consider, allowing threads to avoid trying to add the write task to an isolation context object 151 that has already received it from another thread.
[00164] After installing the task on each isolation context object 151, the thread checks to see whether the isolation context object 151 was published after the start time. If the thread determines that the isolation context object 151 was published after the start time, then the publish occurred after the creator of the task called the close_published_views() process. Therefore, in accordance with example implementations, the thread calls the close_published_views() process again on the cached MSV object 166 and updates the cached MSV state 162 to the result. In this call, an indication that the thread is already processing a write task may be passed, to prevent the close_published_views() process from clearing the pending write task on the MSV object 166. The task may also update the cached start time to the current timestamp (unless the cached start time was already greater due to the time being set by another thread after the current time was read).
[00165] In the add value to value chain phase, a thread may perform the following operations:
1. The value is computed based on the operation and argument and stored in the task. This computation may not be performed in an atomic manner, as an assumption may be made that repeated evaluations will yield the same value.
2. A decision is made as to whether a modification to the write value chain 156 is required. If the operation is merely to establish the current value associated with the value chain's 156 view 120 (e.g., pursuant to a frozen read operation) or if the operation is one that will necessarily leave the current value unchanged (e.g., an addition or subtraction of zero or a multiplication or division by one), the modification may be deemed to be unnecessary. If this is the case, and if the operation is other than to retrieve the current value and either the write value chain 156 is empty (e.g., has no most recent value object 158) or the associated isolation context 110 was published after the last value was added to the write value chain 156, the otherwise unnecessary modification may be deemed to be necessary.
3. If the modification is deemed to be necessary, an add_value() operation is called on the write value chain 156 to establish the value, and it is noted that a value was written. If it was initially determined that the modification was unnecessary and this decision was reversed, the call to the add_value() operation may be specified to indicate in the write value chain 156 that the value object 158 added was the result of a frozen read.
4. If the write is to be treated as resolving a conflict and there is a conflict 152 associated with the write value chain 156, the conflict_resolved(conflict) operation of the associated isolation context object 151 is invoked, the conflict 152 is marked as having been resolved, and the conflict field of the write value chain 156 is cleared.
Table 5
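The decision in steps 2 and 3 of Table 5 can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the described implementation: the operation names (`"read"`, `"frozen_read"`, `"add"`, `"sub"`, `"mul"`, `"div"`) and the function names are hypothetical, and `"read"` models an operation that only retrieves the current value.

```python
def is_noop(op, arg):
    """True for operations that necessarily leave the current value unchanged."""
    return (op in ("add", "sub") and arg == 0) or (op in ("mul", "div") and arg == 1)


def modification_decision(op, arg, chain_is_empty, published_after_last_add):
    """Return (modify, mark_as_frozen_read) for one write task.

    chain_is_empty: the write value chain has no most recent value object.
    published_after_last_add: the associated isolation context was published
    after the last value was added to the write value chain.
    """
    initially_unnecessary = op in ("read", "frozen_read") or is_noop(op, arg)
    if not initially_unnecessary:
        return True, False
    # The decision is reversed for anything other than a plain retrieval when
    # the chain has no value that is current for this isolation context.
    if op != "read" and (chain_is_empty or published_after_last_add):
        return True, True  # the added value is recorded as a frozen read
    return False, False
```

For example, adding zero to a property with an established value needs no modification, whereas adding zero when the write value chain is empty establishes a value that is marked as the result of a frozen read.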
[00166] In the add conflicts phase, the thread adds the actual and contingent conflicts and removes the write task from all isolation context objects 151, as described below:
1. If there was no modification, then it may be inferred that no conflicts 152 were introduced. The thread removes the write task (if present) from the isolation context objects 151 associated with all open publishable value chains 156 in the MSV state 162, and the phase is complete.
2. Otherwise, all open value chains 156 (whether or not they are publishable) are enumerated. This enumeration may use an atomically updated integer associated with the write task, in the manner used when adding the write task to isolation context objects 151, so that threads do not redo work done by other threads. The current view object 153 and current isolation context object 151 associated with the current value chain 156 in the enumeration are noted.
3. If the current value chain 156 is the write value chain 156 or if the current value chain 156 was last modified prior to the last modification to the write value chain (if any), it is determined that no conflict can be introduced with respect to the current value chain 156 by this modification.
4. Otherwise, if the current isolation context object 151 is publishable, the current value chain 156 does not already have an associated conflict 152, the write view object 153 is an ancestor of the current view object 153, and it is not the case that both the current value chain 156 and the write value chain 156 indicate that their most recent value objects 158 were due to frozen reads, then a conflict 152 is added to the current value chain 156 and current isolation context object 151.
5. Otherwise, if both the write value chain 156 and current value chain 156 are publishable value chains 156, the current value chain 156 does not already have an associated conflict 152, the write view object 153 and current view object 153 have the same parent view object 153, and it is not the case that both the current value chain 156 and the write value chain 156 indicate that their most recent value objects 158 were due to frozen reads, then contingent conflicts 152 are added to both the write isolation context object 151 and current isolation context object 151, each contingent conflict 152 referring to the other isolation context object 151.
6. Otherwise, if the write isolation context object 151 is a publishable snapshot, the write value chain 156 does not already have an associated conflict 152, the write task is not noted to be resolving a conflict, the current view object 153 is an ancestor of the write view object 153, and the most recent value object 158 in the current value chain 156 does not have the same timestamp as the last merge timestamp for the current isolation context object 151, then a conflict 152 is added to the write value chain 156 and write isolation context object 151.
7. Following the decision as to whether a conflict is induced with respect to the current value chain 156, the write task may be removed from the current isolation context object 151 unless that is the write isolation context object 151 .
Table 6
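Steps 3 through 6 of Table 6 form a chain of mutually exclusive tests, which the following Python sketch makes explicit. It is a simplified illustration, not the described implementation. The `View`, `Chain` and `classify` names are hypothetical; each `Chain` folds the value chain 156 and its isolation context object 151 into one record; and `write_prev_modified` reads "the last modification to the write value chain (if any)" as the modification preceding the current write, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class View:
    name: str
    parent: Optional["View"] = None

    def is_ancestor_of(self, other):
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


@dataclass(eq=False)
class Chain:
    view: View
    publishable: bool            # the associated isolation context is publishable
    last_modified: int           # timestamp of the most recent value object
    frozen_read: bool = False    # most recent value was due to a frozen read
    has_conflict: bool = False
    snapshot: bool = False
    last_merge: Optional[int] = None


def classify(write, cur, resolving=False, write_prev_modified=None):
    """Return which kind of conflict, if any, the write induces for cur."""
    both_frozen = write.frozen_read and cur.frozen_read
    # Step 3: same chain, or cur predates the previous write to the write chain.
    if cur is write or (write_prev_modified is not None
                        and cur.last_modified < write_prev_modified):
        return "none"
    # Step 4: the write view is an ancestor of the current view.
    if (cur.publishable and not cur.has_conflict
            and write.view.is_ancestor_of(cur.view) and not both_frozen):
        return "conflict_on_current"
    # Step 5: sibling views yield contingent conflicts on both contexts.
    if (write.publishable and cur.publishable and not cur.has_conflict
            and write.view.parent is not None
            and write.view.parent is cur.view.parent and not both_frozen):
        return "contingent"
    # Step 6: a publishable snapshot writes beneath a changed ancestor.
    if (write.publishable and write.snapshot and not write.has_conflict
            and not resolving and cur.view.is_ancestor_of(write.view)
            and cur.last_modified != cur.last_merge):
        return "conflict_on_write"
    return "none"
```

Returning a label rather than mutating the chains keeps the sketch short; a caller would add the conflict 152 or contingent conflicts 152 to the chains and isolation context objects indicated by the label, and then remove the write task as in step 7.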
[00167] In accordance with example implementations, the system 100 may be a system 500 of one or multiple physical machines 510, as depicted in Fig. 5. The physical machine 510 is a processor-based machine that is constructed from machine executable instructions, or "software" 560 and actual hardware 520. The hardware 520 of the physical machine 510 may include, for example, one or multiple processing cores 522 (e.g., central processing unit (CPU) cores and/or graphics processing unit (GPU) cores), a memory 524, one or multiple network interfaces 526, one or multiple mass storage devices 527, a display, input/output (I/O) devices, and so forth.
[00168] In accordance with example implementations, the memory 524, in general, may be a non-transitory memory, which includes non-transitory memory storage devices, such as semiconductor memory devices, phase change memory devices, random access memory (RAM) devices, dynamic RAM (DRAM) devices, resistive memory devices, flash memory devices, a combination of one or more of these devices, and so forth.
[00169] In accordance with example implementations, the machine executable instructions 560 may be stored in a non-transitory computer-readable storage medium, such as the memory 524, for example. The instructions 560, when executed by one or multiple of the processing cores 522, may cause the processing core(s) 522 to execute one or multiple applications 566, i.e., execute the program code 112 as part of one or multiple operating system processes 564. One or multiple processes 564 may share a given isolation context, as discussed herein. In addition to the application(s) 566, the machine executable instructions 560 may include an operating system 565, a virtual machine monitor (VMM) 569, or hypervisor, as well as program instructions 568 that, when executed by the processing core(s) 522, cause the core(s) 522 to provide the context management resources 130 (see Fig. 1A).
[00170] The physical machine 510 may store, in accordance with example implementations, data 570, such as data 572 for the data structures 160 (see Fig. 1A), data 574 for objects (see Fig. 1A), and so forth.
[00171] As also depicted in Fig. 5, the system 100 may include a high speed interconnect 580 (a server rack backplane, a server cabinet backplane, a bus, a serial link, and so forth) that interconnects multiple physical machines 510. Although two physical machines 510 are depicted in Fig. 5, it is noted that the system 100 may include three or more physical machines. Moreover, in accordance with further example implementations, the system 100 may include a single physical machine 510. For implementations in which the system 100 includes multiple physical machines 510, the machines 510 may be disposed at a single physical location (or facility) or may be geographically distributed at multiple locations.
[00172] While the present invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.

Claims

What is claimed is
1. A method comprising:
executing machine executable instructions in at least one processor-based machine in a first isolation context and in a second isolation context, including inhibiting first modifications made to the data structure within the first isolation context from being reflected in a view of the data structure associated with the second isolation context; and
managing publication of the first isolation context, comprising:
making a first attempt to publish the first isolation context to reflect the first modifications in the view;
identifying conflicts associated with the data structure that prevent the first attempt;
taking actions to resolve the conflicts, including making second modifications within the first isolation context; and
publishing the first isolation context in connection with a subsequent publication attempt, including causing the view to reflect the first modifications and the second modifications, as of a time associated with the subsequent publication attempt.
2. The method of claim 1, further comprising:
prior to making the subsequent attempt:
making an intermediate attempt to publish the first isolation context;
identifying at least one new conflict that prevents the intermediate publication attempt, the at least one new conflict arising from modifications to the data structure performed in the second isolation context subsequent to the first attempt; and
taking action to resolve the at least one new conflict.
3. The method of claim 1, wherein the first and second isolation contexts have a hierarchical ordering in which the first isolation context is a child of the second isolation context, and further comprising:
attempting to determine a value for a property associated with the data structure in a view of the data structure associated with the first isolation context as of a request time, wherein attempting to determine the value comprises:
determining a lack of an established value associated with the property in the first view;
determining an inherited value for the property, the inherited value being a value for the property in the view of the data structure associated with the second isolation context as of an effective time associated with the first view and the request time; and
considering the inherited value to be the current value.
4. The method of claim 3, wherein:
the first isolation context comprises a live isolation context;
the effective time comprises the request time; and
identifying at least one conflict comprises identifying a conflict associated with the property due to the establishment of a value associated with the property in the view of the data structure associated with the second isolation context subsequent to the establishment of a value associated with the property in the view associated with the first isolation context.
5. The method of claim 3, wherein:
the first isolation context comprises a snapshot isolation context;
the effective time comprises the latter of a creation time of the first isolation context and a publication time of the first isolation context prior to the request time; and
identifying at least one conflict comprises identifying a conflict associated with the property due to the establishment of a value associated with the property in the view of the data structure associated with the second isolation context subsequent to the latter of the creation time of the first isolation context and a latest publication time of the first isolation context.
6. The method of claim 3, wherein the property comprises one of a location associated with the data structure, a structural property of the data structure, and a relationship associated with the data structure, the method further comprising: determining the established value for the property based on a modification, a frozen read or a successful publication to the first isolation context by a third isolation context.
7. The method of claim 1, wherein the publishing comprises performing an atomic publishing process, the method further comprising, as part of the atomic publishing process:
determining that no conflicts prevent the publication in connection with the subsequent publication attempt; and
causing modifications made in the view of the data structure associated with the first isolation context to be reflected in the view of the data structure associated with the second.
8. An article comprising a non-transitory computer readable storage medium to store instructions that when executed by a computer cause the computer to:
prevent a first modification made to a data structure within a first isolation context from being viewed by a second isolation context prior to publication of the first isolation context; and
manage publication of the first isolation context, comprising:
in response to a first attempt to publish the first isolation context, identifying a conflict that prevents the first publication attempt;
making a second modification within the first isolation context to resolve the conflict; and
publishing the first isolation context in connection with a subsequent publication attempt, including allowing the second isolation context to view a result of the first and second modifications, as of a time associated with the subsequent publication attempt.
9. The article of claim 8, the storage medium storing instructions that when executed by the computer cause the computer to:
determine for a property associated with the data structure, a current value in a view of the data structure associated with a third isolation context different from the first and second isolation contexts, wherein the third isolation context has a parent, and the third isolation context has not yet published the current value to the parent.
10. The article of claim 9, the storage medium storing instructions that when executed by the computer cause the computer to:
invoke a function bound to the third isolation context to determine the current value in the view of the third isolation context.
11. The article of claim 8, the storage medium storing instructions that when executed by the computer cause the computer to:
create a third isolation context as a child of the first isolation context.
12. A system comprising:
a data store to store a data structure;
a plurality of processing cores to execute machine executable instructions within a plurality of isolation contexts comprising a parent isolation context and a child isolation context created from the parent isolation context; and
context management resources used by the processing cores to manage publication of the child isolation context,
wherein execution of the machine executable instructions within the child isolation context:
attempts to publish the child isolation context to reflect first modifications made within the child isolation context to the data structure in a view of the data structure associated with the parent isolation context, the attempted publication being associated with conflicts that prevent the publication;
performs actions to resolve the conflicts, including making second modifications within the child isolation context; and
publishes the child isolation context in connection with a subsequent publication attempt, the publication allowing the parent isolation context to view the first modifications and the second modifications, as of a time associated with the subsequent publication attempt.
13. The system of claim 12, wherein the execution of the machine executable instructions forms a plurality of operating system processes associated with the child isolation context.
14. The system of claim 12, wherein the execution of the machine executable instructions within the child isolation context:
attempts to publish the child isolation context in at least one additional publication attempt before the subsequent publication, each of the at least one additional publication attempt or attempts being associated with at least one additional conflict.
15. The system of claim 12, wherein the execution of the machine executable instructions within the child isolation context:
in response to the conflicts, determines whether a termination condition has been satisfied; and
based on that determination, selectively prevents the subsequent publication attempt.
PCT/US2015/063055 2015-11-30 2015-11-30 Managing an isolation context WO2017095388A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/063055 WO2017095388A1 (en) 2015-11-30 2015-11-30 Managing an isolation context

Publications (1)

Publication Number Publication Date
WO2017095388A1 true WO2017095388A1 (en) 2017-06-08

Family

ID=58797702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/063055 WO2017095388A1 (en) 2015-11-30 2015-11-30 Managing an isolation context

Country Status (1)

Country Link
WO (1) WO2017095388A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080256074A1 (en) * 2007-04-13 2008-10-16 Sun Microsystems, Inc. Efficient implicit privatization of transactional memory
WO2009039118A2 (en) * 2007-09-18 2009-03-26 Microsoft Corporation Parallel nested transactions in transactional memory
US20100100690A1 (en) * 2008-10-21 2010-04-22 Microsoft Corporation System to reduce interference in concurrent programs
US20100162247A1 (en) * 2008-12-19 2010-06-24 Adam Welc Methods and systems for transactional nested parallelism
US20120151495A1 (en) * 2010-12-10 2012-06-14 Microsoft Corporation Sharing data among concurrent tasks
US20120227042A1 (en) * 2004-12-16 2012-09-06 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15909905

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15909905

Country of ref document: EP

Kind code of ref document: A1