﻿∗ †
SMV: Selective Multi-Versioning STM
Dmitri Perelman Anton Bishevsky Oleg Litmanovich Idit Keidar
Technion Dept. of Electrical Engineering
{dima39@tx,szaparka@t2,smalish@t2,idish@ee}.technion.ac.il
Abstract
We present Selective Multi-Versioning (SMV), a new STM that
reduces the number of aborts, especially those of long read-only
transactions. SMV keeps old object versions as long as they might
be useful for some transaction to read. It is able to do so while
still allowing reading transactions to be invisible by relying on
automatic garbage collection to dispose of obsolete versions.
SMV is most suitable for read-dominated workloads, for which
it achieves much better performance than previous solutions. It has
an up to ×11 throughput improvement over a single-version STM
and more than a two-fold improvement over an STM keeping a
constant number of versions per object. We show that unlike STMs
keeping a constant number of versions, SMV operates successfully
even in systems with limited memory.
1. Introduction
Transactional memory [16, 28] is an increasingly popular paradigm
for concurrent computing in multi-core architectures. Most transactional
memory implementations today are software toolkits, or
STMs for short. STMs speculatively allow multiple transactions to
proceed concurrently, which leads to aborting transactions in some
cases. As system load increases, aborts may become frequent, especially
in the presence of long-running transactions, and may have
a devastating effect on performance [2]. Reducing the number of
aborts is thus an important challenge for STMs.
Of particular interest in this context is reducing the abort rate
of read-only transactions. Read-only transactions play a significant
role in various types of applications, including linearizable data
structures with a strong prevalence of read-only operations [17], or
client-server applications where an STM infrastructure replaces a
traditional DBMS approach (e.g., FenixEDU web application [9]).
Particularly long read-only transactions are employed for taking
consistent snapshots of dynamically updated systems, which are
then used for checkpointing, process replication, monitoring program
execution, gathering system statistics, etc.
Unfortunately, long read-only transactions in current leading
STMs tend to be repeatedly aborted for arbitrarily long periods of
time. As we show below, the time for completing such a transaction
varies significantly under contention, to the point that some readonly
transactions simply cannot be executed without “stopping the
world”. As mentioned by Cliff Click 1 , this kind of instability is
one of the primary practical disadvantages of STM; Click mentions
multi-versioning [4] (i.e., keeping multiple versions per object), as
a promising way to make program performance more predictable.
Indeed, by keeping multiple versions it is possible to assure that
each read-only transaction successfully commits by reading a con-
∗ A preliminary version of this paper appeared in TRANSACT’10.
† This work was partially supported by the Israeli Ministry of Science
Knowledge Center on CMP, and by the Hasso Plattner Institute.
1 http://www.azulsystems.com/blog/cliff-click/
2008-05-27-clojure-stms-vs-locks
sistent snapshot [3] of the objects it accesses, e.g., values that reflect
updates by transactions that committed before it began and
no partial updates of concurrent transactions. This way, multiple
versions have the potential to improve the performance of singleversioned
STMs [11–13, 18], which, as we show below, might
waste as much as 80% of their time because of aborts in benchmarks
with many read-only transactions.
Nevertheless, previously suggested multi-versioned STMs did
not fully realize this potential. As we show in this paper, this happened
mainly because of an inefficient management of old object
versions (see Section 2). Instead of keeping a variable number of
versions based on demand, multi-versioned STMs existing today
keep a constant number of versions for each object [7, 25, 26].
As we show below, this approach does not provide enough of a
performance benefit for read-only transactions, and, even worse, it
causes severe memory problems in long executions. The challenge
is, therefore, to devise an approach for efficient management of old
object versions.
In Section 3, we present Selective Multi-Versioning (SMV), a
novel STM algorithm that addresses this challenge. SMV keeps old
object versions that are still useful to potential readers, and removes
ones that are obsolete. This way, read-only transactions can always
complete – they neither block nor abort – while for infrequentlyupdated
objects only one version is kept most of the time.
SMV achieves this while allowing read-only transactions to remain
invisible [12], i.e., having no effect on shared memory. At first
glance, combining invisible reads with effective garbage collection
may seem impossible — if read-only transactions are invisible, then
other transactions have no way of telling whether potential readers
of an old version still exist! To circumvent this apparent paradox,
we use the automatic GC facility available in managed memory
systems. Such systems use dedicated GC threads, which have access
to all the threads’ private memories, so that even operations
that are invisible to other transactions are visible to the garbage
collector. SMV ensures that old object versions become garbage
collectible (GCable) once there are no transactions that can safely
read them.
In Section 4 we evaluate different aspects of SMV’s performance.
We implement SMV in Java 2 and study its behavior using
STMBench7 [15] as well as a Java port of Vacation [8]. We
compare SMV to a TL2-style algorithm — a single-versioned STM
whose basic operation is like that of TL2 [11]; and to a k-versioned
variant of the same algorithm, which keeps k previous versions per
object, similarly to LSA [25].
We find that SMV is extremely efficient for read-dominated
workloads with long-running transactions. For example, in STM-
Bench7 with 64 threads, the throughput of SMV is eleven times
higher than that of a TL2-style algorithm and more than double than
those of 2- and 4-versioned STMs. This is thanks to the much lower
2 The code is publicly available in http://tx.technion.ac.il/
~dima39/publications.htmlamount of wasted work with SMV: 4% of SMV’s time is spent
running aborted transactions, compared to 80% and 30% with the
other algorithms. Furthermore, in an application with one thread
constantly taking snapshots and the others running update transactions,
neither TL2 nor the k-versioned STM succeeds in taking
a snapshot, even when only one concurrent updater is running. In
contrast, the performance of SMV remains stable for any number
of concurrent updaters.
We compare the memory demands of the algorithms by limiting
Java heap size. We show that most k-versioned STMs crash with a
Java OutOfMemoryException, while SMV continues to run, and
its throughput is degraded by less than 25% even under stringent
memory constraints. We explain the collapse of k-versioned STMs
in Section 5, via a somewhat surprising observation that the memory
consumption of algorithms keeping k versions per object might
grow exponentially with the number of objects.
In summary, we present the first STM that can successfully execute
long read-only transactions concurrently with frequent updates.
We do so by introducing a new approach for maintaining
multiple versions. Our conclusions appear in Section 6.
2. Related Work
As noted above, most existing STMs are single-versioned. Of these,
SMV is most closely related to TL2 [11], from which we borrow
the ideas of invisible reads, commit-time locking of updated objects,
and a global version clock for consistency checking. In a way,
SMV can be seen as a multi-versioned extension of TL2.
Among multi-versioned STMs, the closest to SMV is LSA [25].
LSA, as well as its snapshot-isolation variation [26], implements a
simple solution to garbage collection: it keeps a constant number
of versions for each object. However, this approach leads to storing
versions that are too old to be of use to any transaction on the one
hand, and to aborting transactions because they need older versions
than the ones stored on the other. In contrast, SMV keeps versions
as long as they might be useful for ongoing transactions, and makes
them GCable by an automatic garbage collector as soon as they are
not. For infrequently updated objects, SMV typically keeps a single
version.
Other previous suggestions for multi-versioned STMs [2, 5, 20,
22, 23] were based on cycle detection in the conflict graph, a data
structure representing all data dependencies among transactions.
Cycle detection incurs a high cost (quadratic in the number of
transactions), which is clearly not practical. Moreover, it requires
reads to be visible in order to detect future conflicts, which can
be detrimental to performance. Earlier work [20, 23] specified GC
rules based on precedence information as to when old versions can
be removed. However, these algorithms were too complex to be
amenable to practical implementation, and did not specify when
these GC rules ought to be checked. Aydonat and Abdelrahman [2]
propose to keep each version for as long as transactions that were
active at the time the version was created exist, but the authors
do not specify how this rule can be implemented efficiently. Other
theoretical suggestions for multi-versioned STMs ignored the issue
of GC altogether [22]. In contrast, in this paper we present a
simple algorithm, which implements invisible reads, and exploits
the automatic GC available in languages with managed memory.
The idea of keeping information as long is it might be needed
by on-going transactions was recently used by Bronson et al. [6]
for implementing concurrent collections. However, that paper deals
with specific data structures rather than general purpose STMs.
Instead of multi-versioning, STMs can avoid aborts by reading
uncommitted values and then having the reader block until the
writer commits [24], or by using read-write locks to block in case
of concurrency [1, 10]. These approaches differ from SMV, where
transactions never block and may always progress independently.
Moreover, reads, which are invisible in SMV, must be visible in
these “blocking” approaches. In addition, reading the values of
uncommitted transactions might lead to cascading aborts.
3. Selective Multi-Versioning Algorithm
We present Selective Multi-Versioning, a new object-based STM
algorithm. Section 3.1 presents the principles underlying SMV’s
design. The data structures used by SMV are described in Section
3.2. Section 3.3 presents the basic idea of the algorithm, and
Section 3.4 discusses some practical optimizations.
3.1 Design Principles
SMV’s main goal is to reduce aborts in workloads with readonly
transactions, without introducing high space or computational
overheads. SMV is based on the following design choices:
Invisible reads. Read operations can only modify the reading
transaction’s private memory, and do not affect global memory.
Invisible reads have been argued to be important for performance,
especially in multi-core systems, where updates to global memory
cause caches to thrash [12, 25].
Multi-versioning for reads. Read-only transactions always commit.
We achieve this by allowing read-only transaction Ti to observe
a consistent snapshot corresponding to Ti’s start time —
when Ti reads object oj, it finds the latest version of oj that has
been written before Ti’s start.
Managed garbage collection based on real-time order. Old object
versions are removed once there are no longer live read-only
transactions that can consistently read them. To achieve this with
invisible reads, SMV relies on the omniscient garbage collection
mechanism available in managed memory systems. Thus, SMV
needs only ensure that unneeded data is GCable, i.e., once the version
of the object cannot be read by any transaction, this version is
not strongly referenced by any live memory object.
Global version clock. Like TL2 [11] and LSA [25], SMV uses
a global version clock to detect conflicts. Each transaction reads
the clock when it begins, and update transactions increment the
clock upon commit. Each object is tagged with the version clock of
the transaction that wrote it. Though the global version clock is a
contention-point, many practical optimizations were introduced to
reduce the overhead associated with it [11, 27]. Such optimizations
are orthogonal to our work, and are therefore beyond the scope of
this paper. Perelman et al. [23] show that such global contention
is essential for any multi-versioned STM that allows all read-only
transactions to commit (more formally, such an STM cannot be
disjoint access parallel [19]).
3.2 Overview of Data Structures
As in other object-based STMs, objects are accessed via object handles.
An object handle includes a list of pointers to memory locations
of data versions, each keeping a versioned lock [11], which
holds the global version number associated with the transaction
that wrote the data and a lock bit. In order to facilitate automatic
garbage collection, object handles in SMV keep strong references
only to the latest (current) versions of each object, and use weak
references [14] to point to other versions.
Each transaction is associated with a transactional descriptor,
which holds the relevant transactional data, including a read-set, a
write-set, status, etc. In addition, transactional descriptors play an
important role in keeping strong references to old object versions,
as we explain below.
Global version numbers are generated using a global version
clock. However, unlike previous implementations of such
clocks [11, 25], SMV’s version clock consists of a list of transactional
descriptors, rather than a scalar variable. Transactional descriptors
act as “time points” organized in a one-directional linkedlist. Upon commit, an update transaction appends its transactional
descriptor to the end of the list. For example, when the current
global version is 100, a committing update transaction sets the
time point value in its transactional descriptor to 101 and adds a
pointer to this descriptor from the descriptor holding 100.
Version management is based on the idea that old object versions
are pointed to by the descriptors of transactions that overwrote
these versions. A committing transaction Tw includes in its
transactional descriptor a strong reference to the previous version
of every object in its write set before diverting the respective object
handle to the new version.
The variable curPoint points to the descriptor added by the
latest committed update transaction. When a read-only transaction
begins, it keeps (in its local variable startTP) a pointer to the
current time point. This pointer is cleared upon commit, making
old transactional descriptors at the head of the list GCable.
This way, active read-only transaction Tr keeps a reference
chain to version o j
i if this version was over-written after Tr’s start,
thus preventing o j
i ’s garbage collection. Once there are no active
read-only transactions that started before o j
i was over-written, this
version stops being referenced and thus becomes GCable .
Figure 1 illustrates the commit of an update transaction Tw that
writes to object o1. In this example, Tw and a read-only transaction
Tr both start at time 9, and hence Tr references the transactional
descriptor of time point 9. The previous update of o1 was associated
with version 5. When Tw commits, it inserts its transactional
descriptor at the end of the time points list with value 10. Tw’s descriptor
references the previous value of o1, and o1’s object handle
points to the new data, ensuring it will not be GCed as long as Tr is
active. Assume Tr then wants to read object o1. Tr determines that
it cannot read the latest version because its version (10) is greater
than Tr’s start time. Hence, Tr reads the previous object version,
whose version (5) is earlier than its start time (10).
o 1
ver = 5
data 5
curPoint
Tr dsc
start = 9
Tx dsc
time point 9
Tw dsc
o n
T w updates o 1
and commits
data 10
o 1
ver = 10
curPoint
Tr dsc
start = 9
data 5
Tx dsc
time point 9
o n
Tw dsc
time point 10
Figure 1. Transactional descriptor of Tw with time point 10 references
the over-written version of o1.
3.3 Basic Algorithm
We now describe the SMV algorithm. For the sake of simplicity, we
present the algorithm in this section using a global lock for treating
concurrency on commit. In Section 3.4 we will remove this lock
and show how to make commit operations lock-free.
SMV handles read-only and update transactions differently. We
assume that transaction’s type can be provided to the algorithm
beforehand by a compiler or via special program annotations. If
not, each transaction can be started as read-only and then restarted
as update upon the first occurrence of a write operation.
Handling update transactions.
The protocol for update transaction
Ti is depicted in Algorithm 1. The general idea is similar to
the one used in TL2 [11] and LSA [25]. An update transaction Ti
aborts if some object oj read by Ti is over-written after Ti begins
and before Ti commits. Conflicts are detected using the global time
points list. Upon starting, Ti saves the value of the latest time point
in a local variable startTime, which holds the latest time at which
an object in Ti’s read-set is allowed to be over-written.
A read operation of object oj first reads the latest version of
oj via the object handle, and then checks whether this version is
valid for Ti (function validateRead, lines 30–32). The validation
procedure checks that oj is not locked and that its version is not
greater than Ti.startTime. If the validation fails, the transaction is
aborted.
A write operation (lines 10–13) is invisible to other transactions,
as it postpones the actual work until the time of commit. Write
creates a copy of the object’s latest version for local updates, and
adds it to Ti’s local write set.
Algorithm 1 SMV algorithm for update transaction Ti.
1: Upon Startup:
2: Ti.startTime ← curPoint.commitTime
3: Read oj:
4: if (oj ∈ Ti.writeSet) then return Ti.writeSet.get(oj)
5: data ← oj.latest
6: if ¬validateRead(oj) then abort
7: readSet.put(oj)
8: return data
9: Write to oj:
10: if (oj ∈ Ti.writeSet) then update Ti.writeSet.get(oj); return
11: localCopy ← oj.latest.clone()
12: writeSet.put(〈oj, localCopy〉)
13: update localCopy
14: Commit:
15: foreach oj ∈ Ti.writeSet do: oj.lock()
16: if ¬validateReadSet() then abort
⊲ txn dsc should reference the over-written data
17:
18:
foreach oj ∈ Ti.writeSet do:
Ti.prevVersions.put(〈oj, oj.latest〉)
19: timeLock.lock()
20: Ti.commitTime ← curPoint.commitTime + 1
⊲ update and unlock the objects
21: foreach 〈oj, data〉 ∈ Ti.writeSet do:
22: oj.version ← Ti.commitTime
23: oj.weak references ← oj.weak references ∪ oj.latest
24: oj.latest ← data
25: oj.unlock()
26: curPoint.next ← Ti
27: curPoint ← Ti
28: timeLock.unlock()
29: Function validateRead(Object oj)
30: if (oj.isLocked()) then return false
31: if (oj.version > Ti.startTime) then return false ⊲ oj has been
over-written
32: return true
33: Function validateReadSet() ⊲ verify that none of the objects in the
read-set has been over-written after being read by Ti
34: foreach oj ∈ Ti.readSet do:
35: if (oj.isLocked() ∨ oj.version > Ti.startTime) then return false
36: return true
Commit (lines 15–28) consists of the following steps:
1. Lock the objects in the write set (line 15). Deadlocks can be
detected using standard mechanisms (e.g., timeouts or Dreadlocks
[21]), or may be avoided if acquired in the same order by
every transaction.
2. Validate the read set (function validateReadSet, lines 34–36).
3. Insert strong references to the over-written versions to Ti’s
descriptor’s prevVersion structure (line 18). In this way the
algorithm guarantees that the over-written versions stay in thememory as long as Ti’s descriptor is referenced by some readonly
transaction.
4. Lock the time points list (line 19). Recall that this is a simplification;
we show in Section 3.4 how we avoid such locking.
5. Set the commit time of Ti to one plus the value of the commit
time of the descriptor referenced by curPoint.
6. Update and unlock the objects in the write set (lines 21–25).
Set their new version numbers to the value of Ti.commitTime.
Keep weak references to old versions.
7. Insert Ti’s descriptor to the end of the time points list and
unlock the list (lines 26–28).
Algorithm 2 SMV algorithm for read-only transaction Ti.
1: Upon Startup:
2: Ti.startTP ← curPoint
3: Read oj:
4: latestData ← oj.latest
5: if (oj.version ≤ Ti.startTP.commitTime) then return latestData
6: return the latest version ver in oj.weak references, s.t.
7: ver.version ≤ Ti.startTP.commitTime
8: Commit:
9: Ti.startTP ← ⊥
Handling read-only transactions. The pseudo-code for readonly
transactions appears in Algorithm 2. Such transactions always
commit without waiting for other transactions to invoke any operations.
The general idea is to construct a consistent snapshot based
on the start time of Ti. At startup, Ti.startTP points to the latest
installed transactional descriptor (line 2); we refer to the time value
of startTP as Ti’s start time.
For each object oj, Ti reads the latest version of oj written
before Ti’s start time. When Ti reads an object oj whose latest
version is greater than its start time, it continues to read older
versions until it finds one with a version number older than its
start time. Some old enough version is guaranteed to be found,
because the updating transaction Tw that over-wrote oj has added
Tw’s descriptor referencing the over-written version somewhere
after Ti’s starting point, preventing GC.
The commit procedure for read-only transactions merely removes
the pointer to the starting time point, in order to make it
GCable, and always commits.
3.4 Allowing Concurrent Access to the Time Points List
We show now how to avoid locking the time points list (lines 19, 28),
so that update transactions with disjoint write-sets may commit
concurrently. We first explain the reason for using the lock. In order
to update the objects in the write-set, the updating transaction
has to know the new version number to use. Choosing the version
number and adding the descriptor to the list must be part of the
same atomic action, so that two different transactions do not get
the same number.
But if a transaction exposes its descriptor before it finishes updating
the write-set, then some read-only transaction might observe
an inconsistent state, as exemplified by the scenario depicted in
Figure 2(a). Consider transaction Tw that updates objects o1 and
o2. The value of curPoint at the beginning of Tw’s commit is 9.
Assume Tw first inserts its descriptor with value 10 to the list,
then updates object o1 and pauses. At this point, o1.version = 10,
o2.version < 10 and curPoint → commitTime = 10. A new readonly
transaction starts with time 10 and successfully reads the new
value of o1 and the old value of o2, because they are both less than
or equal to 10, thus obtaining an inconsistent snapshot. Intuitively,
the problem is that the new time point becomes available to the
readers as a potential starting time before all the objects of the committing
transaction are updated.
data 10
curPoint
o 1 o 2
ver = 10
data 5
Tx dsc
time point 9
Tr dsc
start = 10
ver = 5
data 5
Tw dsc
time point 10
(a) Tr’s start time is 10, which causes
inconsistent snapshot.
curPoint
o 1 o 2
ver = 10
data 10
readyPoint
data 5
Tx dsc
time point 9
ready = true
Tr dsc
start = 9
ver = 5
data 5
Tw dsc
time point 10
ready = false
(b) Tr’s start time is 9 according to
the latest ready time point.
Figure 2. Transaction Tw updates o1 and o2 with version 10. Without a
proper synchronization, transaction Tr might read a new version of o1 and
an old version of o2.
To preserve consistency without locking the time points list, we
add an additional boolean field ready to the descriptor’s structure,
which becomes true only after the committing transaction finishes
updating all objects in its write-set. In addition to the global cur-
Point variable referencing the latest time point, we keep a global
readyPoint variable, which references the latest time point in the
ready prefix of the list. When a new read-only transaction starts, its
startTP variable references readyPoint (instead of curPoint as in the
original algorithm). In Figure 2(b), a read-only transaction Tr has a
start time of 9, because the new time point with value 10 is still not
ready. Hence, Tr does not return the new value of o1. Generally,
the use of readyPoint guarantees that if a read-only transaction Tr
reads an object version written by Tw, then Tw and all its preceding
transactions had finished writing their write-sets.
Note, however, that when using ready points we should make
sure we do not violate the real time order. That is, we should prevent
a situation in which a transaction Tr that starts after Tw terminates
has a start time value smaller than Tw’s commit time, because in
this case, Tr might not see the values written by Tw.
This situation might arise if update transactions become ready
in an order that differs from their time points order, thus leaving
an unready transaction between ready ones in the list. To enforce
real-time order, a new read-only transaction first notes the time
point of the latest ready transaction. If readyPoint does not point
to this transaction (because of a “gap” in ready transactions as
explained above), it waits for the ready point to reach this point
before starting.
4. Implementation and Evaluation
4.1 Compared Algorithms
Our evaluation aims to check the specific novel aspect introduced
by SMV, that is to say, keeping and garbage collecting multiple
versions. We compare SMV to the closest well-known algorithm,
namely TL2 and a multiple-versioned variant thereof. We implement
the following algorithms:
SMV – The algorithm described in Section 3.
TL2-style – A single-versioned STM mimicking the basic operation
of TL2 [11] with a single central global version clock.
k-versioned – an STM based on a TL2-style’s logic and code,
in which each object keeps a constant k, number of versions
(this approach resembles LSA [25]). Reads operate as in SMV,
except that if no adequate version is found, the transactionTransactions/sec
35000
SMV
30000 8-ver
2-ver
25000
TL2
20000
15000
10000
5000
0
1 1 1 1 1
Threads
(a) Throughput in Vacation benchmark.
Transactions/sec
1200
1000
800
600
400 SMV
8-ver
200 2-ver
TL2
0
1 4 16 32 64
Threads
(b) Throughput in STMBench7’s write-dominated workload.
Figure 3. In the absence of read-only transactions multi-versioning cannot be exploited. The overhead of SMV degrades throughput by up to 20%.
3500
3000
SMV
8-ver
1600
1400
SMV
8-ver
Transactions/sec
2500
2000
1500
2-ver
TL2
Transactions/sec
1200
1000
800
600
2-ver
TL2
1000
400
500
200
0
1 4 16 32 64
Threads
(a) Throughput in STMBench7’s read-dominated workload.
0
1 4 16 32 64
Threads
(b) Throughput in STMBench7’s read-write workload.
Figure 4. By reducing aborts of read-only transactions, SMV presents a substantially higher throughput than TL2 and the k-versioned STM. In readdominated
workloads, its throughput is ×11 higher than that of TL2 and more than twice those of the k-versioned STM with k = 2 or k = 8. In read-write
workloads its advantage decreases because of update transactions, but SMV still clearly outperforms its competitors.
aborts. Update transactions have a behavior similar to that of
TL2.
We use trivial contention management for all the algorithms, where
an aborting transaction restarts immediately.
It is worth noting that our implementation of the TL2-style algorithm
does not use all the software optimizations used in the original
one. Our aim is to create for all the algorithms as similar a starting
point as possible. This way, the tests evaluate the algorithmic
differences, while minimizing the influence of the engineering optimizations
and tweaks. These may certainly be applicable to each
of the compared alternatives.
4.2 Experiment Setup
The algorithms are implemented in Java. In order to evaluate their
performance, we use two types of benchmarks: 1) the Java version
of STMBench7 [15]; and 2) Vacation, which is part of the
STAMP [8] benchmark suite.
STMBench7. STMBench7 aims to simulate different behaviors
of real-world programs by invoking both read-only and update
transactions of different lengths over large data structures, typically
graphs. It supports various workload types, which differ in their ratio
of read-only to update transactions: 90/10 for read-dominated
workloads, 60/40 for read-write workloads, and 10/90 for writedominated
workloads. Operation types for both read-only and update
transactions include graph traversals of different lengths, structural
modifications, and single access operations. When running
STMBench7 workloads, we bound the length of each benchmark
by the number of transactions performed by each thread (2000
transactions per thread unless stated otherwise). We manually disabled
long update traversals because they inherently eliminate any
potential for scalability.
Vacation (Java port). Vacation emulates a travel reservation system,
which is implemented as a set of trees. It supports changing
the size of the database, length of the benchmark, and other parameters
that give us control over the level of contention among threads.
Unlike STMBench7, all transactions in Vacation are update ones.
In our experiments, we use the following parameters: 8 queries per
task, 90% of relations queried, 65536 possible relations, 80% user
tasks and 524288 tasks for each thread.
In both benchmarks, we defined transactional objects at the
granularity of graph/data structure nodes. This provides a reasonable
compromise between the cost of copy-on-write and the
overhead of algorithmic bookkeeping. To support this, we reimplemented
collections based on java.util.
We ported both benchmarks to a common transactional interface,
in order to facilitate easy plug-in into both frameworks. Together
with the common code platform used in all our implemen-Wasted work (memory accesses)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
SMV
8-ver
2-ver
TL2
Wasted work (memory accesses)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
SMV
8-ver
2-ver
TL2
0.1
0.1
0
1 4 16 32 64
Threads
(a) Wasted memory accesses in STMBench7’s read-dominated workload.
0
1 4 16 32 64
Threads
(b) Wasted memory accesses in STMBench7’s read-write workload.
Figure 5. Ratio of memory accesses in aborted transactions. In read-dominated workloads the use of selective multi-versioning succeeds to reduce the
amount of wasted work by a factor of 3, outperforming k-versioned STMs by more than 75%. In read-write workloads all multi-versioned solutions behave
similarly and induce 10% less wasted work than TL2.
Wasted time
1
0.9
0.8
0.7
0.6
0.5
0.4
TL2
2-ver
8-ver
SMV
Wasted time
1
0.9
0.8
0.7
0.6
0.5
0.4
TL2
2-ver
8-ver
SMV
0.3
0.3
0.2
0.2
0.1
0.1
0
1 4 16 32
Threads
0
1 4 16 32
Threads
(a) Time spent in aborted transactions in STMBench7’s read-dominated
workload.
(b) Time spent in aborted transactions in STMBench7’s read-write workload.
Figure 6. The share of time spent in aborted transactions for read-dominated and read-write workloads. In read-dominated workloads, SMV wastes less than
4% of time, while TL2 might waste more than 80% of the time inside aborted transactions. In read-write workloads, all STMs lose more time because of the
larger share of update transactions. SMV is still better than TL2 and the k-versioned STMs by more than 20%.
tations, this approach further ensures that all the algorithms have a
similar starting point.
The benchmarks are run on a dedicated shared-memory NUMA
server with 8 Quad Core AMD 2.3GHz processors and 16GB
of memory attached to each processor. The system runs Linux
2.6.22.5-31 with swap turned off. For all tests but those with limited
memory, JVM’s maximum heap size is set to 32GB. Thread
scheduling is left entirely to the OS. We run up to 64 threads on the
32 cores.
Our evaluation study is organized as follows: in Section 4.3,
we show system performance measurements. Section 4.4 considers
the latency and predictability of long read-only operations. In
Section 4.5, we examine how many versions should be used, and in
Section 4.6, we analyze the memory demands of the algorithms.
4.3 Performance Measurements
SMV overhead. As we mentioned earlier, the use of multiple versions
in our algorithm can be exploited by read-only transactions
only. However, before evaluating the performance of SMV with
read-only transactions, we first want to understand its behavior in
programs with update transactions only. In these programs, SMV
can hardly be expected to outperform its single-versioned counterparts.
For update transactions, SMV resembles the behavior of TL2,
with the additional overhead of maintaining a list of transactional
descriptors instead of a single global clock. Measuring throughput
in programs without read-only transactions can help us quantify the
cost of this additional overhead.
In Figure 3, we show throughput measurements of the algorithms
for the Vacation benchmark and for the write-dominated
STMBench7 workload. Vacation, which is shown in Figure 3(a),
does not contain read-only transactions at all. The write-dominated
STMBench7 workload shown in Figure 3(b) runs 90% of its operations
as update transactions, and therefore the influence of readonly
ones is negligible.
Both Vacation and STMBench7 show similar behavior for all of
the compared algorithms. This emphasizes the fact that the algorithms
take the same approach when executing update transactions
and that they all have a common underlying code platform. We
can see that the added overhead of SMV causes a 20% throughput
drop, which is the cost we pay for maintaining multiple versions
when these versions are not actually used.
Throughput. We next run workloads that include read-only transactions,
in order to assess whether the overhead of SMV is offset
by its smaller abort rate. In Figure 4 we depict throughput mea-surements of the algorithms in STMBench7’s read-dominated and
read-write workloads. We see that in the read-dominated workload,
SMV’s throughput is eleven times higher than that of TL2. Despite
keeping as many as 8 versions, the k-versioned STM cannot keep
up, and SMV outperforms it by a factor of 2.5. We further note
that SMV is scalable, and its advantage over a single-version STM
becomes more pronounced as the number of threads rises. In the
read-write workload, the number of read-only transactions that can
use multiple versions decreases, and the throughput gain becomes
58% over TL2 and 20% over the 8-versioned STM.
We conclude that in the presence of read-only transactions the
benefit of SMV significantly outweighs its overhead. To explain
this benefit, we next look at the amount of work wasted by the
STMs due to aborts.
Amount of wasted work. As explained above, keeping multiple
versions can potentially improve performance by decreasing the
abort rate. However, looking at the abort rate is not enough: there
is a substantial difference between a transaction that is aborted at
the very beginning and a transaction aborted after running for a
long time. Hence, what we want to measure now is the “amount of
wasted work” done by transactions before they abort.
There is no strict way for defining wasted work. We therefore
analyze two parameters: 1) number of accesses to transactional objects
during aborted transactions; and 2) time spent by the program
inside aborted transactions.
In Figure 5 we show the number of memory accesses performed
by aborted transactions. Wasted memory accesses are undesirable
mainly because of their disruptive influence on cache performance
and their negative role in power consumption. In addition, they give
a good intuition for the overall efficiency of the run.
We see that in the read-dominated workload (Figure 5(a)), with
64 threads, more than 70% of the memory accesses in TL2 occur
in aborted transactions. This occurs not only because of a high
abort rate, but also because the probability for a read operation
by a long read-only transaction to require an old version increases
over time. In SMV, wasted accesses amount to roughly 25% —
a result that cannot be achieved by the k-versioned STMs either.
Somewhat surprisingly, even 8 versions do not suffice, and circa
45% of the memory accesses are wasted. We will show why this
occurs in Section 4.5.
In read-write workloads most of the wasted accesses occur due
to aborts of update transactions, hence the differences between the
algorithms are less significant — all the multi-versioned STMs
show 10% less wasted accesses than TL2.
In Figure 6, we show the amount of time wasted on eventually
aborted transactions. This approach approximates net CPU utilization
and hence explains throughput results. We note that this approximation
works well only if the number of threads is less than
or equal to the number of available cores (otherwise, time measurements
also count intervals in which the threads are suspended).
Here, the benefit of SMV is even more pronounced. We see that
in the read-dominated workload, TL2 spends more than 80% of its
time running aborted transactions! Interestingly, k-versioned STMs
cannot fully alleviate this effect either, succeeding to reduce the
amount of wasted time to 36% only. In contrast, SMV’s wastage
does not rise above 3%. In the read-write workload, the differences
between the algorithms become less obvious, but the same
tendency remains.
4.4 Latency and Predictability of Long Read-Only
Operations
In the previous section we concentrated on overall system performance
without considering specific transactions. However, in reallife
applications the completion time of individual operations is important
as well. In this section we consider two examples: taking
system snapshots of a running application and STMBench7’s long
traversals.
Max time to complete operation (sec)
400
350
300
250
200
150
100
50
0
SMV
8-ver
2-ver
TL2
1 4 16 32 64
Threads
Figure 7. Maximum time for completing a long read-only operation in
STMBench7. This time is hardly predictable in TL2 and k-versioned STMs:
it takes hundreds of seconds under high loads. SMV completes each readonly
operation in several seconds regardless of system contention.
Taking a full-system snapshot is important in various fields:
it is used in client-server finance applications to provide clients
with consistent views of the state, for checkpointing in highperformance
computing, for creating new replicas, for application
monitoring and gathering statistics, etc. As for any other operation,
predictability of the time it takes to complete the snapshot is
important, both for program stability and for usability.
We first show the maximum time for completing a long readonly
traversal, which is already built-in in STMBench7 (see Figure
7). As we can see in the graph, this operation takes only several
seconds when run without contention. However, when the number
of threads increases, completing the traversal might take more
than 100 seconds in TL2 and k-versioned STMs. Unlike those algorithms,
SMV is less impacted by the level of contention and it
always succeeds to complete the traversal in several seconds.
Number of threads
1 4 8 16 32
TL2 — — — — —
SMV 1445.8 1354.8 1293.4 1429.8 1537
2-versioned — — — — —
4-versioned — — — — —
8-versioned — — — — —
Table 1. Maximum time to take a snapshot (msec) in Vacation benchmark.
Even when there is only a single application thread, neither TL2, nor the k-
versioned STM can successfully take a snapshot even once. SMV presents
stable performance unaffected by the level of contention in the system.
Next, we added the option of taking a system snapshot in Vacation.
In addition to the original application threads, we run a special
thread that repeatedly tries to take a snapshot. We are interested
in the maximum time it takes to complete the snapshot operation.
The results appear in Table 1. We see that neither TL2 nor the k-
versioned STM can successfully take a snapshot even when only
a single application thread runs updates in parallel with the snapshot
operation. Surprisingly, even 8 versions do not suffice to allow
snapshots to complete, this is because within the one and a half seconds
it takes the snapshot to complete some objects are overwritten
more than 8 times.
On the other hand, the performance of SMV remains stable and
unaffected by the number of application threads in the system.
We conclude that SMV successfully keeps the needed versions.
In Section 4.6, we show that it does so with smaller memory
requirements than the k-versioned STM.4.5 How Many Versions to Keep
In the previous sections we have shown the benefits of keeping
multiple versions for system throughput and operation latency in
benchmarks with many read-only transactions. We now consider
the following questions: how many versions do we need to keep?
What is the improvement we get with different numbers of versions?
Histogram of version accesses. In Figure 8 we show a histogram
of previous version accesses in STMBench7’s read-dominated and
read-write workloads. At any point in time, we number object
versions in ascending order, starting from the latest one (the latest
version is number 1, the one before it is number 2, etc.). For each
number k, we count the percentage of accesses to version k out of
the total number of memory accesses.
We see from the graph that the percentage of accesses to an arbitrary
old object version is extremely small – it is less than 2.5%
even for version 2. This observation intuitively suggests that keeping
a constant number of versions per object is wasteful, because
for most of the objects, a single version is enough most of the time.
But in the previous sections, we saw that even 8 versions do not
suffice to allow long read-only transactions to complete, and even
with this many versions, aborts lead to a substantial percentage
of wasted work. To explain this apparent contradiction, we now
examine how the number of versions can impact wasted work.
Wasted work lower bound.
We now conduct the following experiment:
while running SMV, for each version number k, we keep the
number of memory accesses executed by transactions before they
first access version k (if a transaction never accesses version k, the
number is zero). In an STM that keeps k−1 versions per object, the
transactions would then abort, and these memory accesses would
need to be redone at least once. Thus, we obtain a lower bound on
wasted memory accesses in a k-versioned STM.
Figure 9 shows the number of memory accesses done before
first accessing some k th object version. Surprisingly, this amount
is relatively high: it starts from 60% for the second version and
does not fall below 10% even for the 9 th version. According to the
histogram in Figure 8, the likelihood of accessing these versions is
extremely low. So how can we get such a high number of memory
accesses? The insight is that if a transaction accesses the k th
version of an object, it means that this object has been updated at
least k − 1 times since this transaction began. This usually happens
if a transaction has been already running for a long period of time
and has already accessed a large number of transactional objects.
The older the version, the less it is accessed, and thus the higher
the amount of work executed before the access.
The direct conclusion from Figure 9 is that keeping previous
versions is important despite the low frequency of accessing them.
Furthermore, the “heavy tail” in the graphs indicates that keeping a
constant number of versions per object will typically not be enough
for reducing the amount of wasted work. This observation is in line
with the results in the previous sections.
4.6 Memory Demands
One of the potential issues with multi-versioned STMs is their
high memory consumption. In this section we compare memory
demands of the different algorithms. To this end, we execute longrunning
write-dominated STMBench7 benchmarks (64 threads,
each thread running 40000 operations) with different limitations
on the Java memory heap. Such runs present a challenge for the
multi-versioned STMs because of their high update rate and limited
memory resources. As we recall from Section 4.3, multi-versioned
STMs cannot outperform TL2 in a write-dominated workload.
Hence, the goal of the current experiment is to study the impact
of the limited memory availability on the algorithms’ behaviors.
Memory limit
2GB 4GB 8GB 12GB 16GB
TL2 606.89 631.56 630.3 674.96 647.17
SMV 450.12 543.04 563.74 595.78 602.01
2-versioned — 515.32 532.7 550.61 533.01
4-versioned — — — — 281.98
8-versioned — — — — —
Table 2. Throughput (txn/sec) in limited memory systems: k-versioned
STMs do not succeed to complete the benchmark, while SMV performs
well even with a 2GB memory limitation.
Table 2 shows how the algorithms’ throughput depends on the
Java heap size. A “—” sign corresponds to runs in which the algorithm
did not succeed to complete the benchmark due to a Java
OutOfMemoryException. Notice that the 8-versioned STM is unable
to successfully complete a run even given a 16GB Java heap
size. Decreasing k to 4, and then 2, makes it possible to finish the
runs under stricter constraints. However, none of the k-versioned
STMs succeed under the limitation of 2GB. Unlike k-versioned
STMs, SMV continues to function under these constraints. Furthermore,
SMV’s throughput does not change drastically — there is a
maximum decrease of 25% in throughput when Java heap size is
shrinked 8-fold.
We explain the collapse of the k-versioned STM in Section 5,
where we illustrate that its memory consumption can become exponential
rather than linear in the number of transactional objects.
Memory limit
2GB 4GB 8GB 12GB 16GB
TL2 0.10 0.04 0.03 0.02 0.02
SMV 0.30 0.13 0.09 0.08 0.08
2-versioned — 0.06 0.03 0.03 0.02
4-versioned — — — — 0.46
8-versioned — — — — —
Table 3. Time spent in GC: k-versioned STMs do not succeed to complete
the benchmark, SMV spends ≈ 10% of the time in GC when the memory
limitation is above 2GB.
In Table 3 we show the relative amount of time spent garbage
collecting unreferenced data (we use a standard throughput “stop
the world” garbage collector, hence, the application threads do not
run during this time). As we mentioned above, k-versioned STMs
were simply unable to complete the benchmark. As expected, the
GC share when running SMV is clearly higher than that of TL2
— 8% versus 2% in the less constrained runs. Under the stringent
constraints, the GC takes a significant share of the time (30%).
This is the cost we pay for collecting old object versions in a
sophisticated way: transactional descriptors might form long chains
of time points that complicate the task of the garbage collector.
5. Exponential Memory Growth
The empirical results in the previous section have shown that the
k-versioned STM fails in long-running benchmarks with memory
constraints. We now explain this phenomenon.
A naïve assessment of the memory consumption of a k-versioned
STM would probably estimate that it takes up to k times as much
more memory as a single-versioned STM. However, this estimate
fails to explain the results shown in Table 2: given that a
2-versioned STM succeeds to finish the benchmark with a 4GB
heap size, we would expect the 4-versioned STM to successfully
finish the same benchmark with an 8GB or 12GB heap size. But
the 4-versioned STM fails to do that!
We now illustrate that, in fact, the memory consumption of a
k-versioned STM in runs with n transactional objects might grow
like k n . Intuitively, this happens because previous object versions0.03
0.025
4 threads
16 threads
0.012
0.01
4 threads
16 threads
% accesses
0.02
0.015
32 threads
64 threads
% accesses
0.008
0.006
32 threads
64 threads
0.01
0.004
0.005
0.002
0
2 3 4 5 6 7 8 9+
Version
0
2 3 4 5 6 7 8 9+
Version
(a) Version access histogram for STMBench7’s read-dominated workload.
(b) Version access histogram for STMBench7’s read-write workload.
Figure 8. Version access histograms for STMBench7. The 2 nd version is accessed less than 2.5% of the time, and it drops even further for older versions.
The histogram emphasizes the fact that in most cases, keeping a single version of an object is enough.
Work done before first
accessing this vesrion
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
4 threads
16 threads
32 threads
64 threads
Work done before first
accessing this version
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
4 threads
16 threads
32 threads
64 threads
0.2
0.2
0.1
0.1
0
2 3 4 5 6 7 8 9+
0
2 3 4 5 6 7 8 9+
Version
Version
(a) The share of memory accesses done by transactions before first accessing
a k th object version in STMBench7’s read-dominated workload.
(b) The share of memory accesses done by transactions before first accessing
a kth object version in STMBench7’s read-write workload.
Figure 9. The share of memory accesses done by transactions before first accessing a k th object version. Surprisingly, despite extremely low access
probabilities for version k in general (see Figure 8), the price of not keeping this version is high. For example, at least 60% of memory accesses have to be
redone at least once if an STM keeps a single object version. Furthermore, the “heavy tail” in the graphs shows that keeping any constant number of versions
per object does not prevent a high percentage of wasted work.
continue to keep references to already deleted objects, which causes
deleted objects to be pinned in memory. Consider, for example, a
2-versioned STM in the scenario depicted in Figure 10. The STM
keeps a linked list of three nodes. When removing node 30 and
inserting a new node 40 instead, node 30 is still kept as the previous
version of 20.next. Next, when node 20 is replaced with node 25,
node 30 is still pinned in memory, as it is referenced by node 20.
After several additional node replacements, we see that there is a
complete binary tree in memory, although only a linked list is used
in the application.
More generally, with a k-versioned STM, a linked list of length
n could lead to Ω(k n ) node versions being pinned in memory. This
demonstrates an inherent limitation of any algorithm that keeps a
constant number of versions for each object. SMV overcomes this
limitation by keeping old object versions only as long as they might
be needed by live read-only transactions.
6. Conclusions
Many real-world applications invoke a high rate of read-only
transactions, including ones executing long traversals or obtaining
atomic snapshots. For such workloads, multi-versioning is essential:
it bears the promise of high performance, reduced abort
rates, and less wasted work. Moreover, single-version STM performance
can become unpredictable, and long-running read-only
transactions often even fail to complete when run concurrently with
update transactions. Multi-versioning can potentially remedy these
limitations.
Nevertheless, previously suggested multi-versioned STMs did
not fully deliver on these promises. In this paper, we have illustrated
that the main reason for their limited success is due to keeping
a constant number of versions for each object. We saw that
because of the heavy-tailed nature of memory accesses, this constant
number does not suffice for long traversals to complete. On
the other hand, we illustrated that the memory consumption of this
approach may grow exponentially with the amount of transactional
data, where the number of versions per object is the base of the
exponent.
We presented Selective Multi-Versioning, a new STM that delivers
on the promises of multi-versioning. It achieves high performance
(high throughput, low and predictable latency, and little
wasted work) in the presence of long read-only transactions. Despite
keeping multiple versions, SMV can work well in memory
constrained environments.
SMV overcomes the limitations of previous multi-versioned
STMs by using a new approach to managing old object versions: it
keeps old object versions as long as they might be useful for some30
replace 30
with 40
10 20 30 10 20 40
replace 20
with 25
20
30
10 25 40
replace 40
with 50
10
30
20
40
replace 50
with 60
10
20
30
40
50
25 50
25
60
Figure 10. Example demonstrating exponential memory growth even for an STM keeping only 2 versions of each object. The linked list of transactional
causes a binary tree to be pinned in memory because previous node versions continue to keep references to already deleted nodes.
transaction to read. We do so while allowing read-only transactions
to remain invisible by relying on automatic garbage collection to
dispose of obsolete versions. More generally, we presented the idea
of keeping old data as long as it might be useful for some processes
in the future by having such potential future users of the data keep
references that prevent the data’s garbage collection. We believe
that this approach can be the key to achieving good performance
not only in STMs, but also in a range of concurrent data structures.
References
[1] H. Attiya and E. Hillel. Brief announcement: Single-Version STMs
can be Multi-Version Permissive. In Proceedings of the 29th symposium
on Principles of Distributed Computing, 2010.
[2] U. Aydonat and T. Abdelrahman. Serializability of transactions in
software transactional memory. In Second ACM SIGPLAN Workshop
on Transactional Computing, 2008.
[3] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil.
A critique of ANSI SQL isolation levels. In Proceedings of the 1995
ACM SIGMOD international conference on Management of data,
pages 1–10, 1995.
[4] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control
and Recovery in Database Systems. Addison-Wesley, 1987.
[5] A. Bieniusa and T. Fuhrmann. Consistency in hindsight, a fully
decentralized stm algorithm. In IPDPS 2010: Proceedings of the 24th
IEEE International Parallel and Distributed Processing Symposium,
2010.
[6] N. Bronson, J. Casper, H. Chafi, and K. Olukotun. Transactional
Predication: High-Performance Concurrent Sets and Maps for STM.
In Proceedings of the 29th symposium on Principles of Distributed
Computing, 2010.
[7] J. Cachopo and A. Rito-Silva. Versioned boxes as the basis for
memory transactions. Science of Computer Programming, 63(2):172–
185, 2006.
[8] C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP:
Stanford transactional applications for multi-processing. In IISWC
’08: Proceedings of The IEEE International Symposium on Workload
Characterization, September 2008.
[9] N. Carvalho, J. Cachopo, L. Rodrigues, and A. Rito-Silva. Versioned
transactional shared memory for the FenixEDU web application. In
Proceedings of the 2nd workshop on Dependable distributed data
management, pages 15–18, 2008.
[10] D. Dice and N. Shavit. TLRW: Return of the read-write lock.
In TRANSACT ’09: 4th Workshop on Transactional Computing, feb
2009.
[11] D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings
of the 20th International Symposium on Distributed Computing,
pages 194–208, 2006.
[12] R. Ennals. Cache sensitive software transactional memory. Technical
report.
[13] K. Fraser. Practical lock freedom. PhD thesis, Cambridge University
Computer Laboratory, 2003.
[14] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language
Specification, Third Edition. Addison-Wesley Longman, 2005.
[15] R. Guerraoui, M. Kapalka, and J. Vitek. STMBench7: A Benchmark
for Software Transactional Memory. In Proceedings of the Second
European Systems Conference, 2007.
[16] M. Herlihy and J. E. B. Moss. Transactional memory: architectural
support for lock-free data structures. SIGARCH Comput. Archit. News,
21(2):289–300, 1993.
[17] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming.
Morgan Kaufmann, 2008.
[18] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software
transactional memory for dynamic-sized data structures. In PODC’03,
pages 92–101, 2003.
[19] A. Israeli and L. Rappoport. Disjoint-access-parallel implementations
of strong shared memory primitives. In Proceedings of the thirteenth
annual ACM symposium on Principles of distributed computing, pages
151–160, 1994.
[20] I. Keidar and D. Perelman. On avoiding spare aborts in transactional
memory. In SPAA’09, pages 59–68, 2009.
[21] E. Koskinen and M. Herlihy. Dreadlocks: efficient deadlock detection.
In Proceedings of the twentieth annual symposium on Parallelism in
algorithms and architectures, pages 297–303, 2008.
[22] J. Napper and L. Alvisi. Lock-free serializable transactions. Technical
report, The University of Texas at Austin, 2005.
[23] D. Perelman, R. Fan, and I. Keidar. On maintaining multiple versions
in transactional memory. In PODC’10.
[24] H. E. Ramadan, I. Roy, M. Herlihy, and E. Witchel. Committing
conflicting transactions in an STM. SIGPLAN Not., 44(4):163–172,
2009.
[25] T. Riegel, P. Felber, and C. Fetzer. A lazy snapshot algorithm with
eager validation. In Proceedings of the 20th International Symposium
on Distributed Computing, pages 284–298, 2006.
[26] T. Riegel, C. Fetzer, and P. Felber. Snapshot isolation for software
transactional memory. In 1st ACM SIGPLAN Workshop on Transactional
Computing (TRANSACT), 2006.
[27] T. Riegel, C. Fetzer, and P. Felber. Time-based transactional memory
with scalable time bases. In Proceedings of the nineteenth annual
ACM symposium on Parallel algorithms and architectures, pages 221–
228, 2007.
[28] N. Shavit and D. Touitou. Software transactional memory. In Proceedings
of the 12th Annual ACM Symposium on Principles of Distributed
Computing (PODC), pages 204–213, 1995.