Implementing Atomic Operations
In Cache or Memory?
- cacheable
- better latency and bandwidth on self-reacquisition
- allows spinning in cache without generating traffic while waiting
- at-memory
- lower transfer time
- used to be implemented with “locked” read-write pair of bus transactions
- not viable with modern, pipelined busses
- usually traffic and latency considerations dominate, so use cacheable
- what is the implementation strategy?
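
As an illustration of the "spinning in cache" point for cacheable atomics, here is a minimal test-and-test-and-set spinlock sketch in C++. It is not taken from these notes: the names `TtasLock` and `run_counter_demo` are hypothetical, and the cache behavior described in the comments assumes a typical invalidation-based coherence protocol. Waiters spin on ordinary loads, which can be satisfied from the local cache, and only attempt the atomic read-modify-write when the lock appears free.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spinlock (illustrative sketch only).
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Atomic read-modify-write: requires exclusive ownership of the
            // cache line, so it generates coherence traffic.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;  // previous value was false, so we now hold the lock
            }
            // Lock is held by someone else: spin on plain loads. Under an
            // invalidation-based protocol (an assumption here) these hit in
            // the local cache until the holder's release invalidates the line.
            while (locked_.load(std::memory_order_relaxed)) {
                // Courtesy to the scheduler; not part of the technique itself.
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Release store: publishes the critical section and frees the lock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: `threads` threads each increment a shared counter
// `iters` times under the lock, and the final count is returned.
long run_counter_demo(int threads, int iters) {
    TtasLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000 if the lock provides mutual exclusion. The inner load-only loop is what keeps waiting processors off the interconnect; a lock that retried the `exchange` directly would request exclusive ownership on every attempt.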