Bag of Tricks (contd.)
Beware conflict misses more generally
- Allocate arrays with non-power-of-2 dimensions even if the application needs power-of-2 sizes
- Conflict misses across data structures: ad-hoc padding/alignment
- Conflict misses on small, seemingly harmless data
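The non-power-of-2 allocation trick can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names `N`, `PAD`, `LD`, `alloc_padded`, `idx`, and `column_sum` are hypothetical, and the pad of 8 doubles assumes a 64-byte cache line. The application still sees a 1024 x 1024 matrix, but rows are stored 1032 elements apart, so a column walk no longer maps successive rows to the same few cache sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N   1024        /* logical dimension the application needs (power of 2) */
#define PAD 8           /* assumed pad: 8 doubles = one 64-byte cache line      */
#define LD  (N + PAD)   /* allocated leading dimension (not a power of 2)       */

/* Allocate an N x N matrix whose rows are stored LD elements apart. */
double *alloc_padded(void) {
    return calloc((size_t)N * LD, sizeof(double));
}

/* Element (i, j) lives at offset i*LD + j; columns N..LD-1 are unused padding. */
static inline size_t idx(size_t i, size_t j) {
    return i * LD + j;
}

/* Column walk: with stride N the addresses would collide in a
 * power-of-2-indexed cache; with stride LD they spread across sets. */
double column_sum(const double *a, size_t j) {
    assert(a != NULL && j < N);
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        s += a[idx(i, j)];
    return s;
}
```

The cost is `PAD * N` unused elements, which is usually small next to the misses avoided; the right pad depends on the cache geometry of the target machine.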
Use per-processor heaps for dynamic memory allocation
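One way to picture per-processor heaps is a set of independent bump allocators, one per processor, each starting on its own cache line. The sketch below is an assumption-laden illustration only: `NPROCS`, `HEAP_BYTES`, `CACHE_LINE`, `proc_heap`, and `proc_alloc` are hypothetical names, the 64-byte line size is assumed, and there is no `free`. The point is that data allocated by different processors never shares a cache line, and a processor's allocations stay close together.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NPROCS     4      /* assumed number of processors         */
#define HEAP_BYTES 4096   /* assumed size of each private heap    */
#define CACHE_LINE 64     /* assumed cache line size in bytes     */

/* One private heap per processor, aligned so heaps never share a line. */
typedef struct {
    alignas(CACHE_LINE) unsigned char mem[HEAP_BYTES];
    size_t used;          /* bytes handed out so far */
} proc_heap;

static proc_heap heaps[NPROCS];

/* Bump-allocate nbytes from processor pid's heap, rounded up to whole
 * cache lines. Returns NULL when the heap cannot satisfy the request. */
void *proc_alloc(int pid, size_t nbytes) {
    assert(pid >= 0 && pid < NPROCS);
    if (nbytes > HEAP_BYTES)
        return NULL;
    proc_heap *h = &heaps[pid];
    size_t rounded = (nbytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded > HEAP_BYTES - h->used)
        return NULL;
    void *p = h->mem + h->used;
    h->used += rounded;
    return p;
}
```

Each processor calls `proc_alloc` only with its own `pid`, so no locking is needed in this sketch; rounding every request up to a full line wastes space on small objects.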
Copy data to increase locality
- If noncontiguous data are to be reused
- Must trade off against cost of copying
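A small sketch of copying for locality, under assumed names (`gather_column`, `column_dots`) and an assumed row-major layout: a matrix column is noncontiguous in memory, so it is copied once into a contiguous buffer and then reused for several dot products. The copy costs one strided pass; it pays off only if the number of reuses (`nvec` here) is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Copy column j of a row-major matrix (row stride ld) into contiguous buf. */
void gather_column(const double *a, size_t rows, size_t ld, size_t j,
                   double *buf) {
    for (size_t i = 0; i < rows; i++)
        buf[i] = a[i * ld + j];          /* strided read, done once */
}

/* y[k] = dot(column j of a, k-th vector of x) for k = 0..nvec-1.
 * x holds nvec vectors of length rows, stored one after another. */
void column_dots(const double *a, size_t rows, size_t ld, size_t j,
                 const double *x, size_t nvec, double *y, double *buf) {
    assert(a != NULL && x != NULL && y != NULL && buf != NULL);
    gather_column(a, rows, ld, j, buf);  /* pay the copy cost once */
    for (size_t k = 0; k < nvec; k++) {
        double s = 0.0;
        for (size_t i = 0; i < rows; i++)
            s += buf[i] * x[k * rows + i];   /* unit-stride reuse of buf */
        y[k] = s;
    }
}
```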
Pad and align arrays: trades reduced false sharing against fragmentation (wasted space)
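The tradeoff can be seen in a sketch of per-thread counters, assuming a 64-byte cache line and C11 `alignas`; `CACHE_LINE`, `NTHREADS`, `padded_counter`, `bump`, and `total` are hypothetical names. Each counter occupies a full line, so threads updating different counters do not invalidate each other's lines, but the array grows from a few words to one line per thread.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define NTHREADS   4    /* assumed number of threads        */

/* Aligning the member forces the struct's size and alignment to a full
 * line: no two counters share a line, at the price of unused bytes. */
typedef struct {
    alignas(CACHE_LINE) long count;
} padded_counter;

static_assert(sizeof(padded_counter) == CACHE_LINE,
              "each counter should fill exactly one cache line");

static padded_counter counters[NTHREADS];

/* Thread tid updates only its own line. */
void bump(int tid) {
    assert(tid >= 0 && tid < NTHREADS);
    counters[tid].count++;
}

/* Sum all counters; intended to run after the threads have finished. */
long total(void) {
    long s = 0;
    for (int t = 0; t < NTHREADS; t++)
        s += counters[t].count;
    return s;
}
```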
Organize arrays of records for spatial locality
- E.g. particles with fields: organize by particle or by field
- In vector programs, organize by field for unit-stride access; in parallel programs, often by particle
- Phases of program may have different access patterns and needs
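The two organizations can be sketched side by side. The type and function names below (`particle`, `particle_fields`, `total_mass_by_field`, `total_mass_by_particle`, `move_particle`) and the field set are illustrative assumptions. Summing one field is unit-stride in the by-field layout but strided in the by-particle layout, while updating all fields of one particle touches adjacent words only in the by-particle layout.

```c
#include <assert.h>
#include <stddef.h>

#define NP 4   /* assumed number of particles */

/* Organized by particle: all fields of one particle are adjacent. */
typedef struct { double x, y, z, mass; } particle;

/* Organized by field: each field is its own contiguous array. */
typedef struct { double x[NP], y[NP], z[NP], mass[NP]; } particle_fields;

/* By field: a sweep over one field is unit-stride. */
double total_mass_by_field(const particle_fields *p) {
    double s = 0.0;
    for (size_t i = 0; i < NP; i++)
        s += p->mass[i];
    return s;
}

/* By particle: the same sweep strides over sizeof(particle) bytes,
 * dragging the unused x, y, z of each particle through the cache. */
double total_mass_by_particle(const particle *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].mass;
    return s;
}

/* By particle: updating one particle touches only adjacent words,
 * which suits a processor that owns a set of whole particles. */
void move_particle(particle *p, double dx, double dy, double dz) {
    assert(p != NULL);
    p->x += dx;
    p->y += dy;
    p->z += dz;
}
```

A program whose phases alternate between the two access patterns may have to pick the layout that suits the dominant phase, or copy between layouts at phase boundaries.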