Selective Refinement
After we have finished discovering program loops and the dependences that inhibit their parallelization, we are left with a large number of loops that could legally be parallelized. However, the parallelization process is far from complete, because:
- some loops are too short to be worth the parallelization effort
- some loops require so much memory bandwidth that parallelizing them would degrade performance
Therefore, we use some of the profiling information collected in the discovery step to limit the scope of our parallelization to those loops that are likely to bring performance gains. The information we use includes:
- ratio of loop execution time to total application runtime
- ratio of computation to memory accesses
- loop nesting level
Once we have finished disqualifying loops, we are ready to focus our tuning efforts on only the very best parallelization candidates.
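The disqualification step can be pictured as a simple filter over per-loop profiling records. The sketch below is only an illustration under stated assumptions: the `LoopProfile` record, the `select_candidates` function, the loop names, and all threshold values are hypothetical and are not taken from any particular tool. It also assumes that outer loops (low nesting level) are preferred, which the text above does not state.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Profiling data for one legally parallelizable loop (hypothetical record)."""
    name: str
    time_fraction: float      # loop execution time / total application runtime
    compute_to_memory: float  # computation operations per memory access
    nesting_level: int        # 0 = outermost loop


def select_candidates(loops, min_time_fraction=0.05,
                      min_compute_to_memory=1.0, max_nesting_level=1):
    """Drop loops that fail any threshold; return the rest, hottest first.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    kept = [
        loop for loop in loops
        if loop.time_fraction >= min_time_fraction            # not too short
        and loop.compute_to_memory >= min_compute_to_memory   # not bandwidth-bound
        and loop.nesting_level <= max_nesting_level           # assumed: prefer outer loops
    ]
    return sorted(kept, key=lambda loop: loop.time_fraction, reverse=True)


# Made-up profiling data for five loops.
loops = [
    LoopProfile("init_grid", 0.01, 4.0, 0),       # too short
    LoopProfile("stencil_update", 0.50, 3.5, 0),  # kept
    LoopProfile("copy_buffer", 0.20, 0.2, 0),     # memory-bound
    LoopProfile("inner_reduce", 0.15, 2.0, 3),    # nested too deeply
    LoopProfile("compute_flux", 0.10, 2.5, 1),    # kept
]
candidates = select_candidates(loops)
print([loop.name for loop in candidates])
```

In practice the thresholds would depend on the target machine (for example, its thread start-up cost and available memory bandwidth), so they are exposed as parameters rather than fixed.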