Dynamic Discovery
We instrument the binary code to track 4 states of each memory item:
- No Access
- Read Only
- Write First
- Read/Write
As the program executes and reads and writes data, the state of each memory element is updated. When inner regions, such as loops, are entered, new state bits are pushed using a stack-like discipline; when the region is exited, the state of the region is summarized. In this manner, a very precise profile of how each program region accesses each data element is created.
We search the profile data for conditions which would make parallelization illegal. For example, if one iteration of a loop reads a value generated by a different iteration, this is usually a data dependence which inhibits parallelization. There are exceptions, however: induction (loop counter) and reduction (value summation) semantics are allowed. But these variable types impose strict requirements on variable updates. Induction variables must be updated precisely the same way in each loop iteration, and reduction variables must not be read unless the read is part of an update operation. Sometimes inner loops can be parallelized but not outer ones, and sometimes the opposite is true.
Dynamic Discovery
We instrument the binary code to track 4 states of each memory item:
- No Access
- Read Only
- Write First
- Read/Write
As the program executes and reads and writes data, the state of each memory element is updated. When inner regions, such as loops, are entered, new state bits are pushed using a stack-like discipline; when the region is exited, the state of the region is summarized. In this manner, a very precise profile of how each program region accesses each data element is created.
We search the profile data for conditions which would make parallelization illegal. For example, if one iteration of a loop reads a value generated by a different iteration, this is usually a data dependence which inhibits parallelization. There are exceptions, however: induction (loop counter) and reduction (value summation) semantics are allowed. But these variable types impose strict requirements on variable updates. Induction variables must be updated precisely the same way in each loop iteration, and reduction variables must not be read unless the read is part of an update operation. Sometimes inner loops can be parallelized but not outer ones, and sometimes the opposite is true.