The first implication of the physical HW constraints on the programming model is that one cannot index dynamically across hardware registers: a register file can generally not be indexed dynamically. This is because the register number is fixed and one either needs to unroll explicitly to obtain fixed register numbers or go through memory. This is a constraint familiar to CUDA programmers: declaring a private float a[4]; and subsequently indexing it with a dynamic value results in so-called local memory usage (i.e. roundtripping to memory).
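The same constraint can be sketched in MLIR terms (op names from the vector, memref and arith dialects; exact spellings have evolved across MLIR versions, so treat this as illustrative): a static position can be extracted directly from a value that may live in registers, while a dynamic position forces a roundtrip through memory.

```mlir
// Static index: can resolve to a fixed register, no memory involved.
%x = vector.extract %v[3] : vector<8xf32>

// Dynamic index %i: spill the vector to a buffer and load the element,
// the analogue of CUDA "local memory" usage.
%c0 = arith.constant 0 : index
%buf = memref.alloca() : memref<8xf32>
vector.store %v, %buf[%c0] : memref<8xf32>, vector<8xf32>
%x_dyn = memref.load %buf[%i] : memref<8xf32>
```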
Implication on codegen ¶
MLIR n-D vector types are currently represented as (n-1)-D arrays of 1-D vectors when lowered to LLVM. This introduces the consequences on static vs dynamic indexing discussed previously: extractelement, insertelement and shufflevector on n-D vectors in MLIR only support static indices. Dynamic indices are only supported on the most minor 1-D vector but not the outer (n-1)-D. For other cases, explicit load / stores are required.

The implications on codegen are as follows:
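For illustration (a hypothetical value %0 and index %i; this mirrors the (n-1)-D-array-of-1-D-vectors representation just described), a 2-D MLIR vector lowers to an LLVM array of 1-D vectors, and only the innermost position may be dynamic:

```mlir
// vector<4x8xf32> lowers to !llvm.array<4 x vector<8xf32>>.
// The outer (array) index must be a static constant ...
%row = llvm.extractvalue %0[2] : !llvm.array<4 x vector<8xf32>>
// ... while the position within the most minor 1-D vector may be dynamic.
%elt = llvm.extractelement %row[%i : i64] : vector<8xf32>
```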
- Loops around vector values are indirect addressing of vector values; they must operate on explicit load / store operations over n-D vector types.
- Once an n-D vector type is loaded into an SSA value (that may or may not live in n registers, with or without spilling, when eventually lowered), it may be unrolled to smaller k-D vector types and operations that correspond to the HW. This level of MLIR codegen is related to register allocation and spilling that occur much later in the LLVM pipeline.
- HW may support >1-D vectors with intrinsics for indirect addressing within these vectors. These can be targeted thanks to explicit vector_cast operations from MLIR k-D vector types and operations to LLVM 1-D vectors + intrinsics.
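The first point above can be sketched as follows, with a hypothetical buffer %buf of type memref<4x8xf32>: the loop induction variable only ever indexes into memory, never into an SSA vector value.

```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
scf.for %i = %c0 to %c4 step %c1 {
  // Explicit load of a 1-D slice; no dynamic indexing of a vector value.
  %v = vector.load %buf[%i, %c0] : memref<4x8xf32>, vector<8xf32>
  %sq = arith.mulf %v, %v : vector<8xf32>
  vector.store %sq, %buf[%i, %c0] : memref<4x8xf32>, vector<8xf32>
}
```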
Instead, we argue that directly lowering to a linearized abstraction hides away the codegen complexities related to memory accesses by giving a false impression of magical dynamic indexing across registers. We prefer to make those very explicit in MLIR and allow codegen to explore tradeoffs. Different HW will require different tradeoffs in the sizes involved in steps 1., 2. and 3.
Decisions made at the MLIR level will have implications at a much later stage in LLVM (after register allocation). We do not envision exposing concerns related to modeling of register allocation and spilling in MLIR explicitly. Instead, each target will expose a set of "good" target operations and n-D vector types, associated with costs that PatternRewriters at the MLIR level will be able to target. Such costs at the MLIR level will be abstract and used for ranking, not for accurate performance modeling. In the future such costs will be learned.
Implication on Lowering to Accelerators ¶
To target accelerators that support higher dimensional vectors natively, we can start from either 1-D or n-D vectors in MLIR and use vector.cast to flatten the most minor dimensions to a 1-D vector<Kxf32> where K is an appropriate constant.
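A minimal sketch of such a flattening cast (vector.cast as named in this document; the equivalent reshape in current MLIR is spelled vector.shape_cast):

```mlir
// Flatten the most minor dimensions to 1-D; here K = 8 * 16 = 128.
%flat = vector.cast %0 : vector<8x16xf32> to vector<128xf32>
```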
It is the role of an Accelerator-specific vector dialect (see codegen flow in the figure above) to lower the vector.cast. Accelerator -> LLVM lowering would then consist of a set of Accelerator -> Accelerator rewrites to perform the casts, composed with Accelerator -> LLVM conversions + intrinsics that operate on 1-D vector<Kxf32>.
Some of those rewrites may need extra handling, especially if a reduction is involved. For example, vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K != K1 * ... * Kn, and some arbitrary irregular vector.cast %0: vector<4x4x17xf32> to vector<Kxf32>, may introduce masking and intra-vector shuffling that may not be worthwhile or even feasible, i.e. infinite cost.
However vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K = K1 * ... * Kn should be close to a noop.
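Concretely, the two regimes can be contrasted as follows (shapes chosen for illustration):

```mlir
// K = K1 * K2 * K3 = 4 * 4 * 2 = 32: a pure reinterpretation of the
// same elements, close to a noop.
%a = vector.cast %0 : vector<4x4x2xf32> to vector<32xf32>

// Irregular case: 4 * 4 * 17 = 272 elements do not map evenly onto,
// say, vector<256xf32>; such a cast would require masking and
// intra-vector shuffling, i.e. potentially infinite cost.
```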