Stream: git-wasmtime

Topic: wasmtime / PR #2058 Aarch64 codegen: represent bool `true...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2020 at 21:15):

cfallin opened PR #2058 from aarch64-fix-bool to main:

It seems that this is actually the correct behavior for bool types wider
than b1; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For b8..b64, this results in an extra negation after a
cset when a bool is produced by an icmp/fcmp, but the most common
case (b1) is unaffected, because an all-ones one-bit value is just
1.

An example of this assumption can be seen here:

https://github.com/bytecodealliance/wasmtime/blob/399ee0a54c3be0f89ae07a0af5104ba929a9eba4/cranelift/codegen/src/simple_preopt.rs#L956

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2020 at 21:15):

cfallin requested bnjbvr and julian-seward1 for a review on PR #2058.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2020 at 21:15):

cfallin requested bnjbvr and julian-seward1 for a review on PR #2058.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2020 at 21:17):

cfallin edited PR #2058 from aarch64-fix-bool to main:

It seems that this is actually the correct behavior for bool types wider
than b1; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For b8..b64, this results in an extra negation after a
cset when a bool is produced by an icmp/fcmp, but the most common
case (b1) is unaffected, because an all-ones one-bit value is just
1.

An example of this assumption can be seen here:

https://github.com/bytecodealliance/wasmtime/blob/399ee0a54c3be0f89ae07a0af5104ba929a9eba4/cranelift/codegen/src/simple_preopt.rs#L956

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.

This PR also makes an unrelated cleanup: it appears that we had two ways of saying
cset, namely Inst::CSet and Inst::CondSet. Probably an artifact of our
implementation sprint with multiple people adding instructions at once. Removed
Inst::CondSet and settled on Inst::CSet.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr created PR Review Comment:

nit: for future proofness, maybe assert to_bits == 32 here?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr created PR Review Comment:

This entire commit should go in the last branch below, if I understand correctly.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr created PR Review Comment:

Is the AND actually necessary, that is, are B1 inputs only one bit in practice?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 08:30):

bnjbvr created PR Review Comment:

nit: maybe use packed assignment and destructuring here: let (imm_ty, alu_op) = if output_bits > 32 { (I64, ALUOp::And64) } else ...

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:31):

cfallin updated PR #2058 from aarch64-fix-bool to main:

It seems that this is actually the correct behavior for bool types wider
than b1; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For b8..b64, this results in an extra negation after a
cset when a bool is produced by an icmp/fcmp, but the most common
case (b1) is unaffected, because an all-ones one-bit value is just
1.

An example of this assumption can be seen here:

https://github.com/bytecodealliance/wasmtime/blob/399ee0a54c3be0f89ae07a0af5104ba929a9eba4/cranelift/codegen/src/simple_preopt.rs#L956

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.

This PR also makes an unrelated cleanup: it appears that we had two ways of saying
cset, namely Inst::CSet and Inst::CondSet. Probably an artifact of our
implementation sprint with multiple people adding instructions at once. Removed
Inst::CondSet and settled on Inst::CSet.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:31):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:31):

cfallin created PR Review Comment:

Clarified, thanks -- the first case is also semantically a sign extension, but generated specially; noted both cases in this comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:31):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:31):

cfallin created PR Review Comment:

Done! (Could also be smaller than 32, so assert is <=.)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:41):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:41):

cfallin created PR Review Comment:

Right now it's the case that every b1 we generate is 0 or 1 up to the full width of the register, I think, but I'd prefer to make use of that in a followup, as it needs some careful thought and auditing of the other ops. I suppose we could say that bits 1..7 are always clear (so a b1 is stored as a 0 or 1 in the lowest byte), then this would allow us to use the register-extending subtract format to do a zero-extend and negation at once; I'll think more about this. Thanks for bringing this up!

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:41):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 19:41):

cfallin created PR Review Comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2020 at 20:16):

cfallin merged PR #2058.


Last updated: Jan 24 2025 at 00:11 UTC