C++ - Do most compilers transform % 2 into bit comparison? Is it really faster?
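In programming, one often needs to check whether a number is odd or even. For that, we usually use:
n % 2 == 0
However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5;
5 = 00000101
In order to check if the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this (the parentheses are needed, because == binds tighter than &):
(n & 1) == 0
In my understanding this would be faster than % 2, as no division is performed. A mere bit comparison is needed.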
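A minimal sketch of the two checks side by side, assuming a two's-complement int (guaranteed since C++20, and the norm on mainstream compilers before that). The function names are hypothetical and only serve the illustration. The third function spells out how the unparenthesized form n & 1 == 0 is parsed.

```cpp
// Hypothetical helper names; both return true when n is even.
constexpr bool is_even_mod(int n) { return n % 2 == 0; }
constexpr bool is_even_bit(int n) { return (n & 1) == 0; }

// `n & 1 == 0` is parsed as `n & (1 == 0)`, i.e. `n & 0`,
// so it yields 0 for every n instead of testing the low bit.
constexpr int as_parsed_without_parens(int n) { return n & (1 == 0); }
```

For example, is_even_mod(-3) and is_even_bit(-3) are both false, while as_parsed_without_parens(5) is 0 regardless of the input.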
I have 2 questions then:
1) Is the second way really faster than the first (in all cases)?
2) If the answer to 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?
Yes, a bit-test is much faster than integer division, by about a factor of 10 to 20, or even 100 for a 128bit / 64bit = 64bit idiv on Intel. Especially since x86 at least has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using n % 2 as a signed integer value (e.g. a return n % 2 from a function that returns signed int) instead of just testing it for non-zero (if (n % 2)) sometimes produces slower code than return n & 1. This is because (-1 % 2) == -1, while (-1 & 1) == 1, so the compiler can't use a bitwise AND. Compilers still avoid integer division, though, and use some clever shift / and / add / sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)
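The sign difference can be checked directly. This is a small sketch with hypothetical function names, assuming a two's-complement int; the static_asserts are evaluated at compile time.

```cpp
// Hypothetical names. Signed remainder keeps the sign of the dividend
// (division truncates toward zero, guaranteed since C++11).
constexpr int low_bit_mod(int n) { return n % 2; }  // -1, 0, or 1
constexpr int low_bit_and(int n) { return n & 1; }  // 0 or 1

// The two disagree for negative odd values, so a compiler cannot
// replace one with the other when the integer value itself is used.
static_assert(low_bit_mod(-1) == -1, "remainder of a negative odd value");
static_assert(low_bit_and(-1) == 1, "low bit of a negative odd value");
```

For non-negative inputs the two functions return the same value; they only diverge when n is negative and odd.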
So if you want to return a truth value based on n % 2, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)
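As a rough illustration of why the unsigned case is easy for the compiler: with an unsigned operand the remainder can never be negative, so % 2 and & 1 produce the same value for every input. The names and signatures below are hypothetical, not the exact functions from the Godbolt link.

```cpp
// Hypothetical names and signatures, assumed for this sketch.
constexpr unsigned odd_mod_unsigned(unsigned n) { return n % 2u; }
constexpr unsigned odd_and_unsigned(unsigned n) { return n & 1u; }

// Both return 0 or 1 and agree everywhere, including the largest value.
static_assert(odd_mod_unsigned(~0u) == odd_and_unsigned(~0u), "same result");
```

Pasting both functions into a compiler explorer with optimization enabled is a quick way to compare the generated code for a given target.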
Using a bool (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return (bool) (n % 4) (or any test other than n%2). The bitwise-and version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1, depending on the flags, so it's still only 2 instructions instead of 1. clang/gcc use this, see aligned4_bool in the Godbolt asm output.)
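A short sketch of the bool conversion described here, with hypothetical function names (these are not the functions from the Godbolt link). Converting to bool maps any non-zero value to true, which is the extra normalization step the compiler has to emit.

```cpp
// Hypothetical names. For a signed int, n % 4 lies in -3..3;
// the conversion to bool collapses every non-zero value to true.
constexpr bool not_multiple_of_4(int n) { return (bool)(n % 4); }

// Bitwise variant on an unsigned value: n & 3 is 0, 1, 2, or 3,
// and the comparison normalizes it to false or true.
constexpr bool low_two_bits_set(unsigned n) { return (n & 3u) != 0; }

static_assert(not_multiple_of_4(-3), "-3 % 4 is -3, which converts to true");
static_assert(!low_two_bits_set(8u), "8 has both low bits clear");
```

When the result is converted back to an integer it is always 0 or 1, never 2 or 3.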
With any optimization level higher than -O0, gcc and clang optimize if (n%2) to what we expect. The other huge surprise is that ICC 13 doesn't. I don't understand WTF ICC thinks it's doing with all those branches.