c++ - Do most compilers transform % 2 into bit comparison? Is it really faster?
In programming, one often needs to check whether a number is odd or even. For that, we usually use:
    n % 2 == 0
However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5;
    5 = 00000101
In order to check whether the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this:
    (n & 1) == 0
In my understanding this would be faster than % 2, as no division is performed. A mere bit comparison is needed.
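For reference, here is a minimal sketch of the two tests being compared. The function names is_even_mod and is_even_bit are made up for illustration and do not come from the question. One detail worth noting: in C++, == binds more tightly than &, so the bit test needs parentheses around n & 1. The checks on negative values assume a two's complement representation, which is what mainstream compilers use.

```cpp
// Two ways to test whether an integer is even.
// The function names are illustrative only.

// Remainder-based test: true when n leaves no remainder modulo 2.
constexpr bool is_even_mod(int n) { return n % 2 == 0; }

// Bit-based test: true when the lowest bit is clear.
// The parentheses matter: n & 1 == 0 would parse as n & (1 == 0).
constexpr bool is_even_bit(int n) { return (n & 1) == 0; }

// Compile-time checks that both tests agree on a few values.
static_assert(is_even_mod(4) && is_even_bit(4), "4 is even");
static_assert(!is_even_mod(5) && !is_even_bit(5), "5 is odd");
static_assert(is_even_mod(-2) && is_even_bit(-2), "-2 is even");
static_assert(!is_even_mod(-3) && !is_even_bit(-3), "-3 is odd");
```

Both functions return a truth value, so the sign of n % 2 never leaks out; that is the situation in which a compiler is free to pick the cheaper form.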
I have two questions then:
1) Is the second way really faster than the first (in all cases)?
2) If the answer to 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?
Yes, a bit-test is much faster than integer division, by about a factor of 10 to 20, or even 100 for the 128-bit / 64-bit = 64-bit idiv on Intel. Especially since x86 at least has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using n % 2 as a signed integer value (e.g. return n % 2 from a function that returns signed int), instead of just testing it for non-zero (if (n % 2)), sometimes produces slower code than return n & 1. This is because (-1 % 2) == -1, while (-1 & 1) == 1, so the compiler can't use a plain bitwise AND. Compilers still avoid integer division, though, and use a clever shift / and / add / sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)
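The difference between the two signed results can be sketched as follows. All names here (rem2, low_bit, sign_bit, rem2_no_div) are hypothetical. The rem2_no_div function shows one possible shift / add / and / sub sequence that reproduces n % 2 without dividing; it is not claimed to be the exact sequence any particular compiler emits. The sketch assumes int and unsigned have the same width and that negative values use two's complement.

```cpp
#include <limits>

// Signed remainder: takes the sign of n, so the result is -1, 0, or 1.
constexpr int rem2(int n) { return n % 2; }

// Lowest bit: always 0 or 1, even for negative n.
constexpr int low_bit(int n) { return n & 1; }

// 1 for negative n, 0 otherwise: logical right shift of the sign bit.
constexpr int sign_bit(int n) {
    return static_cast<int>(static_cast<unsigned>(n) >>
                            (std::numeric_limits<unsigned>::digits - 1));
}

// Division-free equivalent of n % 2: add the sign bit, mask, subtract it again.
constexpr int rem2_no_div(int n) {
    return ((n + sign_bit(n)) & 1) - sign_bit(n);
}

// The two plain forms disagree for negative odd numbers...
static_assert(rem2(-1) == -1, "remainder keeps the sign of n");
static_assert(low_bit(-1) == 1, "lowest bit is 1");

// ...while the shift-based form matches the remainder.
static_assert(rem2_no_div(-1) == -1, "matches -1 % 2");
static_assert(rem2_no_div(-2) == 0, "matches -2 % 2");
static_assert(rem2_no_div(3) == 1, "matches 3 % 2");
```

When the result is only tested for non-zero, the sign no longer matters, which is why if (n % 2) can still become a single bit test.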
So if you want to return a truth value based on n % 2, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)
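A small sketch of why the unsigned case is easy: with no negative values, n % 2 and n & 1 produce the same result for every input, so the compiler may substitute one for the other. The names rem2_unsigned and low_bit_unsigned are hypothetical and are not the functions from the Godbolt link.

```cpp
#include <limits>

// Unsigned remainder: always 0 or 1.
constexpr unsigned rem2_unsigned(unsigned n) { return n % 2u; }

// Lowest bit: always 0 or 1.
constexpr unsigned low_bit_unsigned(unsigned n) { return n & 1u; }

// The two agree at both ends of the range and in between.
static_assert(rem2_unsigned(0u) == low_bit_unsigned(0u), "both 0");
static_assert(rem2_unsigned(7u) == 1u && low_bit_unsigned(7u) == 1u, "both 1");
static_assert(rem2_unsigned(std::numeric_limits<unsigned>::max()) == 1u,
              "largest unsigned value is odd");
static_assert(low_bit_unsigned(std::numeric_limits<unsigned>::max()) == 1u,
              "largest unsigned value is odd");
```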
Using a bool (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return (bool) (n % 4) (or any test other than n % 2). The bitwise-AND version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1, depending on the flags, so it's still only 2 instructions instead of 1. clang/gcc use this; see aligned4_bool in the Godbolt asm output.)
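To make the extra normalization step concrete, here is a hedged sketch with hypothetical names (low_two_bits and not_multiple_of_4; this is not the aligned4_bool function from the Godbolt link). The masked value can be 0 to 3, but converting it to bool must yield exactly 0 or 1.

```cpp
// Raw masked value: 0, 1, 2, or 3.
constexpr unsigned low_two_bits(unsigned n) { return n & 3u; }

// Truth value: the conversion to bool collapses 1, 2, and 3 into true.
constexpr bool not_multiple_of_4(unsigned n) { return n % 4u; }

static_assert(low_two_bits(6u) == 2u, "6 & 3 is 2");
static_assert(not_multiple_of_4(6u), "6 is not a multiple of 4");
static_assert(static_cast<int>(not_multiple_of_4(7u)) == 1, "true converts to 1");
static_assert(!not_multiple_of_4(8u), "8 is a multiple of 4");
```

For n % 2 the masked value is already 0 or 1, so no such collapse is needed; that is the reason the text singles out tests other than n % 2.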
With any optimization level higher than -O0, gcc and clang optimize if (n % 2) to what we expect. The other huge surprise is that icc 13 doesn't. I don't understand WTF icc thinks it's doing with all those branches.