Hi all — first post here. I’m Konsta, and I build DeFiMath, an open-source, gas-optimized Solidity library for on-chain financial math (Black-Scholes pricing, fixed-point primitives, roots, and so on). I’ve been deep in root functions lately and wanted to share one small change to how the initial estimate is found — and hear how others are handling it now that Fusaka has shipped.
Newton/Babylonian sqrt converges fast, but only if the starting guess is close. So the first job is to find the magnitude of x — essentially its top set bit — and build a seed from it.
Before: you locate the magnitude with a shift/compare cascade. This is the estimate section from Solady’s sqrt (MIT):
z := 181
// find the magnitude of x via shift/compare
let r := shl(7, lt(0xffffffffffffffffffffffffffffffffff, x))
r := or(r, shl(6, lt(0xffffffffffffffffff, shr(r, x))))
r := or(r, shl(5, lt(0xffffffffff, shr(r, x))))
r := or(r, shl(4, lt(0xffffff, shr(r, x))))
z := shl(shr(1, r), z)
z := shr(18, mul(z, add(shr(r, x), 65536)))
Four conditional steps just to pin down the top bit before the seed is computed.
After: with EIP-7939 live since Fusaka, clz gives the magnitude in a single 5-gas opcode, so that whole cascade collapses to one line:
z := 181
// magnitude of x in one opcode (index of the top set bit)
let r := sub(255, clz(x))
z := shl(shr(1, r), z)
z := shr(18, mul(z, add(shr(r, x), 65536)))
The downstream seed math is unchanged — only the magnitude lookup moves to the opcode. In my benchmarks that’s around 100 gas off the seeding step alone, plus smaller bytecode. Two things to keep in mind: align r’s rounding with whatever your interpolation window expects and test against the original, and remember clz(0) returns 256, so guard x == 0 as usual. clz is callable directly in inline assembly on solc 0.8.35+ when you target the Osaka/Fusaka EVM version.
Curious whether anyone has found cases where a software bit-length still beats the opcode, or where the seed precision matters less than I’m assuming.