I am looking for feedback on a potential future EIP regarding a standardized compact representation for calldata in EVM languages. The design I’ve been kicking around is as follows:
To illustrate, in ABIv3 a call to foo(bool[12]) would encode as 0x047f for [false, false, false, false, false, true, true, true, true, true, true, true].
In the header byte (the zeroth byte), the first three bits are the encoding version identifier, 000. The next five bits are the function identifier, 00100, in this case 4. If the function ID is 31 or greater, all five bits are set and the RLP encoding of the function ID is appended to the header byte.
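The header construction described above can be sketched as follows (Python here rather than the Java PoC; rlp_encode_int is a simplified stand-in for a real RLP library, handling only short non-negative integers):

```python
def rlp_encode_int(n: int) -> bytes:
    """Minimal RLP encoding of a non-negative integer (big-endian, no leading zeros).
    Short-string form only; a real implementation would handle longer payloads."""
    if n == 0:
        return b"\x80"  # zero encodes as the empty string
    b = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(b) == 1 and b[0] < 0x80:
        return b  # a single byte below 0x80 is its own encoding
    return bytes([0x80 + len(b)]) + b

def encode_header(version: int, function_id: int) -> bytes:
    """Pack the 3-bit version and 5-bit function ID into the zeroth byte.
    IDs of 31 or greater set all five bits and append the RLP of the ID."""
    assert 0 <= version < 8
    if function_id < 31:
        return bytes([(version << 5) | function_id])
    return bytes([(version << 5) | 0x1F]) + rlp_encode_int(function_id)

encode_header(0, 4).hex()   # '04', the header in the foo(bool[12]) example
encode_header(0, 27).hex()  # '1b', the second example's function ID
```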
Next is 7f, the RLP encoding of the integer that represents the values in the bool array, where each element is represented by a bit. Notice that in this case the first five bools are false, which makes the integer 000001111111, i.e. 127 or 0x7f.
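A sketch of the bool-array packing, assuming the first array element maps to the most significant bit of the integer (which is what the 0x047f example implies):

```python
def pack_bools(bools) -> int:
    """Pack a bool array into one integer, first element in the most
    significant bit position, matching 000001111111 == 127 in the example."""
    n = 0
    for b in bools:
        n = (n << 1) | int(b)
    return n

def rlp_encode_int(n: int) -> bytes:
    """Minimal RLP for small non-negative integers (sketch, not a full library)."""
    if n == 0:
        return b"\x80"
    b = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b

vals = [False] * 5 + [True] * 7          # the example's bool[12]
payload = rlp_encode_int(pack_bools(vals))
(b"\x04" + payload).hex()                # '047f'
```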
Boolean arrays are a special case; everything else is much more regular. Values are straightforwardly encoded as RLP strings, except for arrays and tuples, which are encoded as RLP lists.
An example call to foo((string,bool,bool,int72)[2],uint8) (given function ID 27) is 0x1bcac44180010ac44201800181ff.
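This call can be reproduced with a short-form RLP sketch. Note that the concrete argument values, ("A", false, true, 10), ("B", true, false, 1), and 255, are my own decoding of the example bytes, not stated above, and enc_int handles only non-negative values (signed int72 negatives would need a convention of their own):

```python
def rlp_str(b: bytes) -> bytes:
    """RLP string encoding (short form only, payloads under 56 bytes)."""
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b

def rlp_list(items) -> bytes:
    """RLP list encoding (short form only)."""
    payload = b"".join(items)
    return bytes([0xC0 + len(payload)]) + payload

def enc_bool(v: bool) -> bytes:
    return rlp_str(b"\x01" if v else b"")  # false is the empty string

def enc_int(n: int) -> bytes:
    """Non-negative integers only, minimal big-endian bytes."""
    return rlp_str(n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b"")

def enc_str(s: str) -> bytes:
    return rlp_str(s.encode())

# foo((string,bool,bool,int72)[2], uint8) with function ID 27.
tuples = rlp_list([
    rlp_list([enc_str("A"), enc_bool(False), enc_bool(True), enc_int(10)]),
    rlp_list([enc_str("B"), enc_bool(True), enc_bool(False), enc_int(1)]),
])
call = bytes([27]) + tuples + enc_int(255)
call.hex()  # '1bcac44180010ac44201800181ff'
```

Reading the bytes back: ca opens the ten-byte outer list, each c4 opens a four-byte tuple, and the trailing 81ff is the uint8 value 255.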
Note that this RLP-based design eliminates the need for zero-padding, 0xff sign extension, dynamic element offsets, and the fixed four-byte hash-based function selector.
It seems to me that this could obviate a significant amount of the bespoke calldata hacks being used by Layer 2s and could benefit composability and adoption (not to mention gas costs). ABIv2 standardization seems to have failed, but this could revive it. Standardization is also beneficial for enabling tools to figure out what the heck is going on.
I should mention that I have a proof-of-concept implementation in Java on GitHub: esaulpaugh/abiv3 (ABIv3 proof of concept for Ethereum).