I am looking for feedback on a potential future EIP regarding a standardized compact representation for calldata in EVM languages. The design I’ve been kicking around is as follows:
To illustrate, take an ABIv3 call to
foo(bool) whose argument is the boolean array
[ false, false, false, false, false, true, true, true, true, true, true, true ].
In the header byte (the zeroth byte), the first three bits are the encoding version identifier, 000. The next five bits are the function identifier, 00100 (in this case, 4). If the function ID is 31 or greater, all five bits are set and the RLP encoding of the function ID is appended to the header byte.
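Here is a minimal sketch of the header rule as described above. It is not taken from the proof of concept: the names encode_header and rlp_encode_bytes are hypothetical, and encoding the appended function ID as a minimal big-endian integer is an assumption based on the usual RLP convention for integers.

```python
def rlp_encode_bytes(data: bytes) -> bytes:
    """Standard RLP encoding of a byte string."""
    if len(data) == 1 and data[0] < 0x80:
        return data  # a single byte below 0x80 is its own encoding
    if len(data) <= 55:
        return bytes([0x80 + len(data)]) + data
    length = len(data).to_bytes((len(data).bit_length() + 7) // 8, "big")
    return bytes([0xB7 + len(length)]) + length + data


def encode_header(function_id: int, version: int = 0) -> bytes:
    """Hypothetical helper: 3 version bits, then 5 function ID bits."""
    if not 0 <= version <= 7:
        raise ValueError("version must fit in 3 bits")
    if function_id < 0:
        raise ValueError("function ID must be non-negative")
    if function_id < 31:
        # Small IDs fit directly in the low five bits of the header byte.
        return bytes([(version << 5) | function_id])
    # Assumption: IDs of 31 or more set all five bits, then append RLP(ID).
    id_bytes = function_id.to_bytes((function_id.bit_length() + 7) // 8, "big")
    return bytes([(version << 5) | 0b11111]) + rlp_encode_bytes(id_bytes)


print(encode_header(4).hex())    # 04
print(encode_header(31).hex())   # 1f1f
print(encode_header(200).hex())  # 1f81c8
```

Under this reading, any function ID below 31 costs a single byte, and larger IDs cost one extra byte or more depending on their RLP length.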
The next byte is 7f, which is the RLP encoding of the integer that represents the values in the bool array, where each element is represented by one bit. Notice that in this case the first five bools are false, which makes the integer 000001111111 in binary, or 127, i.e. 0x7f.
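The bit packing can be sketched roughly as follows. The function names pack_bools and rlp_encode_int are made up for illustration, and the sketch assumes the first array element maps to the most significant bit, which matches the 000001111111 reading above.

```python
def pack_bools(values) -> int:
    """Pack booleans into an integer, first element as the highest bit."""
    n = 0
    for v in values:
        n = (n << 1) | (1 if v else 0)
    return n


def rlp_encode_int(n: int) -> bytes:
    """RLP-encode a non-negative integer as a minimal big-endian string."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    data = n.to_bytes((n.bit_length() + 7) // 8, "big")  # 0 becomes b""
    if len(data) == 1 and data[0] < 0x80:
        return data
    if len(data) <= 55:
        return bytes([0x80 + len(data)]) + data
    raise ValueError("integers longer than 55 bytes are out of scope for this sketch")


args = [False] * 5 + [True] * 7
packed = pack_bools(args)

# Header byte: version 000, function ID 00100 (4), as in the example above.
calldata = bytes([0b000_00100]) + rlp_encode_int(packed)
print(packed, calldata.hex())  # 127 047f
```

One thing this makes visible: leading false values vanish from the integer, so a decoder would need the array length from somewhere else. How that length is recovered is not covered by this sketch.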
Boolean arrays are a special case. Everything else is much more normal. Values are straightforwardly encoded as RLP strings, except for arrays and tuples which are encoded as RLP lists.
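For readers less familiar with RLP, here is a small sketch of the standard string and list rules that the paragraph above relies on. The argument values ("hi", 5, 1) are hypothetical and chosen only to keep the output short; treating unsigned integers as minimal big-endian byte strings and strings as UTF-8 is an assumption, and signed types are deliberately left out.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """RLP prefix: offset is 0x80 for strings and 0xC0 for lists."""
    if length <= 55:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes as an RLP string and list/tuple as an RLP list."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            return data  # single low byte encodes as itself
        return _length_prefix(len(data), 0x80) + data
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("expected bytes or a list/tuple of encodable items")


def uint(n: int) -> bytes:
    """Minimal big-endian bytes of a non-negative integer (0 becomes b"")."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


# A tuple-like argument becomes an RLP list; a plain value becomes an RLP string.
tuple_arg = ["hi".encode("utf-8"), uint(5)]
print(rlp_encode(tuple_arg).hex())  # c482686905
print(rlp_encode(uint(1)).hex())    # 01
```

Nested tuples or arrays would just be nested lists, which is where the lack of offsets and padding comes from.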
An example call to
foo((string,bool,bool,int72),uint8) (given function ID 27) is
Note that this RLP-based design eliminates the need for zero-padding, 0xff sign extension, dynamic element offsets, and the always-four-bytes hash for a selector.
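To give a rough sense of the size difference, the sketch below compares byte counts for a single uint8 argument with value 1. The four zero bytes stand in for a real selector (normally the first four bytes of the keccak256 hash of the signature), and function ID 4 is a hypothetical choice; only the lengths matter here.

```python
value = 1

# Current ABI: 4-byte selector plus one 32-byte zero-padded word.
PLACEHOLDER_SELECTOR = b"\x00" * 4  # stand-in, not a real selector
current = PLACEHOLDER_SELECTOR + value.to_bytes(32, "big")

# Compact form as described above: one header byte (version 0,
# hypothetical function ID 4) plus the RLP encoding of 1, which is
# the single byte 0x01.
compact = bytes([0b000_00100]) + bytes([value])

print(len(current), len(compact))  # 36 2
```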
It seems to me that this could obviate a significant amount of the bespoke calldata hacks being used by Layer 2s and could benefit composability and adoption (not to mention gas costs). ABIv2 standardization seems to have failed, but this could revive it. Standardization is also beneficial for enabling tools to figure out what the heck is going on.
I should mention that I have a proof-of-concept implementation in Java on GitHub: esaulpaugh/abiv3 (ABIv3 proof of concept for Ethereum).