Solidity Team AMA #2 on Wed, 10th of March 2021

My question may be a bit broad and could be summarized as: What’s the current state of YUL?

Like, to what extent is YUL already used when compiling Solidity code? Since --optimize-yul is deprecated and the help says --optimize enables the YUL optimizer, it seems that YUL is already integrated into the default compilation pipeline. I can imagine it is more nuanced, and I would be interested to better understand the current state of the transition to YUL.

Also, looking through Solidity’s GitHub issues I noticed proposals to add inheritance, structs, tuples etc. to YUL. These are proposals / feature requests, and I understand that many potential features may still be undecided, but I wonder how many higher-level features the team is planning to add to YUL. Is the plan to keep YUL a lower-level language used as an intermediate language, or do you foresee YUL being used as a language to implement contracts directly?

At the current stage, Yul is used heavily for internal routines like the ABI coder, overflow-checked arithmetic, and everything more complicated that has been added over the course of the last year. You can access these routines as an isolated file by requesting the evm.bytecode.generatedSources or evm.deployedBytecode.generatedSources fields via Standard JSON.

Apart from internal routines, we also rewrote the entire Solidity code generator to go through Yul. Since the beginning of the year, we have 100% coverage on our semantic tests. You can request the generated Yul code using solc --ir or solc --ir-optimized --optimize. This only exports the Yul code. To switch over the whole compilation pipeline, you can use solc --experimental-via-ir.

We are still working on carrying all metadata and debugging information across the new pipeline and we would also like to improve the optimizer so that the new code is cheaper in most cases and at least not much more expensive in rare cases. We are currently conducting gas tests to get a good picture.

If anyone has Solidity code that is already compiling in 0.8.x, we would be grateful for a pointer so that we can tune the optimizer for “real world” code.

There has been some discussion in connection with YulPlus, but most of the proposals are on ice. One thing we would still like to improve is memory management. If you make memory management more explicit, it is easier to move stack variables to memory or reuse unused memory.

Yul’s purpose is to be an intermediate or assembly language that is auditable. It can be written by hand (see inline assembly), but should usually be generated by other programs.


In general, it is always recommended to use the latest release of the compiler, if at all possible. If, for whatever reason, you want to stick with older versions, be sure to check the List of Known Bugs page of the documentation, since bug fixes will not be backported across breaking releases.

That being said, what you can expect gas-wise can vary for any contract at hand. There have been some significant improvements in the optimizer in the latest releases, but on the other hand changes like checked arithmetic and ABIEncoderV2 by default may also increase the gas cost of a given contract.
In case your contracts already used SafeMath and ABIEncoderV2, you can expect gas savings. If not, you can still switch back to the old ABI encoder using pragma abicoder v1; and use unchecked blocks in places where you had not used SafeMath before, and may arrive at gas savings that way. In general, when using the same amount of safety features, gas usage is expected to improve with newer versions.
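As a sketch of both opt-outs mentioned above (the contract and function names are made up for illustration):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
pragma abicoder v1; // switch back to the old ABI encoder

contract Counter {
    uint256 public value;

    // Hypothetical hot path where SafeMath was deliberately not
    // used before 0.8.x; `unchecked` restores the old wrapping
    // semantics and skips the built-in overflow check.
    function bump(uint256 delta) external {
        unchecked {
            value += delta; // wraps on overflow instead of reverting
        }
    }
}
```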


Specifying the visibility of constructors, i.e. the use of public or internal for constructors, was deprecated in 0.7.0. The reason is that the mechanism overlaps with specifying whether a contract itself is abstract. The only difference between a contract with an internal constructor and one with a public constructor was that the former could not be instantiated directly (without inheriting from it). But that’s also what it means for a contract to be abstract, so declaring a constructor internal has exactly the same effect as declaring the contract abstract.

Hence, to avoid having multiple ways to express the same thing in the language, constructors are now public by default, and in cases where you would have made the constructor internal, the recommendation is to make the contract itself abstract instead.
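A minimal sketch of the migration (contract names are hypothetical):

```solidity
pragma solidity ^0.7.0;

// Before 0.7.0 this would have been written as
//     constructor(uint256 _x) internal { ... }
// Marking the contract abstract expresses the same thing:
// it cannot be deployed directly, only inherited from.
abstract contract Base {
    uint256 internal x;

    constructor(uint256 _x) {
        x = _x;
    }
}

// A deployable contract inheriting from the abstract base.
contract Derived is Base {
    constructor() Base(42) {}
}
```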

Specifying the visibility of a constructor will likely be disallowed entirely in Solidity 0.9.

If you could design Solidity from scratch again, what would be different and which aspects would you keep the same?

Most design decisions were forced by the design of the EVM, so there was actually not too much wiggle room on the semantic side.
Some random details that I personally find annoying but are hard to change now:

  • event identifiers use a full 32 bytes
  • some aspects of the ABI
  • function identifiers could be a bit longer
  • we don’t need all the different bit-width types and small types do not provide a big benefit

Also some of the gas costs were not foreseeable at the point Solidity was designed. For example, we thought libraries might be a good way to split code, but delegatecall turned out just way too expensive for that purpose.


There was a similar question in the previous AMA. See the answer for Does it make sense to cache arr.length?

Solidity currently doesn’t perform any caching of values read from storage in memory. However, in some cases, values are cached on the stack. In particular, we can sometimes replace sload(key) and mload(key) with a stack variable.

The default codegen and its optimizer cannot really cache values read from storage, but the default optimizer can still occasionally avoid loading the same value from storage multiple times. For example, in the following contract, the assembly would only contain a single sload.

contract C {
	uint x;
	function read_twice() public returns (uint a, uint b) {
		a = x;
		b = x;
	}
}

One can verify this by looking at the assembly generated by solc --asm --optimize contract.sol.

However, the upcoming Yul codegen and Yul optimizer can cache significantly more cases. Examples of this can be found in our test suite. One can verify this by looking at the IR generated in the upcoming compilation pipeline: solc --ir-optimized --optimize contract.sol. Feel free to make suggestions on optimization opportunities that we miss in your contracts, especially when the rules are generic.

Also, in principle, storage values can often be ‘cached’ in memory and written back to storage at the end. This, however, may complicate existing code. This is illustrated in the following contract, where we sort a storage array:

contract C {
	uint[3] arr;
	function sort_arr() public {
		uint[3] memory arr_copy = arr;

		// now perform sorting on arr_copy

		arr = arr_copy;
	}
}

The above example can save some storage-loads and storage-writes (i.e., sload and sstore respectively) when compared to performing the sorting directly on the storage array.

Regarding the last question about the current state of these efforts and the challenges faced: most of our current efforts around optimization revolve around the Yul optimizer. This is because at some point in the near future, Solidity will switch to the Yul compilation pipeline, and the Yul optimizer is more modular and powerful than the bytecode-based optimizer. Doing this in the current codegen and bytecode-based optimizer is difficult because the generated bytecode is devoid of high-level information. For example, function calls and if statements are lowered to the jump or jumpi opcodes. Additionally, the bytecode contains stack operations, such as dup5 or swap10. Both of these make the analysis harder, and therefore make the bytecode optimizer less powerful.

An example of the challenges we face is determining when a cached value is invalidated. In the following example for caching storage reads, the optimizer currently doesn’t infer that the function f only writes to storage slot 100 and therefore the value read from storage slot 0 is safe.

function f()
{
    sstore(100, 0)
}
let x := sload(0)
// assume that f cannot be inlined
f()
// can we replace the following by y := x?
let y := sload(0)

The above example is extremely simple and the Yul inliner would inline the function, allowing the replacement y := x. However, with more complicated functions that write to a storage slot, this is harder to reason about, especially when the slot that is written depends on an argument of the function, for example when writing to a certain array index.

An even more interesting situation occurs when trying to cache memory loads in the stack, i.e., mload. Here is a Yul snippet showing how Solidity copies function arguments from calldata to memory.

mstore(64, 128)
let _1 := 0
// Does not invalidate location 64
// Writes to memory location [128, 128 + calldatasize())
// Reasonable bound to calldatasize(): 2**32
calldatacopy(128, _1, calldatasize())
// Can this be replaced by z := 128?
let z := mload(64)

Even though a human can easily see that the memory location [64, 96) does not get modified after the first mstore, it is extremely hard for the optimizer to make this inference. Note that, for real-world contracts, it is not easy even for humans to make such claims. We experimented with using an SMT solver to make such inferences here. Even though it is functional, it creates a new set of problems: trusting an external tool (here the Z3 SMT solver) for generating assembly code, compilation and optimization performance, verifying the results of the solver, and producing deterministic results. Ideally we would want a simpler solver that is written specifically to help perform such optimizations. An attempt towards this can be found here.


I’d also add that you’ll be missing out on some nice features that have been added since 0.6.8:

  • Free functions and file-level constants.
  • Calldata parameters in external functions (cheaper compared to memory arguments).
  • Assertion failures, out-of-bounds errors, overflow/underflow errors, etc. don’t eat up all the remaining gas in your transaction. You also get an error code indicating which of these happened. This does come at a cost of a slight increase in the bytecode size but will make your life a bit easier when debugging failures. You can even catch them and do your own error handling. This does not make much sense for assertion failures but might be a valid use case if you’re relying on built-in safe math for input validation instead of treating it as a sanity check.
  • SMTChecker got support for a ton of new functionality so if you’re using it, you definitely should upgrade.
  • You can change state mutability when overriding a virtual function.
  • Natspec for state variables.
  • Error messages and warnings now have proper error codes.
  • Lots of smaller improvements like fallback function being able to accept/return data via parameters, gwei as a unit or type(T).min/type(T).max helpers. Also tons of bugfixes which might not seem very important until you run into them yourself.
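To illustrate the third point, here is a sketch of catching such a failure (contract and function names are made up). Since 0.8.1, a checked overflow reverts with a Panic(uint256) error that try/catch can decode:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.1;

contract Math {
    function add(uint256 a, uint256 b) external pure returns (uint256) {
        return a + b; // reverts with Panic(0x11) on overflow
    }
}

contract Caller {
    // Returns the panic code instead of bubbling the revert up.
    function tryAdd(Math m, uint256 a, uint256 b)
        external
        view
        returns (bool ok, uint256 result, uint256 panicCode)
    {
        try m.add(a, b) returns (uint256 sum) {
            return (true, sum, 0);
        } catch Panic(uint256 code) {
            return (false, 0, code); // 0x11 means arithmetic overflow
        }
    }
}
```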

I’m talking about f(a,b,c,), and probably only if it’s broken across many lines. A trailing comma would make it slightly easier to add another parameter at the end, or to swap/remove the last one. It’s not only useful for function calls, but also event definitions, struct instantiations, etc. Consider the following:

event SomeEvent(
    uint256 a,
    uint256 b
);

If I want to swap the order of a and b, I cannot just swap the lines. I also have to add a comma after b and remove the one after a. It’s of course only a tiny inconvenience, but still an inconvenience.

Other arguments I’ve heard are smaller diffs and a little bit more consistency.

I’m curious about your impression of the current state of 3rd party tooling around Solidity and more generally smart contract development, e.g., editor support, debuggers, linters, etc. Are you mostly happy or do you think it is lacking? Do you have any favorite tools, do you miss anything in particular? Is there a lot of exchange between the Solidity team and third party teams or is everyone mostly working on their own?

Ok, I see what you are getting at. Yeah, this is a controversial topic, also inside the team. My personal take is that this also makes it easier to remove elements without causing an error for overloaded functions or events.

One thing that might not be obvious to everyone is what the difference between public and internal constructors actually was and why visibility was used to express it. For functions the distinction is very clear. External functions are a part of the contract’s ABI and their parameters need to be encoded in a special way. They must be callable from the outside, so they’re always included in the dispatch code that checks the selector and determines where to pass control. Internal functions, on the other hand, have fewer restrictions on their parameters (they can accept and return types that cannot be ABI-encoded) and they might be entirely skipped by the compiler if it determines that they’re unused.

In case of a constructor this is not the case because it does not even exist as a callable function after the contract is deployed. The only way you could “call” a constructor would be by using the new C() syntax but that does not have the same semantics as a function call. The constructor is only included in the bytecode sent in the data field of the transaction that creates the contract. We call this the creation bytecode. It runs once and its only purpose is to produce another piece of bytecode. This is the deployed (or runtime) bytecode and that’s what actually ends up on the chain. In the simplest case the deployed bytecode is embedded directly in the creation bytecode and just copied. But it can also be modified during construction - this happens for example if your code contains immutable variables whose values are embedded directly in the deployed bytecode.
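A minimal sketch of that last case (the contract itself is made up):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Example {
    // The creation bytecode fills this value in before returning
    // the deployed bytecode; afterwards every read of `deployer`
    // is a constant embedded in the code, not a storage load.
    address public immutable deployer;

    constructor() {
        deployer = msg.sender;
    }
}
```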

You could stretch the concept and say that when creating a contract you’re calling its constructor externally and when your constructor calls another one from an inherited contract it’s an internal call. A contract with an external constructor would be one you effectively cannot inherit from while one with an internal one - one you cannot create. Only the latter case is useful in practice which is why at first we made it possible to make the constructor either internal or public. At some point we independently introduced abstract as another restriction you could put on contracts and we thought it was a clearer notion that also covered the same use case and more. This made specifying constructor visibility completely redundant.


Allowing immutables of non-value types is being considered for the future. A general implementation would probably store the data in the data area of a contract and use codecopy to make it usable in memory. This would work for structs and even for dynamically-sized arrays.

But some of the details are not entirely clear, yet.
While it would be possible to copy an entire immutable of reference-type to memory using codecopy, you’d probably usually only want to access a single struct member or array element at a time without needing a full memory copy.

While statically-sized immutable structs or arrays could just be syntactic sugar for a number of value-type immutables (i.e. any member- or index-access to them would result in a placeholder “PUSH” in the bytecode to be filled by the corresponding value at construction time, just as for value types), this won’t work for dynamically-sized reference types anymore.

We’d need to refer to those using some kind of “reference to data at some offset in the bytecode” and then successively use codecopy to read e.g. the size of a dynamic array or the code offset of nested arrays.

The gas costs and benefits of this would probably need to be evaluated on real-world examples first.

There has also been the idea to introduce a new data location called code (besides calldata, memory and storage) for this instead of calling them immutable as well (this might for example look more consistent when passing a reference to data stored in code to functions), but there is no consensus on it yet.

We did introduce calldata array slices for precisely this reason - but, unfortunately, the ABI encoding does not allow slicing of nested calldata arrays (since arrays of arrays in calldata use offsets relative to the “base pointer”, i.e. the start of the containing array, these offsets would become invalid in slices, unless we drag along the base pointer, which creates too much overhead).
Having .pop in calldata arrays is an interesting idea, though. I think, since calldata is inherently read-only, this has not been considered so far, but given that the length of a dynamic calldata array is stored on stack, introducing .pop would still be possible and basically yield an alternative way of “one-sided” slicing (and since the base-pointer would stay the same, this would also work for nested calldata arrays). So thank you, we will consider that!
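For reference, a small sketch of the existing one-sided slicing on a top-level dynamic calldata array (function names are made up):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.6.9 <0.9.0;

contract Slicer {
    // Sum all elements except the first, without copying to memory.
    function sumTail(uint256[] calldata xs) external pure returns (uint256) {
        require(xs.length > 0, "empty");
        return _sum(xs[1:]); // the slice is passed on as calldata
    }

    function _sum(uint256[] calldata xs) internal pure returns (uint256 s) {
        for (uint256 i = 0; i < xs.length; i++) {
            s += xs[i];
        }
    }
}
```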

Furthermore, there have been several ideas and suggestions for conveniently packing and unpacking data in general, which could be applied to packing to and unpacking from calldata, but we have not yet arrived at consensus about a final design.

So yes, we are aware that there is room for improvement in this area, but are still in the process of trying to find suitable solutions.

In case you have any concrete suggestions for this, or if you have specific limitations or missing features in mind that we may have missed, it would definitely help if you opened GitHub issues for them!

Thanks a lot @cameel.

A couple of points to make here.

Point 1.
Calldata parameters in external functions (cheaper compared to memory arguments).

Even in 0.6.8, this already works as expected. We can specify calldata parameters as arguments instead of memory and they will be cheaper. So I don’t quite get your statement about this.

Point 2.

Assertion failures, out-of-bounds errors, overflow/underflow errors, etc. don’t eat up all the remaining gas in your transaction

If one uses SafeMath, gas doesn’t get eaten up when overflow/underflow happens. I guess what you meant is that using assert doesn’t eat all the gas and almost behaves like require? If that’s so, there’s not much difference between assert and require anymore, other than the fact that assert is just for places the code is not supposed to reach. Do you agree with all that I said?

Point 3
Free functions and file-level constants.
Could you put a link to the part of the docs where this is explained?

I’d appreciate your insights on this … Thanks a lot again.

Point 1.

Sorry, my bad. The support for calldata parameters was being extended over the course of several versions so I might be mixing up what was added exactly when. I said external functions but I just checked and indeed they already worked in 0.6.8. The change in 0.6.9 allowed you to use calldata everywhere so also in internal or public functions and for return values. After that, in 0.7.x there were several changes that fixed/added cases like calldata structs or slicing of calldata arrays. There were also multiple bugfixes. You’ll find the details in the changelog. So I guess it depends whether the limited support for calldata in 0.6.8 is enough for you or if you need one of the cases that were not there.

Point 2.

Right. SafeMath uses require so it does not eat up all gas and before 0.8.x there was no built-in safe math so it would not eat your gas either. So it’s mostly about assertions, out-of-bounds errors, bad enum conversions, etc. I listed underflows/overflows because they are classified as the same kind of error now but yeah, it’s not much of a change compared to SafeMath in that regard.

If that’s so, there’s not much difference between assert and require anymore, other than the fact that assert is just for places the code is not supposed to reach. Do you agree with all that I said?

Well, there are still differences. You can’t supply a custom message for a Panic while you can for revert’s Error. In the new custom error feature that we’re currently implementing you can even define your own errors and give them arbitrary parameters. But yeah, both reverts and assertions are now returning stuff even if it’s encoded a bit differently and you can catch panics just like errors.

The semantic distinction is still useful though. You want require/revert for validation while assert is for stuff that’s not meant to happen if your contract is not buggy.

Point 3.

Sure. There’s a bit about them in Structure of a Contract > Functions, but their semantics are very similar to internal library functions so there’s not much to write about. If you think something particular is missing, please open an issue on GitHub and we’ll fill the gap :)

@cameel Thanks for the follow-up. Much appreciated.

Point 1.
So it seems like you’re saying I can specify calldata in public functions. What if the public function gets called from the same contract? Does it mean that copying into memory is not necessary anymore at all, and both functions can just use the same data from the same calldata?

Point 2:
I guess the only difference between assert/require is a semantic distinction. The difference where assert was eating all gas is already removed in the new versions. Also, one can add custom messages to assert. So truly, I can’t see any other difference than the semantic one, which is just better for programmers when reading the code. If they see assert, then they immediately know that the program should never end up there.

That’s just my observation. do you think the same ? and it’s a good thing you made it so that assert doesn’t eat all the gas.

Exactly. You can pass calldata values down to your internal functions without forcing the extra decoding into memory.
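As a sketch of what this looks like (names are hypothetical), available since the calldata extensions in 0.6.9:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.6.9 <0.9.0;

contract Forwarder {
    // Public entry point; `data` is never decoded into memory.
    function firstByte(bytes calldata data) public pure returns (bytes1) {
        return _first(data);
    }

    // The internal function also takes calldata, so the call above
    // just passes the (offset, length) pair along on the stack.
    function _first(bytes calldata data) internal pure returns (bytes1) {
        require(data.length > 0, "empty");
        return data[0];
    }
}
```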

Not really. assert() only accepts a condition. You cannot specify a message. In 0.8.x there’s an error code though. See Panic via assert and Error via require in the docs.

The semantic difference is actually pretty significant when it comes to formal verification:

Solidity implements a formal verification approach based on SMT solving. The SMTChecker module automatically tries to prove that the code satisfies the specification given by require/assert statements. That is, it considers require statements as assumptions and tries to prove that the conditions inside assert statements are always true. If an assertion failure is found, a counterexample is given to the user, showing how the assertion can be violated.

Lately assert() became a lot closer to require() in how it works under the hood, but syntactically they’re likely to diverge in the future. For one, we’ve long been considering more formal support for invariants, and they’d be another way to assert if enforced at runtime. At the same time, we’re currently replacing revert() with a revert statement (with syntax similar to emit) to make it more apparent that custom errors are not function calls, and we might also change the require() syntax to match it (though to be honest we have no consensus about how it should look, and it might turn out that it actually stays as is in the end).

I’m curious about your impression of the current state of 3rd party tooling around Solidity and more generally smart contract development, e.g., editor support, debuggers, linters, etc. Are you mostly happy or do you think it is lacking? Do you have any favorite tools, do you miss anything in particular? Is there a lot of exchange between the Solidity team and third party teams or is everyone mostly working on their own?

I’m deeply grateful for the tooling teams putting up with our frequent breaking changes. The space is certainly still lacking a lot and we need better support from the Solidity compiler directly. We are in regular contact with all the tooling developers and we try to get feedback before making bigger changes that could break assumptions they might have about how the compiler behaves.

We are currently developing a language server that we hope will spark the development of new tools. It will be useful mostly for editors and auditing rather than debugging, since the protocol is focused on that area.


I think that the large amount of existing third-party tooling is a great strength of the Ethereum ecosystem in general. For everything you need you’re likely to find several different tools, which is very healthy and gives you choice. Many of these tools are open source and, personally, I’m pretty happy seeing what is already available.

It’s actually hard to give an example of something that has not already been implemented in one way or another. Maybe a linter for Yul? Chris Parpart did take a stab at writing one but it ended up being a bigger task than we anticipated and it was abandoned. Another, very general idea would be some tool to make debugging contracts easier. There are some good debuggers out there but this is still a common pain point for users and there is probably some room for improvement, both for the existing tools and for the compiler in how it supports them.

That being said, the ecosystem is still relatively young and not all the tools are mature yet. Solidity itself is evolving rapidly, which we know can be a challenge to keep up with. To help with that we provide nightly builds of the compiler that anyone can use to see how the newly added features will impact their project. There’s also an ongoing initiative to integrate some of the devtools’ test suites into our CI to get early feedback about breakage. Every team works separately but definitely not in isolation, and we’re trying to improve the flow of information in both directions.

Overall, I’d encourage any tool developers to reach out and share their needs regarding interfacing with the compiler. We’re in regular contact with people working on the most popular frameworks and libraries but we care about your feedback even if the project is small and just starting out. There’s never too much of such feedback. Even though we get a lot of it already, not everyone cares about every feature and it can sometimes be hard to get enough information on how some specific change will impact other projects.


Thanks @cameel.

There’s one question that I forgot to ask on 10th March, and I will try to explain it here. I hope I can hear your opinions on it. It’s not related to Solidity itself, but I still wanted to ask and would very much appreciate your input.

My contract has 2 functions.

  • function1 and function2

Users first call function1, passing a huge struct as an argument (one of the fields in the struct is an Action[] array).

struct Action {
   address to;
   bytes data;
   uint value;
}

Users can only call function2 on the same struct if they called function1 on it previously. function2 unfortunately uses much more gas than function1, which gives us the following problem.

A user can call function1 with an Action[] array in which each data field is huge calldata. By calculating a little, what he can do is call function1 on that struct in such a way that he will never be able to call function2: since function2 uses much more gas, if the Action[] is huge and each bytes data in the actions is huge, the gas cost of function2 grows disproportionately and exceeds the block gas limit.

Other than the IPFS solution, which means that I don’t pass Action[] in the struct and only pass an IPFS hash, I’ve been thinking of limiting the length of the Action[] to, say, 100. But this is not enough, since an attacker can also pass huge bytes data for each Action. So we need to limit the length of each bytes data as well.
It seems like we should limit the Action[] length, loop through it, and check that each bytes data length is below some number.
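A sketch of that check (the contract, the caps, and the function signature are all hypothetical):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract ActionStore {
    struct Action {
        address to;
        bytes data;
        uint value;
    }

    uint256 private constant MAX_ACTIONS = 100;     // hypothetical cap
    uint256 private constant MAX_DATA_BYTES = 1000; // hypothetical cap

    function function1(Action[] calldata actions) external {
        require(actions.length <= MAX_ACTIONS, "too many actions");
        for (uint256 i = 0; i < actions.length; i++) {
            require(
                actions[i].data.length <= MAX_DATA_BYTES,
                "action data too large"
            );
        }
        // ... record that function1 was called for these actions ...
    }
}
```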

Question 1: What do you think about the above solution?

Question 2: I don’t know what length limit to take for each bytes data. If I take 1000, it means each action’s bytes data will only be able to call a function that has 1000/32 = 31 word-sized arguments, which is too low; any function could exceed that if one of its arguments is an array.

Thanks a lot in advance.