Thoughts on override(A, B) syntax

What was the rationale for override(A, B) as opposed to simply override?

This is one of the main pain points of programming in Solidity. I think it desperately needs to be improved.

Below are various thoughts on the topic.


override(A, B) gives the false impression of a locally determined inheritance ordering.

See OpenZeppelin/openzeppelin-contracts#2936 as an example. The issue proposes that if a function is override(A, B), the statement super.foo() should be replaced with an explicit list of the parent contracts, i.e. A.foo(); B.foo(). Those things are not equivalent. First, the ordering may be wrong, e.g. if A is B. Second, it may be incomplete, because it ignores that there could be an override C “upstream” (in the direction of super) that should be called as well. I think the syntax directly causes both of these misunderstandings by giving a false impression that A and B are all that’s relevant at that point (and in that order).
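
To make this concrete, here is a minimal, hypothetical diamond (the names are mine, not from the issue). With contract Most is A, B, the linearization is Most, B, A, Base, so super.foo() walks every override exactly once, while the explicit list does not:

contract Base { function foo() public virtual {} }
contract A is Base { function foo() public virtual override { super.foo(); } }
contract B is Base { function foo() public virtual override { super.foo(); } }

contract Most is A, B {
    function foo() public override(A, B) {
        super.foo(); // B.foo() -> A.foo() -> Base.foo(): each runs once
        // A.foo(); B.foo(); instead would run A.foo() and Base.foo() twice,
        // because B.foo()'s super call reaches them again
    }
}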


It requires knowledge of implementation details of dependencies, since A, B can be grandparents.

Alternatively, the developer needs to run the compiler, see the error messages, and only then are they able to add the right overrides.

I see both of these things as problems. Programmers should be able to develop intuitions about the language instead of relying on the compiler telling them what to write. That intuition should be through local reasoning as much as possible (as opposed to global reasoning about implementation details of the whole program).

Arguably multiple inheritance itself requires global reasoning to understand the behavior of a piece of code (though is this unique to multiple inheritance? In what way?), but I don’t think this needs to be surfaced in the syntax to such a degree.


I think override should be enough to resolve these conflicts.

At the very least, could override(A, B) be changed so that A and B can be the direct parents, as opposed to the grandparents that originate the conflict?


@cameel Could you shed some light?

The problem this is trying to address is that if you’re overriding more than one function, your new function must take into account all of what the base functions do, because you’re effectively disabling them unless your implementation calls them explicitly. If you’re overriding A.foo() and B.foo(), for each of them you make a decision whether to call it, ignore it, replicate its code or do something completely different. Even if you do nothing, it’s a choice.

The thing is that overrides like that are really brittle and can break with non-local changes. If another of your base classes, say C, adds foo(), your function is suddenly overriding three functions even though nothing in the function itself changed. With override(A, B) the compiler can tell that you never considered C and force you to revisit your override. Hopefully you then update it to take the extra implementation into account.
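
A minimal sketch of that brittleness (contract names are hypothetical):

contract A { function foo() public virtual {} }
contract B { function foo() public virtual {} }
contract C { function foo() public virtual {} } // added to the hierarchy later

contract Mine is A, B, C {
    // This compiled while Mine was only `is A, B`. With C added, the
    // compiler rejects override(A, B) and demands override(A, B, C),
    // forcing a review of whether C.foo() needs to be handled too.
    function foo() public override(A, B) {}
}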


See OpenZeppelin/openzeppelin-contracts#2936 as an example.

I think the syntax directly causes both of these misunderstandings by giving a false impression that A and B are all that’s relevant at that point (and in that order).

I can see how it may be confusing and it’s unfortunate that it can be understood that way, but this really has nothing to do with super. override(A, B) says that functions A.foo() and B.foo() are being replaced by your new definition. True, super.foo() may end up making a call to C.foo() from some other contract C that you are not directly inheriting from, but C.foo() is not being replaced by your definition of foo().

C.foo() is not really relevant to the check because it’s not your definition of foo() that has to worry about it. Instead, there must exist some other contract Y that inherits both from your contract (let’s call it X) and from C. Y.foo() then needs override(X, C) and that’s where the compiler will want you to consider C.foo().
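
A sketch of that situation (hypothetical names):

contract A { function foo() public virtual {} }
contract B { function foo() public virtual {} }
contract C { function foo() public virtual {} }

contract X is A, B {
    // replaces A.foo() and B.foo(), but says nothing about C.foo()
    function foo() public virtual override(A, B) {}
}

contract Y is X, C {
    // this is where the compiler forces C.foo() to be considered
    function foo() public override(X, C) {}
}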


It requires knowledge of implementation details of dependencies, since A, B can be grandparents.

At the very least, could override(A, B) be changed so that A and B can be the direct parents, as opposed to the grandparents that originate the conflict?

You have a point here. I agree that specifying direct parents should be allowed. We could actually allow either, making such a change backwards-compatible.

I don’t know the motivation behind this particular design choice but it might have been that we wanted the programmer to look at the definition of foo() being overridden rather than to just mechanically add the name of the contract to override(). The definition you should look at is not necessarily in the parent and can be buried deeper in the hierarchy.

Personally, I think that we can’t really force anyone to handle this the right way and the job of the compiler here is just to provide enough information to bring attention to the problem and make fixing it possible. The error message already points at the location of the function being overridden and that should be enough.


override(A, B) gives the false impression of a locally determined inheritance ordering.

This is definitely a problem and we need a solution for it. The idea we had was to enforce a particular order (#8373) with a warning, but ultimately we weren’t sure it really solves the problem, and #8396, which implemented it, was closed. The discussion on that is still open.


There’s one more point to consider here. I think that a lot of the pain related to override(A, B) comes from unnecessarily repeating contracts in the inheritance hierarchy. Let’s consider this structure that you get when you use the OZ contract wizard:

contract ERC20
contract ERC20Permit is ERC20
contract ERC20Votes  is ERC20Permit
contract MyToken     is ERC20, ERC20Permit, ERC20Votes

It might seem that including ERC20 here should not be a problem because ERC20Votes already inherits from it. Unfortunately ERC20Votes overrides some functions from ERC20 and by inheriting directly from ERC20 you bring back the old, non-overridden versions of these functions. Then you need to use override(ERC20, ERC20Votes) to declare that you are aware that you’re overriding both of them.
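
Concretely, MyToken then needs declarations along these lines (a sketch modeled on OZ v4-era code; exact signatures vary between versions):

function _afterTokenTransfer(address from, address to, uint256 amount)
    internal
    override(ERC20, ERC20Votes)
{
    super._afterTokenTransfer(from, to, amount);
}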

This pattern seems pretty common overall and eliminating it would remove a lot of cases where you need to list overridden contracts. In the compiler we could silence it by treating the version from ERC20 as if it wasn’t there (#12554), but that would be making a decision for the programmer. I think it might be better to instead issue an INFO note hinting that this pattern is superfluous, to nudge people towards dropping it.

I disagree; I think it definitely has to do with super, because super is the only viable solution. The compiler computes a global linearization that satisfies all local ordering constraints. Locally, in each function body, the developer can’t express a linearization-correct order other than by using super.

I think much of this problem stems from the notion of overrides as “replacement”. In most cases, an override is an extension of a parent function, not a replacement.


I see this as a problem because it hinders the use of abstractions. They are no longer abstractions if the programmer has to look at their implementation details. The abstraction should have the responsibility of itself being correct.

The issue is that in the common case this situation is not a problem at all. All the overrides are correctly chained using super calls and there are no “replacements” happening.

Perhaps the compiler should formalize the notion of a “replacement” as an override that doesn’t include the super call in its body. And perhaps the current logic should apply to proper replacements and not to proper extensions.
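
In code, the distinction might look like this (hypothetical contracts):

contract Base { function foo() public virtual {} }

contract Ext is Base {
    // "extension": chains into the linearization via super
    function foo() public virtual override { super.foo(); }
}

contract Repl is Base {
    // "replacement": no super call, Base.foo() is effectively disabled
    function foo() public virtual override {}
}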


This is not the way I see it. We have different intuitions about what it means to declare a parent.

It seems you’re saying that writing C is ERC20, ERC20Votes means something like “I want my transfer function to be taken from ERC20, and I want my transfer function to be taken from ERC20Votes”, and this causes a conflict that needs to be resolved because those two functions are different.

The way I see it, it means “I want my transfer function to behave like ERC20 and like ERC20Votes”. But in fact, behaving like ERC20Votes implies that it behaves like ERC20! Because ERC20Votes.transfer extends ERC20.transfer through the use of super.

Sure, in reality, ERC20Votes.transfer extends some parent Y’s Y.transfer, because linearization could result in ERC20, Y, ERC20Votes. But in the vastly more common case, Y.transfer itself implies the behavior of ERC20.transfer through super as well.

This is again the difference in the notion of replacements versus extensions.

In the previous post I outlined a distinction between overrides: some are extensions and some are replacements. Extensions can be safely stacked and should not require disambiguation or explicit annotations, while replacements are something the programmer should be aware of and (perhaps) manually disambiguate.

Take these two examples; suppose every contract defines a function foo:

     D          
    / \         
1) B   C     2) B   C
    \ /          \ /
     A            A

In example 1, B.foo and C.foo are extensions and they all trace back to D.foo.

In example 2, B.foo and C.foo are not extensions; there is no common parent, so C.foo would replace B.foo.

I believe this is the real scenario that is potentially hard for the developer of A to see and that the compiler should point out, and here I think override(B, C) would make more sense.

But in example 1 I don’t think this disambiguation should be necessary.
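
In Solidity terms, the two examples could look like this (suffixes added only to keep the names distinct):

// Example 1: B.foo and C.foo are extensions tracing back to D.foo
contract D { function foo() public virtual {} }
contract B1 is D { function foo() public virtual override { super.foo(); } }
contract C1 is D { function foo() public virtual override { super.foo(); } }
contract A1 is B1, C1 {
    // required today; under my suggestion this could be implicit
    function foo() public override(B1, C1) { super.foo(); }
}

// Example 2: no common parent defines foo, so C2.foo would replace B2.foo
contract B2 { function foo() public virtual {} }
contract C2 { function foo() public virtual {} }
contract A2 is B2, C2 {
    // here override(B2, C2) genuinely flags a conflict to resolve
    function foo() public override(B2, C2) {}
}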

I realize now that in my examples above the “extensions” are exactly the overrides. In example 2, B.foo and C.foo could not possibly be declared with the override keyword.

So I can summarize my suggestion as follows. Manual disambiguation and override(A, B) should only kick in when there are definitions of the same function that do not trace back (via overrides) to a common definition.

I do see your point and if we end up agreeing to weaken the override(...) requirements, then

Manual disambiguation and override(A, B) should only kick in when there are definitions of the same function that do not trace back (via overrides) to a common definition.

is definitely the way to go.

However, the problem I’m seeing is that I can just as well have an instance of your example 1 in which B and C are not extensions of D.foo, but replacements that impose additional and incompatible requirements on foo. E.g. there is no guarantee that B.foo or C.foo make super.foo calls, or that they are built as “extending overrides” that play along well with C3 linearization.

Do you see what I mean? The guarantee that B and C make by overriding D.foo is that they implement foo in a way that’s compatible with the requirements for D.foo. But I cannot tell from the inheritance structure alone that B.foo and C.foo are “extensions” and will be well-behaved in a C3 linearization. You’re saying that “in most cases” an override is an extension, not a replacement. I’d disagree with that - both are perfectly valid and I don’t necessarily see either one as more likely to appear.

I do agree in general that we have a problem - namely, we do not have a convenient way to construct an extension mechanism as you could achieve by dropping the override requirements for cases like 1 (if the situation actually is an “extension” case). That is, I do see that the override requirements are “annoying” for this use case.

But, on the other hand, the override requirements are conceptually valid and do provide an actual safety mechanism that’s not just vain or “wrong”.

In general, the inheritance changes that introduced these override requirements were partly a conscious move away from presuming a proper understanding of C3 linearizations, since we are often confronted with confusion in that regard. But since we don’t have an alternative to support patterns that are enabled by C3 linearization and super, we (as you rightly say in the respective issue) cannot remove super and disallow this pattern altogether.

So from where I’m standing this boils down to two questions:

  • Can we come up with a viable alternative for use cases that rely on C3 linearization and super, but which is less fragile in imposing implicit global assumptions? Or
  • Can we find a middle ground that makes these patterns more convenient again without compromising the additional safety provided by the explicitness in the overrides in other cases?

The distinction between extension and replacement is an interesting notion. I agree that it sounds like a reasonable pattern that could be made easier.

So here’s an idea: what if we assumed extension by default and made the compiler detect if you’re actually doing that? I think that our analysis is sophisticated enough to be able to do that. We have the control flow graph, whose purpose is basically to track reachable paths through functions. We could use it here.

What do you think about changing override like this:

  1. If you use plain override, we assume that your intention is to preserve the super call chain. I.e. if you inherit foo from A, B and C, your function has to eventually call[^1] foo() on each of them, either directly or via super.

    Only if you do not preserve the call chain do you have to specify override(A, B, C) like you do now.

    Now, if someone creates Y that inherits from your contract X and some other contract D, and D suddenly appears in the middle of your call chain, the compiler can verify whether A, B and C still get called. If they do not[^2], e.g. because D.foo() is missing the super call, the compiler can report an error. It can be fixed by calling them in some way or by slapping override(X, D) on the function to basically say “yeah, I know, D is fine”.

  2. This still does not solve the problem of override(A, B, C) giving a false impression that A, B and C are actually doing something. So what if we started using it to enforce more complex cases? I.e. you could do things like override(A, !B, C), which would mean that you want the chain to go through A and C but not B. It’s up to you how you achieve it but the compiler can at least check if you did.

  3. Finally, we could make the order count by verifying that they’re called in the right order, though I think this might be too hard to enforce without false positives (corner cases are hard). But I have another idea related to that. override(A, !B, C) is enough to tell the compiler what your intention is and stating it in the body is just repeating yourself. What if we used that annotation to have the compiler automatically generate the default body for you with the call order you specified? It would be useful in cases where you need to override a function only because two base classes have it and you don’t really want to add anything to it.

    For example

    function foo1() public override(A, !B, C) = default;
    function foo2() public override = default;
    

    would be equivalent to

    function foo1() public override(A, !B, C) {
        A.foo1();
        C.foo1();
    }
    function foo2() public override {
        super.foo2();
    }
    

    EDIT: I just realized that the override(A, !B, C) case is only easy if your function does not return anything. The super case might still be viable though.

[^1]: By “eventually call X” I mean that there must exist at least one path that leads from the entry point of function foo() to a call to X.foo(). It’s not an overly strong check but I think it would be enough here. For example calling it multiple times would count too. Calling it only in one if branch or only calling it indirectly through another function or even through a modifier would also count. If you inherit from multiple classes there must be a call to foo() from each one but not necessarily all on the same path.

[^2]: Again, by “not call X” I mean that we can detect absolutely no path where an X.foo() call would appear.
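
To illustrate footnote 1’s criterion with a hypothetical example, a call on just one branch would already count:

contract A { function foo() public virtual {} }
contract B { function foo() public virtual {} }

contract X is A, B {
    function foo() public override(A, B) {
        if (block.number % 2 == 0) {
            A.foo(); // one reachable path to A.foo() suffices
        }
        B.foo(); // called on every path, which of course suffices too
    }
}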

I’m rather sceptical, at least, about relying on more complex analysis (like inspecting what the function may or may not actually call). I’m equally sceptical about giving the override list any meaning with respect to inheritance order (which, as frangio noted, is not fully determined locally anyway, and any partial relation will probably be more confusing than helpful).

I think either the override mechanism should basically stay as it is and not be complicated further, while we find an entirely independent mechanism for the “extensions”/super case. I’d prefer that, but I’m lacking good suggestions myself.

Or we have to change the override mechanism, but rather by simplifying it, not by complicating it (both syntax and analysis wise).

I’m also not yet sure that we’re actually on the same page about the general issue yet, which we should probably try to make sure of before trying to come up with concrete solutions…

I’m also not yet sure that we’re actually on the same page about the general issue yet, which we should probably try to make sure of before trying to come up with concrete solutions…

To me it seems clear at least that the issue is that overriding is just too powerful for some use cases. A more limited feature would suffice and at the same time would require fewer safeguards and less awareness of base implementations.

Looking at the case from #12554 I see that there are three functions from ERC20 being overridden: _afterTokenTransfer(), _mint(), _burn(). Let’s take a look at what the overrides in ERC20Votes really do:

function _mint(address account, uint256 amount) internal virtual override {
    super._mint(account, amount);
    require(totalSupply() <= _maxSupply(), "ERC20Votes: total supply risks overflowing votes");

    _writeCheckpoint(_totalSupplyCheckpoints, _add, amount);
}
function _burn(address account, uint256 amount) internal virtual override {
    super._burn(account, amount);

    _writeCheckpoint(_totalSupplyCheckpoints, _subtract, amount);
}
function _afterTokenTransfer(
    address from,
    address to,
    uint256 amount
) internal virtual override {
    super._afterTokenTransfer(from, to, amount);

    _moveVotingPower(delegates(from), delegates(to), amount);
}

What strikes me here is that these functions look pretty much like modifiers. They call the inherited function with unmodified arguments and do not use the result (there is no result, in fact). They’re not just extensions rather than replacements - they’re pretty well-behaved extensions that fit a particular pattern.

So I think that we could already try to address at least this use case - as it clearly exists. Here’s another idea: what if, instead of forcing people to replace the whole body on override, we introduced a mechanism to just wrap inherited functions in extra modifiers? This would not need as many safeguards because the placeholder has very nice properties compared to super - you have to use it (you can make it unreachable but you have to do it on purpose, you can’t simply ignore it) and you have to use it in the body of the modifier itself, not somewhere deeper in the call chain.

modifier afterMint(uint256 amount) {
    _;
    require(totalSupply() <= _maxSupply(), "ERC20Votes: total supply risks overflowing votes");

    _writeCheckpoint(_totalSupplyCheckpoints, _add, amount);
}

function _mint(address account, uint256 amount) internal afterMint(amount) = inherit;
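
If I understand it right, = inherit would roughly desugar into today’s pattern of forwarding to super, i.e. the sketch above would behave like this (hypothetical desugaring):

function _mint(address account, uint256 amount) internal virtual override afterMint(amount) {
    super._mint(account, amount);
}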

Yes, I think this may be moving in the right direction and is related to @frangio’s comment in Rethink (or maybe even remove?) ``super`` · Issue #7261 · ethereum/solidity · GitHub with “the more explicit notion of before-, after-, and around-methods in CLOS”, which struck me as a potentially very suitable direction to solve this entire complex of issues.

It may be that we can solve all issues by:

  • Keeping the override logic and restrictions exactly as they are while additionally removing super, which results in clean replacement inheritance that is fully locally determined and unambiguous.
  • Additionally allowing before-, after- and around- extensions in CLOS style or equivalently a modifier-style extension mechanism (the “around” case is pretty much the same as the modifier suggestion and we can iterate on the syntax for any of this). This mechanism will intrinsically rely on C3 linearization and the explicitness requirements and restrictions of the current override system will indeed not apply to it. It would also only ever be allowed to occur relative to a common shared base function.
  • Disallowing the override mechanism and this new extension mechanism from occurring in the same inheritance hierarchy on the same function. (Resp. a potential more complex relationship between the two mechanisms would need to be well-defined with a lot of care.)

Funnily, the very word “override” itself suggests to me that I’m replacing a function, not extending an existing function, so also in that sense keeping the mechanisms fully separate makes sense.

But yeah, I’m not entirely sure at this point and I’d be very curious about @frangio’s thoughts regarding our last comments :-).


Yes I see what you mean. I think the notion of “well-behaved” is valuable, though hard to define properly.

Regarding @ekpyron’s last proposal, it would be great to have alternatives to super but I don’t think removing it entirely is a good idea. There are some patterns that are none of before-, after-, or around-methods (e.g. when the return value of the super call is used). Even if we were to remove it, I think it’d be best to do it in stages: first introduce the cleaner extension mechanism, and once it’s proven to be a good alternative, deprecate and then remove super.

While the proposal is good, it would require breaking changes. I don’t know how far in the future 0.9 is planned for (would be great to know though!), but assuming we have 0.8 for a while longer it would be great if the override requirements could simply be relaxed a bit for existing code, without new syntax.

Personally, I still think my suggestion (“Manual disambiguation should only kick in when there are definitions of the same function that do not trace back to a common parent”) is a good idea, even if this allows ill-behaved overrides to potentially fly under the radar, because the current mechanism doesn’t do a good job of preventing that either! See my example from earlier:

// in both B is A and C is A:
function foo() public virtual override(A) {
    A.foo();
}

// in D is B, C:
function foo() public override(B, C) {
    B.foo();
    C.foo();
}

This compiles and looks well-behaved, but isn’t!

Well, in your last example A.foo() will indeed be called twice, if I call D.foo(). And D.foo() will never call anything but the explicitly mentioned functions independently of the C3 linearization of a most-derived contract. But that’s exactly what it looks like to me and exactly what I’d expect to happen, so in that sense I would say this is very much “well-behaved” :-).

The problem is that you may very well want A.foo() to be called only once and that cannot be properly expressed using explicit calls to base functions, but only with super - or an alternative to super… (let alone also calling functions that only come into play in the C3 linearization of yet another contract deriving D, etc.).

But independently of that: we definitely would not immediately remove super, but only deprecate it first (removing it is most definitely a breaking change anyway). Whether introducing a new, cleaner extension mechanism would be breaking or not depends on the syntax we’d choose - it may actually be possible to do it as a non-breaking change, but that may limit our syntactic options and we may not be able to find nice syntax that is non-breaking. (And I don’t think we have a fixed schedule for 0.9 yet - I would not expect it in the near future, but I would still expect it within this year at least.)

There are some patterns that are none of before-, after-, or around-methods (e.g. when the return value of the super call is used).

This is something we would of course need to take into account. Theoretically, there are definitely such cases, so the important question is how likely this is to occur in practice.

We can still also entertain the idea of relaxing the override requirements in case there is a common base function - but it won’t be an easy decision to just downright drop them entirely. There may be a way to specifically mark such cases, either at the base function or while inheriting, in a less invasive and non-breaking way - if so, we could consider that, but off the top of my head I don’t see a really nice option…

Yes, I actually meant something different by “breaking changes”, but it was a poor choice of words.

I see this from the perspective of the developer of a library that projects depend on. Will users have to update their dependency to benefit from the improvement to overrides? And will this dependency update be a breaking change?

If the improvement is simply that the override rules are relaxed, the answer is “no” (an update is not required). If the improvement is new syntax, the answer is “yes” (we would have to release an update using the new syntax), and “kinda breaking”, because requiring a new compiler version is technically a breaking change. We prefer to avoid these breaking changes, but if we add the new syntax in a minor update, we will do it only after a reasonable time has passed since the compiler release, once we can assume it’s in wide use.


This comes up often so I want to explain why we’re so reluctant to make breaking changes. We publish security fixes as patch updates to the latest minor. For example, even if a bug is introduced in 4.2.0, if the latest version is 4.4.0 we will release the fix in 4.4.1. If a user is using 4.2.0, the update to 4.4.1 should be easy; in particular, it should not imply resolving breakage. This reduces the number of releases we have to do, which makes things a lot simpler for us to deal with - very important in the event of a security fix. I believe Solidity does the same thing.

In this concrete situation, if 4.4.0 makes use of new Solidity syntax, updating from 4.2.0 to 4.4.1 would break the project (the contracts wouldn’t compile) and resolving it would require updating the compiler in use. While the breakage is likely going to stop there, because Solidity updates are backwards compatible (though they could introduce new warnings), it’s not ideal for this to happen during a security fix.


Following up on this topic, were you able to discuss any of the suggestions with the rest of the team?

Regarding the override logic itself:

In the meantime we had this annoying bug: Function override inconsistency bug · Issue #12615 · ethereum/solidity · GitHub
and I even wanted to relax the existing override logic at least somewhat while fixing it in Fix missing override errors when indirectly inheriting interfaces. by ekpyron · Pull Request #12616 · ethereum/solidity · GitHub - but then we stayed with the currently specified logic due to time constraints…

But we do have a plan to relax at least some cases with Overhaul Inheritance and Override Logic · Issue #12665 · ethereum/solidity · GitHub, even though that will require allowing additional bases in the override list to stay backwards compatible.

However, that’s not as much relaxation as you would like to have and I don’t think we will go any further than that.

However, we did also briefly talk about the option to introduce a before-, after- and around-style extension mechanism and that was generally received positively, especially if we’d manage to make it an extension to the modifier syntax or at least borrow from it.

But that’s about as far as we got so far - I guess the next step would be to specify such a mechanism in an issue to have a more concrete basis for discussing it further. If you’re up for that, that might help - I’m not sure how fast I’ll find the time to do it myself!

The relaxation in Overhaul Inheritance and Override Logic · Issue #12665 is a significant improvement for sure. Although it’s a little disappointing that you won’t relax further than that, I think this direction makes sense.

I’ll try to put together an issue. Thanks!