Mechanism to split large contracts

chriseth · February 16, 2021, 10:17am

As suggested by ethernaut in https://twitter.com/the_ethernaut/status/1361489841519796225 the compiler should allow contracts to be split up more easily to avoid having to implement a proxy pattern manually. In my opinion, this is a good idea in principle, but requires a lot of details to be worked out.

Calls between two “fragments” of a contract will be expensive because each one becomes an external call, which is also problematic with regard to call protection. Maybe it would be better to explicitly group parts of a contract into fragments / modules that then have to use external calls (maybe delegatecalls) between each other.

We also have to be careful that this still works for how people currently upgrade their contracts.

maxsam4 · February 16, 2021, 11:12am

I’ll start by stating my opinions on the topic. I believe that the 24kb restriction should be removed from the EVM rather than remain a useless, rushed restriction that thousands of end users have to work around. I was pushing to get rid of it for a while but then lost motivation. If anyone else wants to lead the charge, I’ll be happy to help. There’s some useful and relevant content here:

github.com/ethereum/EIPs: Removing Contract Size Limit (opened 18 Dec 2018, 03:39PM UTC · closed 05 Dec 2021, 12:41AM UTC · maxsam4 · stale)

**Abstract** A contract size limit of 24kb was introduced by EIP #170 to solve the following problem: "when a contract is called, even though the call takes a constant amount of gas, the call can trigger O(n) cost in terms of reading the code from disk, preprocessing the code for VM execution, and also adding O(n) data to the Merkle proof for the block's proof-of-validity". I think we can solve this problem while still allowing _infinite_ contract size.

**Motivation** Complex dApps require complex smart contracts. I have seen a lot of dApp developers struggle with this and a lot of alternative solutions are being used like using delegate calls. Delegate calls reduce code readability while developing and while verifying source code on tools like etherscan. They also introduce another attack surface and added complexity.

**Specification**
1) The ethereum account array in state trie saves another element `codeSize` apart from existing 4: [nonce,balance,storageRoot,codeHash]
2) Whenever a new contract is deployed, its size is stored in `codeSize` of the account object.
3) If a contract is destructed, the `codeSize` should also be reset.
4) opcodes like `CALL`, `DELEGATECALL`, `CALLCODE` etc should charge additional `X`(3?) gas per extra word if the contract code size is greater than 24kb.

**Rationale** The only reason why contract size was limited was to prevent people from exploiting the fixed costs. We can overcome this by making the costs variable. The `codeSize` element will help in calculating call cost before reading the whole contract and eventually throwing OOG. Merkle proofs will also be generated at fixed cost as we won't have to load the huge contracts from disk first. The `codeSize` should be enough.

**Backwards Compatibility** We don't necessarily need to refactor the existing account arrays as all the existing accounts have less than 24kb of code so no extra cost has to be charged from calls being sent to them. We can assume `X = 0` if it's not available. This will mean that there are no backward compatibility issues.

This is just an early discussion Issue. I will create a properly formatted draft with specifications after getting some more feedback from the community.

References
https://github.com/ethereum/EIPs/issues/170
https://github.com/ethereum/EIPs/issues/659
https://github.com/ethereum/wiki/wiki/Patricia-Tree#tries-in-ethereum
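To make point 4 of the specification concrete, here is a rough sketch of the surcharge arithmetic. It assumes that "extra word" means each 32-byte word beyond the 24576-byte EIP-170 limit, and it uses the tentative value X = 3 from the issue. The library and function names are made up for illustration, and a real implementation would live in the client, not in a contract.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: models the per-call surcharge proposed in the issue.
library CodeSizeGas {
    // EIP-170 limit in bytes (0x6000).
    uint256 internal constant SIZE_LIMIT = 24576;
    // The tentative "X" from the issue; not a settled value.
    uint256 internal constant GAS_PER_EXTRA_WORD = 3;

    // Assumption: only the words beyond the limit are charged.
    function extraCallGas(uint256 codeSize) internal pure returns (uint256) {
        if (codeSize <= SIZE_LIMIT) return 0;
        uint256 extraBytes = codeSize - SIZE_LIMIT;
        // Round up to whole 32-byte words.
        uint256 extraWords = (extraBytes + 31) / 32;
        return extraWords * GAS_PER_EXTRA_WORD;
    }
}
```

Under these assumptions a 32768-byte contract has 8192 extra bytes, i.e. 256 extra words, so each call to it would cost 768 additional gas.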

https://www.youtube.com/watch?v=5W33u2OS92Q

Coming back to the language design, one way I think we can achieve this is by introducing contract-level scopes or sub-contracts that use delegatecalls to call into each other. The main contract should act as a static router that dispatches calls to the different sub-contracts. All storage variables defined in sub-contracts must be aggregated and assigned slots in the main router contract. A sample contract could then look like this (pseudocode):

contract Main {
    uint256 A;
    function B() {
        Foo.J(); // Delegate call
        Bar.T(); // Delegate call
        C(); // Internal call
        A = Foo.storage.I; // Direct read, write
    }
    function C() {}

    contract Foo {
        uint256 I;
        function J() {
            Main.C(); // Delegate call
            Bar.T(); // Delegate call
            K(); // Internal call
            I = Bar.storage.S; // Direct read, write
        }
        function K() {}
    }

    contract Bar {
        uint256 S;
        function T() { }
    }
}

The syntax can be made less/more explicit but I think this conveys the basic idea. Basically, it puts the developer in charge of defining scopes rather than the compiler making a guess. Keep in mind that the router will still eventually hit the 24 kb limit but that’ll take a long while.
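For comparison, this is roughly what such a static router looks like when written by hand in today's Solidity. It is a minimal sketch, not the proposed feature: the sub-contracts are deployed separately and passed in by address, the contract bodies are placeholders, and the selector table is an if/else chain that a compiler would presumably generate.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Placeholder sub-contracts; names follow the pseudocode above.
contract Foo {
    event Called(string name);
    function J() external { emit Called("Foo.J"); }
}

contract Bar {
    event Called(string name);
    function T() external { emit Called("Bar.T"); }
}

// Hand-written static router: maps each selector to the sub-contract that
// implements it and forwards the call with DELEGATECALL, so the sub-contract
// code runs in the router's context (its storage, its address).
contract Main {
    address private immutable foo;
    address private immutable bar;

    constructor(address foo_, address bar_) {
        foo = foo_;
        bar = bar_;
    }

    fallback() external payable {
        address target;
        if (msg.sig == Foo.J.selector) target = foo;
        else if (msg.sig == Bar.T.selector) target = bar;
        else revert("unknown selector");

        assembly {
            // Forward the full calldata and bubble up the result or revert data.
            calldatacopy(0, 0, calldatasize())
            let ok := delegatecall(gas(), target, 0, calldatasize(), 0, 0)
            returndatacopy(0, 0, returndatasize())
            switch ok
            case 0 { revert(0, returndatasize()) }
            default { return(0, returndatasize()) }
        }
    }
}
```

Because the targets are immutables, the routing table costs no storage reads, which matches the "static router" idea; the price is that the table cannot change after deployment.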

ajsantander · February 16, 2021, 11:21am

@maxsam4 I’m exploring something very similar here: synthetix-v3-labs/proxy-architecture-poc-5 at master · Synthetixio/synthetix-v3-labs · GitHub

I want the benefits of the Diamond Standard without all of its complications. That’s why I’m referring to this as the Router Proxy.

@chriseth take a look at this part, where it refers to inter module communication: synthetix-v3-labs/proxy-architecture-poc-5 at master · Synthetixio/synthetix-v3-labs · GitHub

So yes, there are multiple areas of slight awkwardness if all this is not handled at a lower level:

- Storage: Namespaces are nice, but it means that “natural” contract slots are not used at all
- Module calls: As stated above, expensive and with varying degrees of weird syntax
- Router: Having selectors ranged between a set of numbers for each contract would be really helpful to completely avoid the need of a binary search in the router
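The storage point can be illustrated with the usual namespaced-storage pattern. This is a minimal sketch under assumptions: the namespace string, the library name and the struct field are all invented for the example, and the linked repository may do it differently.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical module state kept in a struct at a hashed slot instead of
// the "natural" slots 0, 1, 2, ... assigned by the compiler.
library FooStorage {
    // The namespace string is an arbitrary choice for this example.
    bytes32 private constant SLOT = keccak256("example.modules.foo");

    struct Layout {
        uint256 I;
    }

    function layout() internal pure returns (Layout storage l) {
        bytes32 slot = SLOT;
        assembly {
            // Point the storage reference at the namespaced slot.
            l.slot := slot
        }
    }
}

contract FooModule {
    function setI(uint256 value) external {
        FooStorage.layout().I = value;
    }

    function getI() external view returns (uint256) {
        return FooStorage.layout().I;
    }
}
```

Every module reads and writes through its own `layout()` accessor, so modules running behind the same router do not collide, but none of them can declare ordinary state variables.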

chriseth · February 16, 2021, 12:17pm

This looks rather logical and straightforward. I assume that all externally visible functions of the sub-contract are visible in the main contract (and proxied through). I think for the sake of readability, there should be something like contract MainContract { import SubContract; }.

Let me brainstorm a bit more: What could also be interesting is to make sub-contracts and especially their construction more explicit:

contract MainContract {
  export immutable SubContract subContract = new SubContract(1, 2);
}

The export causes the public functions to be visible in MainContract. If you do not use immutable, then the target pointer can be updated.

One problem with dynamic proxies is that we do not know which functions it has, so it is difficult to do this with more than one sub-contract.

maxsam4 · February 16, 2021, 12:42pm

> The export causes the public functions to be visible in MainContract.

This sounds a bit unintuitive. How about allowing imports of functions instead?

import SubContract::*;
import SubContract::{A, B};
import SubContract::A;

The imported functions can then be called directly like A() while unimported functions will need to be called explicitly like Foo.C().

> If you do not use immutable, then the target pointer can be updated.

Just to be clear, if immutable is not used, the address should be hardcoded in the bytecode like it’s done for libraries so that no storage read/writes are required.

> One problem with dynamic proxies is that we do not know which functions it has, so it is difficult to do this with more than one sub-contract.

Why do we care about that? The router should use the sub-contract provided at compile time to create the routing table, and then it will just throw an error at runtime if the user tries to use a sub-contract with a different interface. Similar to how contract objects work right now.

FWIW I think this needs a lot more thought but I believe we are on the right track.

chriseth · February 16, 2021, 12:59pm

“to be visible in MainContract” - I actually meant “externally visible in MainContract”, i.e. for callers of MainContract’s instances.

And to clarify the immutable part:

My point was that if you use export immutable SubContract subContract, then you can actually get the address of the sub contract. You can call functions using subContract.f() and you can even have multiple instances of the same sub-contract - not sure if that is useful, though. The immutable keyword means that the address is stored in code and cannot be changed after deployment. Furthermore, this notation allows you to provide constructor arguments for the sub-contract, because its creation is explicit.

As an extension, this allows the address to be stored in storage if the immutable keyword is not present. This has the benefit that the address can be changed if you like, for example for a code upgrade. This also means that the actual type of the contract is unknown at compile-time, and it can even be changed during the lifetime of the contract, as long as the interface is still compatible.
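The difference between the two variants can be approximated in current Solidity without the proposed export keyword. This is only a sketch: the forwarding function is written by hand, it uses a regular call (whether export would use CALL or DELEGATECALL is left open here), and the owner check is an assumption added so the upgradable variant is not open to everyone.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract SubContract {
    uint256 public immutable a;
    uint256 public immutable b;

    constructor(uint256 a_, uint256 b_) {
        a = a_;
        b = b_;
    }

    function f() external view returns (uint256) {
        return a + b;
    }
}

// Variant 1: the sub-contract address is an immutable, i.e. stored in the
// deployed code. No storage read is needed and it can never be changed.
contract FixedMain {
    SubContract public immutable subContract = new SubContract(1, 2);

    // Hand-written stand-in for what "export" would generate.
    function f() external view returns (uint256) {
        return subContract.f();
    }
}

// Variant 2: the address lives in storage and can be replaced later, as long
// as the new target keeps a compatible interface.
contract UpgradableMain {
    SubContract public subContract = new SubContract(1, 2);
    address public immutable owner = msg.sender;

    function setSubContract(SubContract next) external {
        require(msg.sender == owner, "not owner");
        subContract = next;
    }

    function f() external view returns (uint256) {
        return subContract.f();
    }
}
```

In both variants the creation of the sub-contract is explicit, which is what makes passing the constructor arguments (1, 2) possible.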

This might also be relevant here: Syntax for creating copies of contracts · Issue #2296 · ethereum/solidity · GitHub

maxsam4 · February 16, 2021, 1:04pm

> “to be visible in MainContract” - I actually meant “externally visible in MainContract”, i.e. for callers of MainContract’s instances.

IMO public/external functions should always be visible. I don’t see any use for the export keyword here, just like I don’t see any use for payable addresses or the calldata keyword in external functions.

> The immutable keyword means that the address is stored in code and cannot be changed after deployment.

ah right, I read that as mutable for some reason. Sorry for the confusion. It makes sense now lol.

ajsantander · February 16, 2021, 3:11pm

It seems to me that we’re discussing the potential syntax of a better router contract, i.e. a contract that can be used to associate a set of modules in the same context. I’d love this of course, anything that improves what we currently have to do to achieve the same effect is awesome!

I’m just wondering if we should broaden the scope of the conversation. Imagine that we had an ideal solution to what’s been discussed above. Developers would still need to code around a certain pattern (i.e. inter-module communication, storage in namespaces, etc.), and coding around the pattern would deviate from the natural code style of the language.

So I ask the question: shouldn’t the language output artifacts that abstract this routing altogether? I.e. if it finds that a particular artifact will exceed the contract size limitation, then it will split it in two and take care of the routing under the hood. That way, users of the language can maintain whatever code style they choose, and still be able to overcome the size limitation, or other limitations that may be imposed by the EVM.

Of course, this is just food for thought, I have no idea how complex this would be for the compiler, or if it would be possible at all.

maxsam4 · February 16, 2021, 5:51pm

It’s very hard to abstract the routing completely. It’s easy in basic cases where the contract can be split in shards that do not interact with each other. However, when the functions start calling each other, it’s very hard for the compiler to optimize the sharding strategy. Also, the internal functions can’t be called directly across shards so there will always remain a difference between coding a single contract and a sharded contract.

The earlier proposal does force people to adapt to a new pattern but it doesn’t seem out of place to me. The syntax we were discussing looks quite natural (apart from exports). In any case, here’s a simpler example that can achieve the same goal while keeping bloat at a minimum.

contract Main {
    uint256 A;
    function B() {
        J(); // Delegate call
        T(); // Delegate call
        C(); // Internal call
        A = I; // Direct read, write
    }
    function C() {}

    {
        uint256 I;
        function J() {
            C(); // Delegate call
            T(); // Delegate call
            K(); // Internal call
            I = S; // Direct read, write
        }
        function K() {}
    }

    {
        uint256 S;
        function T() { }
    }
}

Basically, this rips out namespaces and says everything must roll up to the main contract. The scopes only define which functions end up in which shard. I think this achieves 90% of what we want while still being relatively simple to understand, implement and use. tbh I’d prefer the earlier featureful solution but I can’t deny that having a simpler approach has its own benefits. If anyone who is reading this has any other ideas, feel free to share.

karl · February 16, 2021, 6:14pm

> It’s easy in basic cases where the contract can be split in shards that do not interact with each other.

This already seems quite useful in a lot of cases & as you say doesn’t sound too difficult to implement. I bet there are a lot of folks who have contracts which could effectively be auto-sharded but that still struggle with contract code size.

Also really appreciate this discussion! This would be an awesome feature.

cameel · February 16, 2021, 9:56pm

@maxsam4 One thing that strikes me in your initial example is that nested libraries match your concept much more than nested contracts:

- Your nested contracts can call each other’s functions even though they don’t inherit from each other.
- External calls work via DELEGATECALL.
- They don’t have their own storage and only use storage held by another contract.
  - Though unlike libraries, they can access storage variables directly. I actually don’t think it would be possible without an external call.
- Calls qualified with the contract name (e.g. Bar.T()) are external. This is already the case now when Bar is a library. If it’s a contract, it’s an internal call.
- Just like in case of libraries you’d probably want a call guard. You don’t want someone to call it directly by mistake and use the storage that belongs to it rather than to the router.
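The call guard from the last point can be written by hand today. A minimal sketch, assuming a hypothetical module with a single counter; the idea is the same one the compiler applies to libraries, namely comparing the executing address against the address the code was deployed at.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical module that must only run behind the router.
contract GuardedModule {
    // Address of this module's own deployment, fixed in its bytecode.
    address private immutable self = address(this);

    uint256 private counter;

    // Under DELEGATECALL, address(this) is the router, not the module.
    // If the two are equal, someone called the module directly.
    modifier onlyDelegateCall() {
        require(address(this) != self, "direct call not allowed");
        _;
    }

    function increment() external onlyDelegateCall {
        counter += 1;
    }
}
```

A direct call to `increment()` reverts, so the module's own storage is never touched by mistake.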

Another thing is that the way storage would work needs some adjustment. I think that the common pattern with upgradeability is that you create a proxy which holds all the storage variables and in newer versions of your contract you only append new ones at the end. If we assume that with nested contracts the compiler would visit them in some deterministic order and add their storage variables to the router in that order, then it would be too easy for users to break upgradeability by adding a new variable in the middle of a block of existing variables. In fact, the only way not to break the contract on upgrade would be to only ever add new variables in the last nested contract (or in the router contract if we assume that variables placed after all nested contracts are placed last).

I see three ways to solve it (and there are probably more):

1. Require placing the variables directly in the router contract. This would be yet another restriction common with libraries.
2. Place variables from nested contracts in something that works like a mapping and mirrors the nested contract structure. One improvement over an actual mapping would be that the compiler, knowing all possible keys at all nesting levels, could make all accesses O(1).
3. Have the user state the maximum number of slots each nested contract can use (only counting the static part that is normally placed at the beginning of storage). Then the compiler would pad it to that length or issue an error if it’s taking too much space. As long as the numbers are never changed once set, it would be safe to add new variables inside nested contracts.
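The third option resembles the manual "storage gap" convention already used with upgradeable contracts. Here is a sketch of it with inheritance standing in for nested contracts; the budget of 10 slots per module and all names are assumptions made for the example, and the padding is written by hand where the proposal would have the compiler do it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Each module reserves a fixed budget of 10 slots (an arbitrary number).
abstract contract FooState {
    uint256 internal I;          // slot 0
    uint256[9] private __fooGap; // slots 1..9, reserved for future Foo variables
}

abstract contract BarState {
    uint256 internal S;          // slot 10
    uint256[9] private __barGap; // slots 11..19, reserved for future Bar variables
}

// State variables are laid out starting from the most base-like contract,
// so Foo's block comes first, then Bar's, then the router's own variables.
contract Router is FooState, BarState {
    uint256 internal A; // slot 20
}
```

Adding a variable to FooState later means shrinking `__fooGap` by the same number of slots, which keeps S and A where they were. The manual version is easy to get wrong, which is the argument for letting the compiler pad and check the budget.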

maxsam4 · February 17, 2021, 6:14am

> Place variables from nested contracts in something that works like a mapping and mirrors the nested contract structure. One improvement over an actual mapping would be that the compiler, knowing all possible keys at all nesting levels, could make all accesses O(1).

I like this idea.

> One thing that strikes me in your initial example is that nested libraries match your concept much more than nested contracts

I agree with that but contracts and libraries are technically the same thing. We can call them whatever we want :). Heck we can introduce a third keyword like “SubContract”/“Scope” etc. I am not good at naming things though.

cameel · February 19, 2021, 9:34pm

> I agree with that but contracts and libraries are technically the same thing. We can call them whatever we want :). Heck we can introduce a third keyword like “SubContract”/“Scope” etc. I am not good at naming things though.

I mean, it was not just a pedantic point about the naming. What I’m getting at is that what you propose looks like a radical change on the surface, but if we just made a relatively small change to libraries to allow them to have an immutable state variable that represents a single contract they’re associated with, we would already have 90% of this feature. Libraries have almost the exact semantics you’re proposing. The only things missing would be the neat syntax for nesting them in the contract body and declaring some of the contract’s state variables inside them.

ajsantander · February 23, 2021, 2:26pm

@chriseth after a few days of ideas coming in, do you think we can start narrowing it down to something more concrete, even if it’s a small first step?

chriseth · February 25, 2021, 12:17pm

It would be nice if we could find a solution that makes the fallback function obsolete and at the same time does not need too many changes, i.e. re-uses the library concept. So yeah, if you can come up with a complete proposal, then please go ahead! I also appreciate this coming from people who use proxy patterns more often than the compiler developers do.

frangio · March 2, 2021, 6:27pm

I haven’t seen any discussion of how a split contract would be deployed. Would it be a single transaction or multiple transactions? In either case what should the creation code look like? Maybe the library linking mechanism should be reused?

maxsam4 · March 15, 2021, 11:47am

After collecting all the feedback here, I’ve created a draft proposal over at GitHub to get more eyes on it: A smart contract hub that routes calls to various child smart contracts to bypass the 24KB bytecode limit · Issue #11102 · ethereum/solidity · GitHub

mudgen · July 28, 2021, 2:19am

I don’t think it is necessary to implement splitting large contracts at the Solidity language level. The reason is that EIP-2535 Diamonds solves the problem in a sufficiently elegant way.

People don’t have to implement proxies manually, they can build on existing diamond implementations that have been audited.

I think what could help at the Solidity language level is special support for state variable layout for proxies, because the default state variable layout in Solidity doesn’t work for proxies with multiple facets/implementation contracts. Diamond Storage and AppStorage are solutions that are easy to use, but it would be good to have the state variable layout work more at the language level.

mudgen · September 24, 2021, 10:22pm

EIP-2535 is a mechanism and smart contract pattern that splits a large smart contract into multiple smaller ones, but they are accessible at a single Ethereum address.

Since there is already a standard that does this, I think Solidity language features that make it easier to implement EIP-2535 would be very helpful.

cosyDev · May 12, 2022, 8:47am

Hear hear! Still the best approach out there