Add the ability to make dynamic arrays in memory

Problem
Solidity currently supports only static arrays in memory, even though the EVM is more than capable of handling dynamic memory arrays.

A proof of concept can be seen in a repo I created. The repo uses a library that treats static memory arrays like dynamic arrays.

Proof Of Concept:

I believe that this has numerous use cases for smart contract developers. I understand that overuse of this feature could lead to higher gas costs when calling functions, but I believe, if used in moderation, it would be a very useful tool. Thoughts?

Well, due to how this statement is phrased, there are several ways to interpret it:

  1. Solidity currently supports only “one thing” - static arrays in memory.
  2. Solidity currently supports static arrays in memory but not static arrays in storage.
  3. Solidity currently supports static arrays in memory but not dynamic arrays in memory.

1 is obviously wrong (despite being the most accurate interpretation of your statement).

2 is wrong - here is an example of a static array in storage:

contract Test {
    uint[3] arr = [45, 67, 89]; // static (fixed-size) array in storage
    ...
}

3 is wrong - here is an example of a dynamic array in memory:

function test(uint length) public {
    uint[] memory arr = new uint[](length); // dynamically sized array in memory
    ...
}

Sorry for the confusion in phrasing. I agree with points 1 and 2; however, I disagree with point 3. Although you can set the size of the array dynamically in this example, it is still a fixed-length array, because you are not allowed to use push() or pop(). The library I suggested uses inline assembly to add or remove elements from an array by modifying the array in memory, updating its length, updating any succeeding variables, and updating the free memory pointer. In addition, it adds insert() and remove() as methods.
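(For illustration only, here is a minimal sketch of what such an assembly-based push might look like. This is not the actual library code; it assumes the array is the last allocation in memory, and it omits the library's checks for moving succeeding variables:)

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: append an element to a memory array in place.
// WARNING: only safe if `arr` is the last allocation; otherwise the new
// element overwrites whatever was allocated after the array.
library MemPushSketch {
    function push(uint256[] memory arr, uint256 value) internal pure {
        assembly {
            let len := mload(arr)                        // length word sits at arr
            let end := add(arr, mul(add(len, 1), 0x20))  // slot just past the last element
            mstore(end, value)                           // write the new element
            mstore(arr, add(len, 1))                     // bump the stored length
            mstore(0x40, add(end, 0x20))                 // advance the free memory pointer
        }
    }
}
```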

I haven’t looked into your library, I just commented on that statement at the beginning of your question.

An array can be dynamically allocated in memory using the keyword new, which creates it with an additional 256-bit entry (at the beginning of the allocated memory block), indicating the length of the array.

For a given dynamically allocated arr, you can change the value at that entry to new_len via:

assembly { mstore(arr, new_len) }
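(For example, this technique can safely shrink an array in place; a hedged sketch, with `truncate` being a hypothetical helper name:)

```solidity
// Truncate a memory array in place by rewriting its length word.
// Only safe when shrinking: growing the length this way can expose
// memory that belongs to other variables.
function truncate(uint256[] memory arr, uint256 newLen) internal pure {
    require(newLen <= arr.length, "can only shrink");
    assembly { mstore(arr, newLen) }
}
```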

Note that as of solc v0.6.5, the length of dynamic arrays in memory is bounded by 2 ** 64 - 1.

This restriction was imposed due to an overflow bug (see here for more details).

That is actually what I use in the library, but it also takes care of updating the free memory pointer, along with verifying that we are not overwriting other memory variables. The issue is that a lot of developers do not know Yul or how memory works in great detail. I believe that adding push() and pop() would give developers a safer way to use dynamic arrays in memory.

There is already a library for that; see Utilities - OpenZeppelin Docs.

The general idea about programming languages is to keep them minimal, i.e., avoid adding anything which can be achieved with the existing building blocks (for example, in a multiplicative group consisting of 2 and 3, we don’t need to add 18, because we can build it with 2 * 3 * 3).

The aforementioned library implements the features which you are asking for (push and pop for memory objects), which means that they are not required as building blocks in the Solidity language.

I just read over the OpenZeppelin library, and it looks like it is for structs that mimic the functionality of memory arrays.

I get what you are saying about keeping the language as minimal as possible. The main benefit of adding this functionality to the language would be gas consumption. Right now, calling the memLibrary invokes a delegatecall in order to execute within the context of the calling contract. This seems overpriced to me, and I would like to see a less expensive option for developers to use. I totally understand if that is not worth the change, though.

Which you can get rid of by replacing every occurrence of public with internal.

It would increase the size of the calling contract’s byte-code, and subsequently the gas cost of deploying it, but it would not yield those delegate calls during any post-deployment transaction.

Actually, this will not work for this library. If you change it to an internal function, the code will not run in the context of the calling function, which can lead to overwriting any succeeding variables. If you would like to learn more about the design choices of the library, you can read this article I wrote. Skip to Writing a Library, unless you are also interested in learning more about libraries.

This description is unclear to me.
In either case, your tests pass after replacing public with internal, so if this could lead to an issue, then your test-suite is incomplete (i.e., it does not cover some important scenarios).

Thanks for pointing that out; I need to go update the test cases!

To clarify here is an example. Let’s say you have the following function:

function testPush1() external pure returns (uint256[] memory, uint256[] memory) {
    uint256[] memory arr = new uint256[](4);
    arr[0] = 0;
    arr[1] = 1;
    arr[2] = 2;
    arr[3] = 3;

    mem.push(arr, 4);

    uint256[] memory arr2 = new uint256[](1);
    arr2[0] = 100;

    return (arr, arr2);
}

With internal and public functions you receive the same output that looks correct:

arr: [0,1,2,3,4]
arr2: [100]

Here is where I made a mistake in the test cases. I should have written the test function in the following manner (placing mem.push() after the declaration of arr2):

function testPush2() external pure returns (uint256[] memory, uint256[] memory) {
    uint256[] memory arr = new uint256[](4);
    arr[0] = 0;
    arr[1] = 1;
    arr[2] = 2;
    arr[3] = 3;

    uint256[] memory arr2 = new uint256[](1);
    arr2[0] = 100;

    mem.push(arr, 4);

    return (arr, arr2);
}

This time you will see that internal and public return two different outputs:

**Internal**
arr: [0,1,2,3,4]
arr2: [1,100,64,256]

**Public**
arr: [0,1,2,3,4]
arr2: [100]

What I was attempting to convey is that using an internal function in this case causes the memory location of arr[4] to be read as the location where arr2 stores its length. That is why arr2 outputs 4 elements (we pushed 4 to the end of arr). You can also see that the first two elements are 1 and 100, which is the original declaration of arr2 in memory. This means the EVM is still looking at the old memory location of arr2, because we are jumping inside of the internal function to execute the code. With delegatecall, we are executing in the context of the original function (testPush2()), which tells the EVM that we are updating the data associated with these variables in testPush2().

The reason my original test cases still passed is that the EVM learned about the new variables only after updating the first array, so it did not have to worry about overwriting them.

I hope this example clears things up but let me know if you need any further explanation :slightly_smiling_face:

Well, the delegate-call aspect means that using your library is probably more expensive than using OpenZeppelin’s library.

And regardless of that, “memory push/pop” as a native part of the language is not necessarily going to be less expensive than either one of them (let alone both of them).

I’d assume that all of these factors have already been taken into consideration far back during the early days of Solidity, but let’s see what the Solc team members have to say about this…

As mentioned above, the OpenZeppelin library is not for memory arrays; it is for structs. Also, making this a native part of the language would significantly lower gas costs, since it would remove the need for the delegatecall/jump. But I agree, let’s just see what the Solc team members think.

Hi @marjon-call and @barakman, thanks for the discussion. We understand the point of having such a feature in the language. However, due to our current memory allocation mechanism, in a general case, this can’t be done efficiently without completely replacing it. That’s why we intentionally do not provide a default solution at the moment. The existing solutions come with various trade-offs, and for now, it’s acceptable for them to exist as libraries.

Our current mechanism allocates data in memory one region after another, with no space in between, and never deallocates. If you allocate a dynamic array for N elements and it’s not the last allocated data, there’s simply no space to insert an (N+1)-th element, and all the available methods for doing so are quite expensive. For example, you could:

  1. Allocate a completely new array and move the contents.
  2. Move everything behind the array - this is what @marjon-call seems to have implemented.
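(A small sketch illustrating why there is no room to grow; the concrete addresses assume solc’s default memory layout, where free memory starts at 0x80 and the free memory pointer lives at 0x40:)

```solidity
function allocExample() external pure returns (uint256) {
    uint256[] memory a = new uint256[](2); // 3 words (length + 2 elements) at 0x80
    uint256[] memory b = new uint256[](2); // allocated immediately after, at 0xe0
    // a's data region ends exactly where b begins, so there is no gap into
    // which a could grow, and nothing is ever deallocated.
    uint256 freePtr;
    assembly { freePtr := mload(0x40) } // points just past b; never rewound
    b; // silence unused-variable warning
    return freePtr;
}
```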

We don’t have this problem in storage because there the slot representing the array stores only its size. The content is stored at a pseudo-random location (derived from the keccak256 hash of the slot), and the chance of that overlapping with anything else is close to zero. That’s why we can append new elements without checking.
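(Concretely, a small sketch, assuming a dynamic array declared at storage slot 0; the contract and function names are illustrative only:)

```solidity
contract StorageLayoutSketch {
    uint256[] arr; // slot 0 holds only arr.length

    // Element i of arr lives at keccak256(0) + i, a pseudo-random location
    // far from any other variable, so appending never needs to move data.
    function slotOf(uint256 i) external pure returns (uint256) {
        return uint256(keccak256(abi.encode(uint256(0)))) + i;
    }
}
```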

The @marjon-call implementation may be useful in some cases, avoiding unnecessary memory moves, as it checks if the array is the last allocated data, and if it is, it doesn’t move anything, which is quite efficient. However, it may be inefficient in other cases, such as when there is other data past the array. Also, it extends the array one element at a time, potentially moving the entire memory content multiple times. So, sadly, we cannot use it as a solution for a general-purpose push().


Thanks! I appreciate the feedback