Adding HashMap-style storage layout to Solidity

Have there been discussions in the past on adding enumerable HashMaps to Solidity?

Current mapping types are very gas-efficient but have several other drawbacks:

  1. Impossible to enumerate keys/values
  2. Unintuitive for devs coming from other languages
  3. Inaccessible to off-chain analysis tools

If there is interest in discussing this, I would be happy to help work on it (I have no experience with CPP, but I’ll manage).
I have a possible spec in mind (that still requires work), but I didn’t wanna share anything before I understand the history and see whether there is interest in implementing such a thing.

1 Like

This is an inherent limitation of how mappings work. Mappings are relatively cheap because they’re not implemented as a complicated data structure but instead treat the whole storage as a massive array where collisions are unlikely as long as the keys you calculate are random enough.

Impossible to enumerate keys/values

The simple way around it is to store the keys separately if you actually need them. Implementing an iterable mapping is pretty simple. There’s even an example of that in the docs: Iterable Mappings.

You can also find a robust implementation of EnumerableMap in OpenZeppelin.

Inaccessible to off-chain analysis tools

Well, they’re accessible if you have access to a node and can inspect storage. But if you mean that they cannot be easily returned from functions, the issue is that mappings can be massive. A very common use case is to store something for every address that interacts with a contract.

I have a possible spec in mind (that still requires work), but I didn’t wanna share anything before I understand the history and see whether there is interest in implementing such a thing.

I think this is something that should be done in external libraries, not in the compiler. I’d recommend taking a look at OpenZeppelin’s EnumerableMap I linked above to see if you can improve it.

It’s not something that we’d add to the language itself though unless it’s an elementary building block for other data structures as the mapping currently is. More complex structures should be built by library authors and the goal of the compiler is only to give them tools to make that possible. For example we’d rather work on finally adding generics to the language than implement specific data structures ourselves.

1 Like

I should have mentioned that I am familiar with current solutions. I am also familiar with mapping’s storage layout and the technical details of why they are not enumerable/iterable.

Current solutions like the ones linked are:

  1. Storage-inefficient (e.g. 5 slots per key-value pair for the IterableMapping)
  2. Run-time expensive (require iterating over keys array)
  3. Clumsy to implement (using a library for something most devs expect out-of-the-box; evidently - OpenZeppelin themselves don’t use these in their ERC-20/ERC-721 implementations)

Regarding data accessibility - due to the storage layout scheme used for mappings, even having direct access to the LevelDB doesn’t let you enumerate mappings. Actually - you can’t associate slots to variables even if you traverse the entire storage trie because the keys are ran through keccak to generate the slot.

What I have in mind is more akin to Java’s HashMap:

I believe we can use a scheme that lets us enumerate more cheaply and uses up to 3 slots per key/value pair (the more pairs, the closer the ratio trends to 2 slots per pair). And I’m sure we can even come up with something better than my trivial attempt :slight_smile:

@cameel I am trying to work this into a library + user-defined types.

I have one last issue - can I infer the slot number of a variable?

I have a gist of the library itself here (~150 LoC):

The storage layout I implemented uses inline assembly to achieve a true HashMap layout and I am currently accessing slots based on the number that is wrapped into a HashMap.

This will be very confusing for users, as it might inadvertently cause storage slot collisions.

Current usage would require authors to keep track of slot numbers, or otherwise they’ll get collisions, like so:

import "./HashMap.sol";

contract Example {
    using HashMapLib for HashMap;
    HashMap _balances = HashMap.wrap(0);
    string symbol = "TOKEN";
    HashMap _allowances = HashMap.wrap(1); // --> Will overwrite slot 1, which is occupied by `symbol`
}

Is there a better way of achieving this?

Thanks!

Yeah, you can use .slot and .offset in inline assembly to get the address of a storge variable in storage.

See Access to External Variables, Functions and Libraries

Awesome! Thank you! :pray:

@cameel I must be missing something, but can’t figure out how to get the slot from inside the library.

Inside the library I got this code:

    function getSlot (HashMap map) private pure returns (uint slot) {
        assembly {
            slot := map.slot
        }
    }

But I get this error when compiling:

TypeError: The suffix ".slot" is not supported by this variable or type.

Any idea what am I missing?
Thanks!

You’re passing it by value so it’s no longer a storage variable here.

You could use a storage reference instead (HashMap storage map), but unfortunately this is not allowed for value types. I think youre best bet for now is to wrap it in a struct.

Nice! It works!
Feels kinds hacky but I’ll take what I can get :grin:

So what I did is I just changed:

type HashMap is uint;

Into:

struct HashMap {
  bool init;
}

Thank you @cameel! :pray:
Are there any plans to support something nicer for such use-cases?
I would be happy to assist working on something like that.

Well, I think that eventually it will be possible to have references to value types. Hard to say how soon and in what form exactly though. There are several big changes ahead of us aimed at making the type system more uniform, which is really one of the most important things on our roadmap in the short term, but we’re still in the process of preparing the design so unfortunately I cannot point at a single, concrete thing that could be done right now. Overall part of our plan is to change how references/locations work and it might end up being a part of that.

We’re also planning to make it possible to have user-defined reference types, similar to the current user-defined value types. This would make the fact that it’s a struct more of an implementation detail and not that much different from using a value type.

We may also introduce structs as non-reference types, which would bring them even closer to the UDVT solution in terms of cost and usage. That’s just a loose idea right now though and it’s not certain we’ll do that.

The fact that you’re trying to use it, finding holes in it and posting your feedback is actually a pretty helpful in itself. Seeing how things get used in practice helps us prioritize and make design decisions.

Another things that helps us make progress on designing features is participating in design discussions. We’re going to have some open design calls in the near future (this will be announced) so feel free to join and comment on the bits you’re interested in :slight_smile:

2 Likes

Thank you, very interesting to hear about these plans.
And I really appreciate your help :slight_smile:

I would love to join & listen to the design discussions, so I’ll keep an eye out for that.

1 Like

If anyone else stumbles on this thread looking for a HashMap implementation, I’ve released this as a library that can be consumed via Hardhat / Foundry:

2 Likes