Line number logging for EVM assembly?

rjx18 · December 18, 2021, 6:14pm

Hi Solidity team! I am a master’s computing student at Imperial College London. I’ve been writing Solidity code for quite a number of projects now, and I have realised that current tools like Ganache and Truffle don’t provide very detailed analysis of smart contract gas usage. Therefore, I am building a Godbolt-like tool for the Solidity language for my university master’s project, which would show exactly how the bytecode is generated for each function and display the gas usage alongside it. Just wanted to ask whether there is any support for line number debug information in the assembly code generated, like in gcc -g? I’m just playing around with the solidity command line options now and noticed the new --ewasm-ir option, what difference is that compared to just the --ewasm option? Who can I also contact for more information about this? I’d love to help make some contributions where I can! Thank you!

rjx18 · December 18, 2021, 6:53pm

I see that doing

./solc --ir --ir-optimized --asm --optimize --debug-info all test.sol

outputs some of the line number information. There is, however, a lot of code referring to #utility.yul. According to this documentation, “#utility.yul is an internally generated file of utility functions”. What are these utility functions specifically? Are they related to the user code in any way?

cameel · December 19, 2021, 12:27pm

Hello, welcome to the forum!

Just wanted to ask whether there is any support for line number debug information in the assembly code generated, like in gcc -g?

It’s there if you enable --debug-info location (which is the default):

    ...
    tag_7:
        /* "test.sol":81:89  S memory */
      tag_10
      tag_11
      jump      // in
    tag_10:
        /* "test.sol":108:109  s */
      dup2
        /* "test.sol":101:109  return s */
      swap1
    ...

It’s not in line/column format; instead it gives you the start and end offsets into the original source. It’s similar (but not identical) to the source mappings in the AST.
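If a tool needs line/column positions, the offsets can be converted once the original source text is at hand. Below is a minimal Python sketch, not part of the compiler. The helper names `parse_location` and `offset_to_line_col` are made up for illustration, and it assumes the offsets are 0-based byte offsets with an exclusive end, which matches the excerpt above (81:89 covers the 8 characters of `S memory`).

```python
import re
from typing import Tuple

# Matches the start of a location comment such as: /* "test.sol":81:89  S memory */
LOCATION_RE = re.compile(r'/\*\s*"([^"]+)":(\d+):(\d+)')


def parse_location(comment: str) -> Tuple[str, int, int]:
    """Extract (file name, start offset, end offset) from an asm location comment."""
    match = LOCATION_RE.search(comment)
    if match is None:
        raise ValueError("no source location found in: " + comment)
    return match.group(1), int(match.group(2)), int(match.group(3))


def offset_to_line_col(source: bytes, offset: int) -> Tuple[int, int]:
    """Convert a 0-based byte offset into a 1-based (line, column) pair.

    The column is counted in bytes, so it can differ from the character
    column on lines that contain multi-byte UTF-8 characters.
    """
    if not 0 <= offset <= len(source):
        raise ValueError("offset is outside of the source")
    prefix = source[:offset]
    line = prefix.count(b"\n") + 1
    # rfind returns -1 when there is no newline yet, which gives column = offset + 1.
    column = offset - prefix.rfind(b"\n")
    return line, column
```

The source has to be read as bytes (for example with `open(path, "rb")`) so that the offsets line up with what the compiler saw.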

There is, however, a lot of code referring to #utility.yul. According to this documentation, “#utility.yul is an internally generated file of utility functions”. What are these utility functions specifically? Are they related to the user code in any way?

You can see this source by selecting the evm.bytecode.generatedSources output in Standard JSON, or with --combined-json generated-sources,generated-sources-runtime on the CLI (but note that we’re planning to deprecate --combined-json eventually, so it’s better to use Standard JSON).

These are extra bits of Yul code that are generated automatically by the compiler. They’re not part of the user’s source, but their addition is triggered by the use of certain language features, the ABI coder for example.
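For reference, here is a rough Python sketch of a Standard JSON input that selects the generated sources for both the creation and the runtime bytecode. The function name `build_standard_json`, the file name and the contract body are placeholders chosen for illustration. The `outputSelection` keys are the ones named above, plus the `evm.deployedBytecode` counterpart.

```python
import json


def build_standard_json(file_name: str, source_text: str) -> dict:
    """Build a Standard JSON input that asks only for the generated Yul sources."""
    return {
        "language": "Solidity",
        "sources": {file_name: {"content": source_text}},
        "settings": {
            "outputSelection": {
                # "*" selects every source file and every contract in it.
                "*": {
                    "*": [
                        "evm.bytecode.generatedSources",
                        "evm.deployedBytecode.generatedSources",
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder contract; replace with the real file contents.
    print(json.dumps(build_standard_json("test.sol", "contract C {}"), indent=2))
```

The printed JSON can be piped into `solc --standard-json`. The generated sources should then appear under each contract's `evm.bytecode.generatedSources` and `evm.deployedBytecode.generatedSources` in the output.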

I’m just playing around with the solidity command line options now and noticed the new --ewasm-ir option, what difference is that compared to just the --ewasm option?

--ewasm-ir is the input Yul code translated to a typed Yul dialect with i32 and i64 as types. This is an intermediate form that is straightforward to further translate into Ewasm code.

This output is supported only in the assembler mode, i.e. when using --strict-assembly/--yul/--assemble to convert Yul code into assembly. Until recently the output selection options on the CLI were being completely ignored in that mode. This has changed, and you can now use --ir-optimized, --asm, --bin, --ewasm and --ewasm-ir with the assembler. If you don’t specify any, you still get them all, but this is only for backwards-compatibility and will change in the next breaking release. The --ewasm-ir output was added because that part of the output does not really fit under either IR or Ewasm and is its own thing.

Who can I also contact for more information about this?

This is the right place for this kind of discussion. Feel free to ask questions about anything you need to know to build your tool.

You can also always find the team on the #solidity-dev channel on Gitter if you prefer live discussion.

rjx18 · December 19, 2021, 3:06pm

Hello! Thank you for the reply, it really helped my understanding. I dug around the Solidity compiler and the JavaScript version of it as well, and have a couple more questions. I managed to get a contract to compile in a demo browser application, and I see that both assembly and legacyAssembly are being generated by the JavaScript solc compiler. What is the difference between them? The legacyAssembly seems to include the direct opcodes, while the assembly is shortened a bit because it can just do (for example):

mstore(0x40, 0x80)

instead of:

PUSH 80
PUSH 40
MSTORE

However, I’m guessing if I want to derive gas costs based on the opcodes, it will be easier for me to use the legacy version?
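As a rough illustration of that idea, the Python sketch below sums static gas costs over a list of instructions. It assumes each instruction is a dict with a `name` field such as `PUSH`, `DUP2` or `MSTORE`; that shape is an assumption about the legacyAssembly JSON, not something confirmed in this thread. The cost table is deliberately partial, and dynamic costs such as memory expansion for MSTORE are ignored, so the result is only a lower bound.

```python
from typing import Dict, Iterable, List, Tuple

# Static gas costs for a handful of opcodes. The table is intentionally
# incomplete; anything missing is reported back instead of being guessed.
STATIC_GAS: Dict[str, int] = {
    "PUSH": 3,
    "DUP": 3,
    "SWAP": 3,
    "ADD": 3,
    "MLOAD": 3,
    "MSTORE": 3,  # plus memory expansion, which is not modelled here
    "POP": 2,
    "JUMP": 8,
    "JUMPI": 10,
    "JUMPDEST": 1,
}


def static_gas(instructions: Iterable[dict]) -> Tuple[int, List[str]]:
    """Return (total static gas, names that had no entry in the table)."""
    total = 0
    unknown: List[str] = []
    for instruction in instructions:
        name = instruction["name"]
        # Try the exact name first, then the name without a trailing index
        # (DUP2 -> DUP, SWAP1 -> SWAP).
        stripped = name.rstrip("0123456789")
        if name in STATIC_GAS:
            total += STATIC_GAS[name]
        elif stripped in STATIC_GAS:
            total += STATIC_GAS[stripped]
        else:
            unknown.append(name)
    return total, unknown
```

For the three instructions above (PUSH 80, PUSH 40, MSTORE) this gives 9 gas before any memory expansion cost.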

Also, is there support for the Yul intermediate representation in the JavaScript version of the Solidity compiler? I understand this was added fairly recently, so it may not have made it to the JavaScript version. Alternatively, is there a way for me to obtain the legacyAssembly from the C++ Solidity compiler? Thank you!

EDIT: I figured out how to get the Yul representation in the javascript version, by referring to the settings here: Using the compiler — Solidity 0.5.7 documentation

Just wondering now what’s the difference between legacyAssembly and the new assembly? Thank you!