Removing warning about missing license header

Reposted from #13598.

I want to open a debate about removing the compiler warning about missing license information, and in fact removing source code licensing concerns from the compiler entirely.

I believe this warning should never have been added in the first place. The motivation given in #7738 was that a license header would allow tools to auto-publish source code, presumably by automatically uploading it somewhere at the moment the contract is deployed. Has any Solidity developer ever requested this? There is no mention of such requests in #7738. It reads like the plan is to do this without user consent. I believe the question of user consent is much larger than one of licensing requirements. I may have the intention of licensing my code as MIT, I might even have made up my mind about this when I created the Solidity file and stated the intent in a license header, and yet it would be consistent with that to want to develop the code in private. Specifically, I could want to keep the source code private even while deploying the bytecode to testnet or mainnet. One might argue that the bytecode will be public by then, so you might as well make the source code public, but this ignores (among other things) the existence of comments in the code, which are definitely not present in the bytecode.

My intuition is that the idea to automatically publish source code was motivated by the difficulty that many developers face when trying to verify their source code. But removing agency from the developer should not be the solution. The tools for automatic verification are already there, as proven by things like hardhat-etherscan. If there are any quirks in the compiler that make verification difficult, the effort should be placed on fixing those quirks so that tool developers can do their job and provide a range of solutions to users.

The project of auto-publication of source code should not be pursued unless the user is asked for consent. If there is a request for user consent, at that moment the licensing requirements can be confirmed with the user.

As it stands, the warning is simply adding one more annoyance and concern to the developer, who is now forced to understand, for example, if there are any legal implications of putting in a license header or what UNLICENSED really means. This comes at a cost, and as far as I can tell provides zero value, considering that the value it was intended to provide should probably never materialize for the reasons I explained above.

I’m not sure if I participated in the discussion and decision back then myself, but trying to reconstruct it a bit in any case:

The request (apparently) originated from tooling (not sure which exactly), which wanted to make automatic uploading of sources on deployment opt-out instead of opt-in (with obvious benefits, since in general having sources available for deployed contracts is a positive unless there is a reason against it). The ultimate decision for any auto-upload like that would lie with tooling, and the precise implementation, including whether it’s well-communicated as opt-in or opt-out, would also still lie with the respective tooling implementation.

So in general, the warning is meant to nudge the developer towards adding licensing information, while the reasoning behind this is at least partly based on promoting source verification by supporting tooling in uploading sources. There is, in general, a strong bias here, in that we’d like to see most deployed bytecode associated with verifiable source code, but I’d stand by that bias. Furthermore, missing license information is not an error, but a warning, based on which I’d say “removing agency from the developer” is a harsh way of putting this. Directly associating auto-publication of sources (especially against the user’s will) with the warning also seems a bit harsh. I.e. there are two separate issues here: first, whether the warning in itself is more useful or more annoying; second, whether tooling should auto-upload as opt-out based on the licensing information and, if so, how that should be communicated to users. The latter is of concern to solc only insofar as it may be a reason for the former, i.e. for the warning.

While it may be a poor means of backing up such a decision, a Twitter poll at the time (https://twitter.com/ethchris/status/1250418075263885313) showed two-thirds support for this in general, with a strong tendency towards a warning instead of the even harsher option of making this an error.

All that being said, we may still want to reevaluate whether the warning is generally considered more useful, in that it results in more code having machine-readable license information embedded, or more annoying or even harmful. Broader community input on that would be highly appreciated.

Possible compromises between keeping the warning and removing it entirely that come to my mind would be to downgrade the warning to an info message and/or to only emit it as part of natspec analysis (i.e. when requesting documentation output). However, the latter may be tricky: the metadata contains both natspec docs and license information, so this ultimately affects the metadata hash in the bytecode, and in this sense requesting bytecode can be thought of as requesting license information as well.

If the goal is to make it so tools can auto-publish, then it seems like there should just be a thing for #pragma auto-publish off or something, rather than hiding it behind a required license.

I also am not a fan of this, as I don’t think the unit of licensing for code should be the file, but rather the repository: the whole repository should have a license, not every single file in the repository.

Right, there are multiple different issues intersecting here as pointed out by @ekpyron:

  1. Source code verification
  2. Auto-uploading source code
  3. Explicit and machine-readable licensing

Source code verification is good and I think we all agree on this. The benefits are multiple. We want to see more of it and better (e.g., full reproducibility).

Auto-uploading source code is in my opinion not a good idea (as explained in OP). Tools are free to implement it if they want, but I don’t think this should be a concern of the compiler at all. Unless there is something that can’t be done outside of the compiler, it should be done by the tool. For example, if the issue is licensing information, the tool can have its own configuration file with a field for the license, and if this field is missing it can emit a warning before compilation or deployment. When I talked about removing agency I meant by auto-uploading without consent, I wasn’t talking about the warning.
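The tool-side check described above could look roughly like the following. This is only a minimal sketch: the JSON config format, the `license` field name, and the `check_license_field` helper are all hypothetical and do not correspond to any existing tool.

```python
import json
from typing import List


def check_license_field(config_text: str) -> List[str]:
    """Return warnings a deployment tool could show before compiling or deploying.

    Assumes a hypothetical JSON project config with an optional "license" field.
    """
    config = json.loads(config_text)
    warnings = []
    if not config.get("license"):
        # The tool, not the compiler, decides when and how to surface this.
        warnings.append(
            "No 'license' field found in the project config; "
            "consider setting one before publishing sources."
        )
    return warnings


# Hypothetical configs, one without and one with a license field.
print(check_license_field('{"name": "my-project"}'))
print(check_license_field('{"name": "my-project", "license": "MIT"}'))
```

A check like this can run at whatever point the tool considers appropriate, for example only right before deployment or before any source upload.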

In the metadata hash paradigm of source code verification (implemented by Sourcify, for example), there may be a concern that with the solution I described above the license information is “out of band” and will not be included in this metadata, thus not included in the verified code. I haven’t thought deeply about the best way to have an ecosystem of verified source code, so I don’t know if the metadata hash is the way to go about it. But if we assume that it is, then it is a reasonable point: one would want the license information to be included in the metadata. In this case though instead of putting the license in a comment I would favor a more generic approach: the compiler should accept arbitrary metadata to be included in the bytecode (in addition to the default metadata). This is arguably a better approach for licensing too because a one-line comment in the code is not enough if there is a legal requirement to distribute the entire license text with the code.
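For context on what “included in the metadata” means today: as far as I understand the compiler’s metadata output, each entry under `sources` can carry a `license` field taken from the SPDX comment. The sketch below reads that field, assuming that layout; the sample metadata is made up for illustration (including the placeholder hashes) and the `licenses_by_source` helper is hypothetical.

```python
import json

# Hand-written sample, NOT real compiler output. The "sources" layout with an
# optional per-file "license" field is an assumption about the metadata format.
SAMPLE_METADATA = """
{
  "language": "Solidity",
  "sources": {
    "contracts/Token.sol": {"keccak256": "0x...", "license": "MIT"},
    "contracts/Draft.sol": {"keccak256": "0x..."}
  }
}
"""


def licenses_by_source(metadata_text: str) -> dict:
    """Map each source path to its license identifier, or None if absent."""
    metadata = json.loads(metadata_text)
    return {
        path: info.get("license")
        for path, info in metadata.get("sources", {}).items()
    }


for path, license_id in licenses_by_source(SAMPLE_METADATA).items():
    print(path, "->", license_id or "(no license recorded)")
```

Because the metadata is hashed into the bytecode, anything recorded this way becomes part of what a metadata-based verifier checks, which is why an “out of band” license field in a tool config would not be covered.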

Explicit licensing is good and we want to encourage developers to add a license to their code. A reason for this is that open source licenses in particular are good (this is subjective but I believe the ecosystem broadly agrees on this), and a missing license means that the code is not open source, so having more licensing information should lead to more open source code. As much as I want us to encourage this, I think the compiler warning is not the best way to do it, because it is shown way too early in the cycle at a point when licensing may not have been decided and the warning will just contribute to annoying a potential new developer.

I think this is the only point on which we somewhat disagree. One can believe that software copyrights are bad, and that adding a license to one’s work means capitulating to a horrible and broken system. Such a person may prefer to signal their disagreement with the entire concept of copyright by not including any license at all, and currently Solidity doesn’t allow for that (you can put UNLICENSED, but that still requires acknowledging and supporting the entire system of software copyrights, as UNLICENSED is a specific form of copyright).

That’s a perspective I hadn’t considered, but I think it’s a valid one.

Based on previous conversations, I’m guessing the compiler team will disagree with this statement, so I want to comment on it. The argument is that because a missing license only produces a warning, it is technically allowed. This is true, but I don’t think it’s the right framing. Developers should be encouraged and enabled to write warning-free code. As long as there is no built-in way to turn off compiler warnings, it is not possible to be warning-free without putting in at least the UNLICENSED line. Warnings without a way to opt out are essentially soft errors.

you can put UNLICENSED, but that still requires acknowledging and supporting the entire system of software copyrights, as UNLICENSED is a specific form of copyright

What do you think about using NOASSERTION to signal that?

I also am not a fan of this, as I don’t think the unit of licensing for code should be the file, but rather the repository: the whole repository should have a license, not every single file in the repository.

IMO it should really be per-contract since contracts are independent units that can be deployed separately and have separate metadata. Doing it per file is a simplification of that on the assumption that things in the same file will almost always share the same license.

The argument is that because a missing license only produces a warning, it is technically allowed.

Not wanting to include a license is a valid choice, the problem is just distinguishing that case from the user not being aware of the licensing issue at all. If UNLICENSED does not cover this use case, maybe we could think about some other way to express that.

I think the compiler warning is not the best way to do it, because it is shown way too early in the cycle at a point when licensing may not have been decided and the warning will just contribute to annoying a potential new developer.

Or the tool could instead filter out the warning and use its own mechanism to display it whenever and in whatever way is appropriate. E.g. as a final check before deployment. In the JSON output the compiler includes everything because the intention is to let the tool filter that as appropriate.

I think that the ability to interpret and transform the machine-readable output provided in StandardJSON is underutilized by tools. It is some extra work but it also allows these things to be adapted to the tool and also lets tools have different opinions on stuff we’re opinionated about :)
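As a rough illustration of the filtering idea, a tool could separate the license warning from the other diagnostics in the JSON output and show it at a time of its own choosing. This is a sketch under assumptions: it assumes diagnostics are listed under `errors` with `severity` and `errorCode` fields, and that the missing-license warning uses code `1878`. Check both against the compiler version you target. The sample output is hand-written, not real compiler output.

```python
# Assumption: error code of the "SPDX license identifier not provided" warning.
LICENSE_WARNING_CODE = "1878"


def split_license_warnings(output: dict):
    """Split compiler diagnostics into (license warnings, everything else)."""
    license_warnings, other = [], []
    for diag in output.get("errors", []):
        is_license_warning = (
            diag.get("severity") == "warning"
            and diag.get("errorCode") == LICENSE_WARNING_CODE
        )
        (license_warnings if is_license_warning else other).append(diag)
    return license_warnings, other


# Hand-written sample shaped like the assumed JSON output.
sample_output = {
    "errors": [
        {"severity": "warning", "errorCode": "1878",
         "message": "SPDX license identifier not provided in source file."},
        {"severity": "warning", "errorCode": "2072",
         "message": "Unused local variable."},
    ]
}

license_warnings, other = split_license_warnings(sample_output)
for diag in other:
    print("compile-time:", diag["message"])
# The tool can hold these back and show them only as a pre-deployment check.
print("deferred license warnings:", len(license_warnings))
```

Matching on the error code rather than on the message text keeps the filter from breaking if the wording of the warning changes.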

Can you make the case for why the compiler should be concerned with making the user aware of the licensing issue? In my opinion licensing is very clearly not an issue of the language, and the compiler should only be concerned with issues of the language.

If this is the expectation, I don’t think it should be a warning, because it would be weird for tools to filter out warnings automatically. It should be a new thing, and it should probably be intended as “opt-in” (choosing to show it) rather than “opt-out” (filtering it out).

For a little reductio ad absurdum: should the compiler also be throwing a warning if the header doesn’t contain a standardized Code of Conduct, Privacy Policy, Terms of Use, and Cookie Policy?

I would be much happier if there were a mechanism for including arbitrary metadata in the hash, leaving it up to tools built on top of the compiler to check that the “right” set of things are included in the metadata as part of the build process.