Zeppelin Blog - Medium

RNDR Token Transfer Audit

Zeppelin — Thu, 29 Nov 2018 17:46:47 GMT

The OTOY team asked us to review and audit their RNDR Token contracts. We looked at the code and now publish our results.

The audited code is located in the Token-Audit and the Token-Airdrop repositories. The versions used for this report are commits e946177747e57312690775204834b8fca1bbb0d5 and d96202acf6fb5d0305368bac36aa960d455cbffe respectively.

Disclaimer: While the RNDR Token incorporates a migration strategy from a legacy token, we were not provided with the code of the token to be migrated, so this audit does not cover potential security issues associated with the intended migration. A new security audit is advised when the live migration is to be performed, for which it is suggested to use the coming version 2.0 of ZeppelinOS.

Here is our assessment and recommendations, in order of importance.

Update: The OTOY team made some fixes based on our recommendations on commits bad2e4d0c283a1fa75840806a7026869dab057ad and 0d71bb89d20a192453614c80be815880b2ef3eac of the Token-Audit and the Token-Airdrop repositories respectively. The updates in the issues below refer to these new commits.

Critical Severity

None.

High Severity

The Escrow’s owner can arbitrarily increase the balance of any job without spending RNDR tokens

In the Escrow contract, the address of the associated token contract is tracked by the renderTokenAddress variable. The owner of the Escrow is allowed to change this variable at any time by calling changeRenderTokenAddress. If the owner called changeRenderTokenAddress with the address of an account they control, they could then call fundJob from that account and arbitrarily increase any job balance, at any point in time, without spending any tokens.

Consider analyzing the removal of the changeRenderTokenAddress function, or at least properly documenting the rationale behind its inclusion in the contract, so users are aware of such a dangerous scenario. While an attempt to do so was found in Escrow.sol, where a comment states that the function “[…] is included as a failsafe”, the issues that this mechanism would prevent from occurring are never explained.

Update: an event is now emitted when the changeRenderTokenAddress function is called, and documentation was added explaining the rationale for having this function.
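The scenario can be illustrated with a small Python model. This is only a sketch: the class EscrowAccessModel and its method signatures are hypothetical stand-ins for the access rules described above, not the audited Solidity code.

```python
class EscrowAccessModel:
    """Toy model of the access rules described above (not the real contract)."""

    def __init__(self, owner, render_token_address):
        self.owner = owner
        self.render_token_address = render_token_address
        self.job_balances = {}

    def change_render_token_address(self, sender, new_address):
        # Only the owner may repoint the escrow to another "token" address.
        if sender != self.owner:
            raise PermissionError("only the owner can change the token address")
        self.render_token_address = new_address

    def fund_job(self, sender, job_id, amount):
        # Funding is only accepted from the registered token address.
        if sender != self.render_token_address:
            raise PermissionError("only the token contract can fund jobs")
        self.job_balances[job_id] = self.job_balances.get(job_id, 0) + amount


escrow = EscrowAccessModel(owner="owner", render_token_address="token")

# The owner registers an account they control as the "token" ...
escrow.change_render_token_address("owner", "owner_controlled_account")
# ... and can now credit any job without moving a single token.
escrow.fund_job("owner_controlled_account", "job-1", 1_000_000)
```

In this model nothing ties the credited amount to an actual token movement, which is why the address change needs to be visible to users.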

Unsafe arithmetic operations in Airdrop contract

The Airdrop contract contains a series of arithmetic operations which are not being addressed with caution (in lines 30, 39, 56, 58, 64 and 65), leading to attempts to store numbers outside the range of the data types of their target variables. There are in particular two situations which could potentially cause integer overflows/underflows.

The first case is related to an assignment to a storage variable inside a loop in the addManyUsers function. This function iterates through a _recipients array of addresses and a paired _amounts array of uint256 values. In each iteration, the function addUser is called, which adds the respective user amount to the storage variable totalBonus, in charge of accumulating the sum. For an extremely large list of users and amounts, the variable totalBonus may reach its maximum possible value and finally overflow (i.e. start again from 0). In this scenario, an inconsistency between the total bonus sum and each user’s bonus amount would be reached.

In the second case, an unsafe math operation in the payManyUsers function could lead to an integer underflow. After the contract is deployed, the storage variable nextUserToBePaid equals 0. When calling payManyUsers, if 0 was passed as the value of the batchSize variable (or the function was externally called without parameters), the unsigned variable idTo would be assigned the result of the operation 0 + 0 - 1, resulting in an underflow and a stored value of 2²⁵⁶ - 1. While in this case the immediately following if clause would prevent anything unexpected from happening, this approach is error-prone and not advised.

Consider using OpenZeppelin’s SafeMath library to avoid underflows/overflows when doing mathematical operations.

Update: the SafeMath library is now being used throughout.
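The wrap-around behaviour described in this issue can be reproduced with a short Python model of uint256 arithmetic. This is a sketch under stated assumptions: the helper names (unchecked_add, unchecked_sub, safe_add, safe_sub) are hypothetical, and the checked variants only imitate the idea behind SafeMath (fail instead of wrapping), not its actual implementation.

```python
UINT256_MAX = 2**256 - 1
MODULUS = UINT256_MAX + 1


def unchecked_add(a: int, b: int) -> int:
    """Model of unchecked uint256 addition: the result wraps modulo 2**256."""
    return (a + b) % MODULUS


def unchecked_sub(a: int, b: int) -> int:
    """Model of unchecked uint256 subtraction: the result wraps modulo 2**256."""
    return (a - b) % MODULUS


def safe_add(a: int, b: int) -> int:
    """Checked addition: fail instead of silently wrapping."""
    c = a + b
    if c > UINT256_MAX:
        raise ArithmeticError("addition overflow")
    return c


def safe_sub(a: int, b: int) -> int:
    """Checked subtraction: fail instead of silently wrapping."""
    if b > a:
        raise ArithmeticError("subtraction underflow")
    return a - b


# The payManyUsers case: nextUserToBePaid = 0 and batchSize = 0.
next_user_to_be_paid = 0
batch_size = 0
id_to = unchecked_sub(unchecked_add(next_user_to_be_paid, batch_size), 1)

# The totalBonus case: one more unit on top of the maximum value wraps to 0.
total_bonus = unchecked_add(UINT256_MAX, 1)
```

With the checked helpers, both situations raise an error rather than storing a wrapped value.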

Contract owners can change business logic unnoticeably

In several contracts, the owner can arbitrarily change the business logic by setting new contract addresses, without properly warning users of those changes. Examples are:

RenderToken#setEscrowContractAddress: the address that will hold tokens in escrow and keep a ledger of funds available for jobs.
Escrow#changeRenderTokenAddress: the address of the token contract.
Escrow#changeDisbursalAddress: the address authorized to distribute tokens for completed jobs.

Consider emitting events to notify users about any modifications of such importance in the contracts’ business logic.

Update: events are now emitted to provide users with a mechanism to track these changes.

Inconsistent and experimental use of Solidity versions

Across the audited contracts, several versions of the Solidity compiler are used: ^0.4.18 in Escrow, ^0.4.14 in RenderToken, ^0.4.21 in the migration example contracts LegacyToken and MigratableERC20, and ^0.4.0 in Airdrop, which also uses pragma experimental “v0.5.0”.

While the individual contracts can be compiled using different versions of the Solidity compiler, profuse versioning among the same codebase is confusing and error-prone. As indicated by its name, the experimental feature should not be used in production.

Consider using the latest (v0.4.25 at the time of writing) version of the Solidity compiler throughout the code.

Update: Solidity version v0.4.24 is now used throughout the project.

Medium Severity

Two different minting functions coexist in the RenderToken contract

As per the disclaimer in the header, we haven’t been able to assess the migration strategy because we were not provided with the code of the legacy contract to be migrated. Instead, the LegacyToken.sol and MigratableERC20.sol files were extracted from an example provided as a guide in an early version of ZeppelinOS. This guide explains how to migrate old token balances by burning old and minting new tokens, for which it introduced a _mint function in the new token, which extended StandardToken from openzeppelin-zos (now renamed to openzeppelin-eth).

Apart from the two copied files that function as placeholders, the RenderToken contract implements the _mint function that is declared in MigratableERC20, from which it extends. However, RenderToken also extends from openzeppelin-zos's MintableToken, which already has a minting function named mint. Duplicating the minting functionality is confusing and potentially dangerous. Whichever form the migration ends up taking, consider using the already existing mint function for it.

Update: The RenderToken contract now extends from openzeppelin-eth's StandardToken instead of MintableToken, and there is now a single minting function called _mintMigratedTokens.

Input arrays with mismatched length will make addManyUsers throw

The addManyUsers function in the AirDrop contract, in charge of registering the _recipients of the airdrop and their respective bonus _amounts, simultaneously iterates over both arrays based on the length of just one of them (_recipients). If the number of elements in _amounts is less than that in _recipients, the whole transaction will be reverted for attempting to access an out-of-bounds index.

Consider including a require clause with an explicit error message to check for matching array length.

Update: a require statement now checks for matching array length.
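As an illustration of the recommended check, here is a minimal Python sketch. The function add_many_users is a hypothetical model of the behaviour described above, not the contract's code, and the error message is only an example.

```python
def add_many_users(recipients, amounts):
    """Pair each recipient with its bonus amount, rejecting mismatched inputs."""
    # Fail early with an explicit message instead of running into an
    # out-of-bounds access halfway through the loop.
    if len(recipients) != len(amounts):
        raise ValueError("recipients and amounts must have the same length")

    registered = []
    for i in range(len(recipients)):
        registered.append((recipients[i], amounts[i]))
    return registered


users = add_many_users(["0xA", "0xB"], [10, 20])
```

Without the length check, the loop would fail only when it reached the first missing amount, with a far less informative error.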

Omission of the transfer in disburseJob leads to inconsistent balance state

The RenderToken contract interacts with the Escrow contract by funding jobs, transferring tokens to increase the different jobs' balances, which are tracked by the Escrow. The disburseJob function later takes care of redistributing these funds among the recipients.

This function, however, does not actually transfer the tokens to the recipients, but simply sets an allowance, which the recipients can then use to transfer the tokens themselves. While there is merit in using a pattern where the beneficiaries are in charge of withdrawing their funds, the disburseJob function will leave the Escrow in an inconsistent state where the sum of the total jobBalances is different from its total token balance, since the former are depleted by the function but the latter isn't.

Even if the jobBalances mapping cannot be traversed to get the total balance without an external listing of jobs, consider implementing the Escrow balance tracking in a way that doesn't lead to inconsistencies, or using the PullPayment/Escrow solution provided in the OpenZeppelin suite.

Update: the disburseJob function now performs the token transfers.
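The inconsistency can be made concrete with a toy ledger in Python. This is a sketch only: EscrowLedgerModel and its methods are hypothetical, and token balances and allowances are reduced to plain integers and dictionaries.

```python
class EscrowLedgerModel:
    """Toy ledger: tokens held by the escrow versus per-job balances."""

    def __init__(self):
        self.token_balance = 0   # tokens actually held by the escrow
        self.job_balances = {}   # internal bookkeeping per job
        self.allowances = {}     # amounts recipients may pull later

    def fund_job(self, job_id, amount):
        self.token_balance += amount
        self.job_balances[job_id] = self.job_balances.get(job_id, 0) + amount

    def disburse_with_allowance(self, job_id, payouts):
        # Variant described in the issue: bookkeeping is reduced, but the
        # tokens stay in the escrow until recipients pull them.
        for recipient, amount in payouts.items():
            self.job_balances[job_id] -= amount
            self.allowances[recipient] = self.allowances.get(recipient, 0) + amount

    def disburse_with_transfer(self, job_id, payouts):
        # Variant that moves the tokens in the same step.
        for _recipient, amount in payouts.items():
            self.job_balances[job_id] -= amount
            self.token_balance -= amount

    def is_consistent(self):
        return sum(self.job_balances.values()) == self.token_balance


allowance_escrow = EscrowLedgerModel()
allowance_escrow.fund_job("job-1", 100)
allowance_escrow.disburse_with_allowance("job-1", {"alice": 60, "bob": 40})

transfer_escrow = EscrowLedgerModel()
transfer_escrow.fund_job("job-1", 100)
transfer_escrow.disburse_with_transfer("job-1", {"alice": 60, "bob": 40})
```

In the allowance variant the ledger reports 0 for the job while the escrow still holds 100 tokens; in the transfer variant both figures stay in step.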

payUser fails silently if the bonus was already paid

In the payUser function of the AirDrop contract, an if clause is used to check whether a bonus has already been paid. If the condition amount > 0 is not satisfied, the payment will not be performed, and the caller gets no notice that it didn't go through apart from the lack of an associated event.

Consider complementing the if with an else clause that handles the logic when the condition fails.

Update: an event is now emitted in case the if condition fails.

Unchecked ERC20 transfer operation

Inside the payUser function, a transfer is being made but there are no checks validating its successful completion. If the transfer somehow fails, an event logging the successful operation would be emitted despite the transferee not getting their tokens.

Consider using OpenZeppelin’s SafeERC20 library and its safeTransfer function, or surrounding the transfer operation with a require statement.

Update: the SafeERC20 library is now used.
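The difference between the unchecked and the checked call can be sketched in Python. Everything here is a hypothetical model: FailingToken stands for a token whose transfer reports failure by returning False, and safe_transfer only imitates the idea of SafeERC20's safeTransfer (abort when the transfer does not succeed).

```python
class FailingToken:
    """Hypothetical token whose transfer always reports failure."""

    def transfer(self, to, amount):
        return False


def pay_user_unchecked(token, user, amount, events):
    # The return value is ignored, so the event is logged even on failure.
    token.transfer(user, amount)
    events.append(("PaidUser", user, amount))


def safe_transfer(token, to, amount):
    # Abort the whole operation if the token reports a failed transfer.
    if not token.transfer(to, amount):
        raise RuntimeError("token transfer failed")


def pay_user_checked(token, user, amount, events):
    safe_transfer(token, user, amount)
    events.append(("PaidUser", user, amount))


unchecked_events = []
pay_user_unchecked(FailingToken(), "alice", 10, unchecked_events)

checked_events = []
try:
    pay_user_checked(FailingToken(), "alice", 10, checked_events)
except RuntimeError:
    pass  # the payment is rejected and no event is recorded
```

The unchecked path records a PaidUser event for a payment that never happened, while the checked path records nothing.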

Missing checks for null addresses in RenderToken and Airdrop contracts

In RenderToken, the setEscrowContractAddress function allows the token’s owner to change the escrow’s address (i.e. the contract variable escrowContractAddress). However, the function does not implement a check to prevent the null address from being set.

Similarly in Airdrop, the contract’s constructor receives as a parameter an address that is assigned to the contract variable renderTokenAddress with no checks preventing the null address from being used.

Consider implementing non-null address validations before setting these variables to avoid potential problems downstream.

Update: null checks are now in place for address changes.

Storage modification on event emission

In the addUser function of the Airdrop contract, an item is pushed into the bonusAddresses array. The result of the operation, which is the length for that array after the addition, is used as a parameter in the AddedUser event emission. This, while valid Solidity, is confusing and error-prone.

Consider performing the storage modification and keeping the resulting value in a temporary variable before emitting the event for code clarity.

Update: the temporary value is now assigned to a variable before emitting the event.

Low Severity

Event parameters are not indexed

The AddedUser and PaidUser events defined in Airdrop as well as the JobBalanceUpdate event defined in Escrow are not indexing their parameters. This means that they will not be searchable in terms of those variables, making it impossible to track job balance histories, user additions or payments.

In case these are to be tracked, consider adding the indexed keyword to at least the userAddress variables in the AddedUser and PaidUser events, and the _jobId variable in the JobBalanceUpdate event.

Update: the AddedUser and PaidUser events now index their parameters, but the JobBalanceUpdate one still doesn't. String indexing had an associated web3 issue, which is purportedly solved in version 1.0. Alternatively, two workarounds for this issue are discussed here.

Deceptive inline comment in Escrow contract

Considering the issue “The Escrow’s owner can arbitrarily increase the balance of any job without spending RNDR tokens” above, the comment in Escrow.sol that states: “Jobs can only be created through the RNDR contract” is false and may be misleading for users reading the contract’s code. Consider rephrasing it clearly to state that job balances can be arbitrarily incremented by whatever account the owner of the contract sets as the renderTokenAddress.

Update: the comment was fixed to read: “Jobs can only be created by the address stored in the renderTokenAddress variable”.

Missing error messages in require statements

There are several require statements (such as Escrow:L70, Escrow:L89, Escrow:L109, RenderToken:L40, RenderToken:L45) that provide no error messages. Consider including specific and informative error messages in all require statements.

Update: all require statements now provide appropriate error messages.

Missing docstrings in contract and functions in Airdrop contract

The AirDrop contract’s source code, which handles token distribution, has no inline documentation whatsoever. Consider documenting with docstrings everything that is part of the public API.

Update: the AirDrop contract is now thoroughly documented.

Untested functions in RenderToken

The RenderToken contract implements functions (e.g. holdInEscrow) that are not being tested in the test suite. Consider testing all functions implemented in contracts to ensure they behave as expected.

Update: some testing of holdInEscrow was done in the Escrow.js file, but an additional test for this function was added to the RenderToken.js file.

Broken testing instructions in README files

The instructions in the Token-Audit/README.md file do not work if followed literally: the error “Cannot find ./config module” is thrown while running npm test. The instructions in the Token-Airdrop/README.md file do not work either, with the error “Could not find artifacts for Airdrop from any sources” thrown while running truffle test. These errors might arise from differences in casing: the name of the contract in the Airdrop.sol file is AirDrop, but the artifact being required is Airdrop. This in turn might be due to the fact that on OS X the default filesystem is case-insensitive. While this works on Mac’s filesystems, it can lead to setup errors when working across different operating systems.

Consider updating the instructions and including a working cross-platform configuration so developers and auditors can successfully run the test suite. Furthermore, given that the test suite for the Airdrop contract can only be run using the truffle v5.0.0-beta release, consider including this version of truffle as a dev dependency in the package.json file of the project.

Update: the README files were updated with new testing instructions.

Erroneous documentation in initialize functions

Initialize functions (in Escrow and RenderToken) are incorrectly documented, since these functions are not contract constructors. Consider updating the inline documentation to fix these errors.

Update: documentation now refers to the functions as initializers instead of constructors.

Inconsistent use of imports in contracts

The use of imports in contracts is not consistent throughout the codebase. There are missing explicit imports (e.g. missing imports for Migratable and Ownable in Escrow.sol, and for MintableToken in RenderToken.sol). Consider explicitly importing all necessary contracts in each contract throughout the codebase to improve code consistency and legibility.

Update: all imports now use a consistent style.

Inconsistent coding style among different files

There is a significant coding-style difference between the contracts in the Token-Audit repository and those in the Token-Airdrop repository. The contracts in the first one use docstrings, libraries like OpenZeppelin’s SafeMath, and 2-space indentation. The latter, on the other hand, has no comments in the source code, does not rely on already audited libraries for security, and uses 4-space indentation. Consider following best practices and applying the same style guidelines across all files.

Update: both repositories now use a consistent coding style.

Notes & Additional Information

The addresses of job funders are presumably meant to be tracked off-chain, but consider adding an event-based tracking layer as a failsafe (i.e., emitting an event identifying the contributor in RenderToken's holdInEscrow function).
Update: an event is now emitted.
In the Airdrop contract, consider prefixing all internal functions with an underscore to clearly denote their visibility.
Update: internal functions are now prefixed with an underscore.
Several public functions can be restricted to external. In particular: functions fundJob, changeDisbursalAddress, changeRenderTokenAddress, disburseJob, and jobBalance in the Escrow contract, and functions addManyUsers, payManyUsers, finalizeList, returnTokens and getUserCount in the AirDrop contract.
Update: these functions were restricted to external.
Consider including brackets in all control flow statements (e.g. in Airdrop.sol), to prevent issues with future versions of the language.
Update: brackets were added.
Consider declaring the canDisburse modifier before all function definitions.
Update: the modifier was moved before all functions.
Consider making all instances of uint explicitly uint256 (see Escrow.sol:L72 and Airdrop.sol:L47).
Update: all uint types are now explicitly uint256.
Variables listFinalized and nextUserToBePaid are explicitly initialized in AirDrop, but totalBonus is not. Consider initializing this last variable explicitly as well for code consistency.
Update: totalBonus is now initialized.
Consider explicitly marking all contract variables as private and defining getters/setters where appropriate, or at least explicitly setting the visibility of all contract variables (e.g. jobBalances in the Escrow contract is not declared as public).
Update: jobBalances is now declared as private.
In order to load the Airdrop contract to execute the payments, it is necessary that someone with minting privileges or enough tokens adds balance from a RenderToken contract to the address of the recently deployed Airdrop contract. For clarity purposes, consider expanding step number five on the instructions in Token-Airdrop/README.md file to clearly state this precondition.
Update: the README file has updated wording on this point.

Conclusion

No critical and four high severity issues were found. Some changes were proposed to follow best practices and reduce the potential attack surface.

If you are interested in discussing smart contract security, join our slack channel, follow us on Medium, or apply to work with us! We are also available for smart contract security development and auditing work.

Note that as of the date of publishing, the above review reflects the current understanding of known security patterns as they relate to the OTOY RNDR Token contracts. The above should not be construed as investment advice. For general information about smart contract security, check out our thoughts here.

RNDR Token Transfer Audit was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Solidity Compiler Audit

Zeppelin — Thu, 01 Nov 2018 09:51:08 GMT

The Augur team and the Ethereum Foundation (through a joint grant) asked us to review and audit the Solidity compiler. We looked at the code and now publish our results.

The audited project can be found in the ethereum/solidity GitHub repository. The version used for this report is commit e67f0147998a9e3835ed3ce8bf6a0a0c634216c5 (tag v0.4.24).

The full report can be found here, and a list of the issues ordered by severity can be found next.

Critical Severity

High Severity

Medium severity

Low severity

Notes

Conclusions

Two critical severity and ten high severity issues were found and explained, along with recommendations on how to fix them. Some additional changes were proposed to follow best practices and reduce potential attack surface.

Update: All critical and high severity issues were fixed or addressed by the Solidity team.

If you’re interested in discussing smart contract security, follow us on Medium, join our slack channel, or apply to work with us!

Note that as of the date of publishing, the above review reflects the current understanding of known security patterns as they relate to the Solidity compiler. We have not reviewed the Augur project. The above should not be construed as investment advice or an offering of tokens. For general information about smart contract security, check out our thoughts here.

Participate in Zeppelin’s Puzzle Game to Celebrate Devcon4!

Demi Brener — Tue, 30 Oct 2018 09:12:47 GMT

After the fun we had with last year’s Ethernaut hacking game, we’re releasing EthHunt, a new game to celebrate Zeppelin’s sponsorship of Devcon4!

EthHunt is a collaborative game in which people from all over the world will be competing in real time, looking for unique pieces to a single puzzle. Prizes totaling $1,200 USD will be divided equally among the finders of each piece.

How to play EthHunt

There will be 12 pieces to the puzzle (represented by ERC721 non-fungible tokens), and each piece is hidden behind a riddle that must be solved. Hints will be provided for each riddle. The hints vary: some merely require a keen eye, others ask you to explore and play with blockchain apps, while still others will test your knowledge of smart contracts and security.

Once all the pieces are found and the puzzle is completed, $100 USD (0.5 ETH) will be transferred to the finders of each piece.

But there’s a catch! The prize is locked in a smart contract and will execute automatically only once all players provide their pieces.

EthHunt provides an opportunity for you to practice your problem-solving skills, demonstrate your knowledge of the industry, and work with smart contracts and dapps — experiencing what our teams work on every day. And, because we’re looking to grow the Zeppelin team, anyone who successfully finds a piece will be granted a priority review for our current position openings.

Start playing EthHunt now!

You can follow the progress of the game on our Twitter account. For discussions, please join our Telegram community.

Good luck to you all. We can’t wait to meet the winners! See you at Devcon4!

Getting started with OpenZeppelin-eth: a new stable and upgradeable EVM package

Martín Triay ⚡ — Thu, 25 Oct 2018 16:46:52 GMT

Announcing the EVM package implementation of the most popular smart contract library

These are great times for smart contract development. The pieces for Ethereum 2.0 are coming together, and new tools and practices are blooming. Last week, OpenZeppelin 2.0 came out with an improved, stable API and 100% test coverage plus a full independent audit. Maturity is around the corner.

This sets the right ground to release OpenZeppelin-eth, the EVM package implementation of OpenZeppelin.

What’s an EVM package?

An EVM package is basically an on-chain piece of code that you can reuse. Think of it as a code dependency that you can update to fix bugs without deploying anything. This means you don’t need to worry about maintaining or deploying your dependencies, since you can leverage other teams’ work while having full control over your own upgrades at the same time.

So, whenever the EVM package maintainer (or a forked version) comes up with a bug fix, new feature or optimization, you will have the option of easily upgrading your code to the latest version without deploying a new contract, without manually migrating state, and preserving the same contract address.

As of today there are thousands of projects depending on OpenZeppelin in Github alone. Picture the scenario where a bug in the library is found and every single project needs to fix it on-chain, multiplying their efforts. Lots of time and money are spent in performing independent code reviews, new deployments, and state migrations — a chaos that many attackers could take advantage of.

Thanks to EVM packages, you can make use of OpenZeppelin-eth without worrying about any of the above. You just need to upgrade your linked package to the latest version (or preferred fork), and you’re done. Ready to focus on what really matters: buidling your own stuff.

Different proxy instances can point to different versions of the same EVM package
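The idea of proxies pointing at different versions of shared logic can be sketched with a toy Python model. This is an illustration only, not how ZeppelinOS is implemented: the classes LogicV1, LogicV2 and Proxy, and the increase function, are hypothetical names chosen for the example.

```python
class LogicV1:
    """Hypothetical first version of a shared logic contract."""
    version = "1.0.0"

    @staticmethod
    def increase(state, amount):
        state["value"] = state.get("value", 0) + amount


class LogicV2:
    """Hypothetical fixed version: rejects non-positive amounts."""
    version = "2.0.0"

    @staticmethod
    def increase(state, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        state["value"] = state.get("value", 0) + amount


class Proxy:
    """Keeps its own address and state, and delegates calls to shared logic."""

    def __init__(self, address, logic):
        self.address = address
        self.logic = logic
        self.state = {}

    def upgrade_to(self, logic):
        # Only the pointer changes: address and state are preserved.
        self.logic = logic

    def increase(self, amount):
        self.logic.increase(self.state, amount)


proxy_a = Proxy("0xaaa", LogicV1)
proxy_b = Proxy("0xbbb", LogicV1)
proxy_a.increase(5)
proxy_b.increase(7)

# Only proxy_a's owner decides to upgrade; proxy_b stays on version 1.
proxy_a.upgrade_to(LogicV2)
proxy_a.increase(1)
```

Each proxy keeps its own state and address, and each owner chooses independently when to switch to a newer version of the shared logic.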

Try it now

You can easily deploy your own ERC20 token using OpenZeppelin-eth following these simple steps:

1. Set up your ZeppelinOS project if you don’t have one already.

$ zos init MyToken

Successfully written zos.json

2. Link the OpenZeppelin-eth EVM package to it

$ zos link openzeppelin-eth

Installing openzeppelin-eth via npm…

Successfully written zos.json

3. Deploy your own version of OpenZeppelin-eth into your local Ethereum blockchain. This step is only needed if you’re in a development environment, otherwise you can skip it: the package already contains the addresses of the mainnet and testnet deployments in its configuration files.

$ zos push --deploy-dependencies --network local

Compiling contracts

Deploying openzeppelin-eth contracts

(…)

Using custom deployment of openzeppelin-eth

Updated zos.dev.json

4. Deploy your own StandaloneERC20 instance linked to the package. You can check the full argument list here.

// Make sure you replace $OWNER with your own address
$ zos create openzeppelin-eth/StandaloneERC20 --args MyToken,MTK,18,100000000000,$OWNER,[],[] --network local

Creating proxy to logic contract 0x82b1bb96e9e01a3ed187df80730f64e1db80a140 and initializing by calling initialize with:

- name (string): “MyToken”

- symbol (string): “MTK”

- decimals (uint8): “18”

- initialSupply (uint256): “100000000000”

- initialHolder (address): “$OWNER”

- minters (address[]): []

- pausers (address[]): []

Instance created at 0xf20a412847fcd2a210d4726b64f243cab39748f5

0xf20a412847fcd2a210d4726b64f243cab39748f5

Updated zos.dev.json

That’s it! You have deployed your own ERC20 token at the address 0xf20a412847fcd2a210d4726b64f243cab39748f5, just like that. Easy peasy :)

What’s the relationship between OpenZeppelin-solidity and OpenZeppelin-eth?

While the code is almost the same, the main difference is that OpenZeppelin-eth is upgradeable and ready to use. This means that you don’t need to download the code, customize and deploy it yourself since OpenZeppelin-eth, just like any other EVM package, is already deployed and installed on the Ethereum network. You just have to link it to your project, deploy it, and you’re good to go!

If you’re already using OpenZeppelin-solidity in your project, that’s ok too! Good old OpenZeppelin-solidity and OpenZeppelin-eth are maintained together by the same team, meaning they get the same updates at the same time.

Anyway, we do recommend using the EVM package implementation given the usability and upgradeability benefits. And remember: only you can decide if, when, and how to upgrade it.

Make your own EVM package

This is a trend we see coming in many projects and that has been already adopted by Zeppelin, Gnosis, Aragon, Livepeer, and Level K. We want to encourage teams and developers to start making their own EVM packages and building the next-generation tools for smart contract development:

Learn how to make your own EVM package now!
Got any ideas for one? Join our Telegram channel and share them — we’d love to know :)
Join the team 🚀

Announcing OpenZeppelin 2.0

Francisco Giordano — Mon, 22 Oct 2018 13:14:46 GMT

A stable, audited, and fully tested package for smart contract development

Check out our step-by-step guide to OpenZeppelin-eth, the EVM package implementation of OpenZeppelin 2.0

When we first announced OpenZeppelin in 2016, smart contract security was in a crisis. Major hacks occurred every month, the same logic was being reimplemented again and again, and basic software engineering practices were not being used.

We set out to standardize what secure smart contract code should look like and to provide reference implementations of contracts commonly needed by the community. OpenZeppelin began simply, as a collection of code that was freely available online. However, a community soon formed around it, eager to experiment and build solid smart contract components.

Source: https://npm-stat.com/

In the two years that followed, we learned together and became a welcoming place for contributors and newcomers. We’re extremely happy with the success that OpenZeppelin has had, and are thankful to everyone that is a part of it.

Consolidating the Journey

Experimenting as the ecosystem evolved, the code went through many iterations. In retrospect, some ideas we tried were not the best, but the exploration process pushed important discussions forward. This set the foundations of a healthy community and codebase.

The next step in achieving OpenZeppelin’s mission is to turn the learnings and code from the past two years into a stable and reliable package. To do this, we’re happy to be releasing OpenZeppelin 2.0 with:

A Stable API. With the growing size and complexity of smart contract systems, such as the use of upgradeability mechanisms, developers need predictable interfaces.
100% Test Coverage. Every line of code in the package is tested automatically.
Full Independent Audit. Read the report by Level K.
Tons of community love. The project now has more than 150 code contributors, with many, many more helping with issues, support and reviews.

Now hold your breath, because this release was only possible because of the contributions of many, many people from everywhere in the world, and we want to thank all of them:

@3sGgpQ8H, @Aniket-Engg, @barakman, @BrendanChou, @cardmaniac992, @dougiebuckets, @dwardu, @facuspagnuolo, @fulldecent, @glesaint, @Glisch, @jacobherrington, @jbogacz, @jdetychey, @JeanoLee, @k06a, @lamengao, @ldub, @leonardoalt, @Miraj98, @mswezey23, @pw94, @shishir99111, @sohkai, @sweatyc, @tinchoabbate, @tinchou, @urvalla, @viquezclaudio, @vyomshm, @yaronvel, @ZumZoom.

We would also like to thank all the people who are constantly helping others in our Slack channel, the ones who have given us feedback about the release, and the ones helping us triage and discuss our GitHub issues. If you are reading this wanting to jump in and make your first free software contributions, but you are unsure of where and how, talk to us! We can help you get started, and we could use the extra hands.

With ❤ from the maintainers team of this release.
— @shrugs, @nventuro, @frangio and @elopio

To learn the details, check out the technical release notes.
If you want to help guide the development of this next stage of OpenZeppelin, jump onto our issue board and participate!
If you’re into smart contract development, follow us on Twitter or join the team!

The Global Coordination Machine

Manuel Araoz — Fri, 12 Oct 2018 12:59:29 GMT

Few applications need blockchains. Distributed consensus makes each computational step very expensive. Only apps for which users are willing to pay such a cost will make sense in the new decentralized paradigm. But what makes an app need a blockchain?

A Brief History of Computer Innovation

The first such app was bitcoin, the first free currency. Why are the ~20M bitcoin users willing to pay the price of running a currency on top of a costly, slow platform such as a blockchain? Let’s explore the answer by analogy.

In 1982, why did my dad spend $150 at an obscure computer store in Buenos Aires to buy a Timex Sinclair 1000? He wanted to experiment with programming. He was buying the ability to process information quickly and on his own terms.

In 1995, why did nerds spend thousands of dollars to connect to other computers around the globe? They wanted to read and post messages on online bulletin boards. They were buying access to information.

In 2007, why did so many people choose to spend $500 to carry around a small computer with a tiny screen, few apps, and very bad specs? They wanted a computer on-the-go. They were buying mobility.

Each step in the history of computation brought many inefficiencies, but also a new key feature. Early adopters were willing to pay the cost of switching to a new platform in order to get a taste of something different. Most times even they didn’t understand it fully.

So, what are early adopters buying when they use blockchain-based apps? They want to participate in a global, open, permissionless, stateful playground. They are buying access to global coordination. They get to participate in ground-up, large-scale human coordination experiments.

The time was ripe for such doors to open. We saw the world transition from paper-based systems (e.g., nations) to closed machine-based systems (e.g., corporations) to open digital-based systems (e.g., internet communities). A mere 10 years ago, only nation-states and huge multinational corporations such as Exxon, Google, and Apple could coordinate efforts globally. The cost of doing so was simply unbearable for anyone except them, given how slowly information flowed back then. Some such pre-blockchain experiments were the US dollar, capitalism, socialism, feudalism, the World Trade Center, and the International Space Station. Only huge centralized institutions could withstand the costs of such projects. That’s why they control the data, the money, and the terms on which we interact.

However, since Satoshi released a solution to the Byzantine Generals’ Problem to the world in 2008, the internet turned into a global coordination machine. This enabled us to create a global currency without relying on huge institutions as trust anchors. Not only did this enable bitcoin, but it drastically lowered the material barriers to innovation in the global coordination space by bringing it to the digital space. Kids around the world can now run experiments on global coordination from their mom’s basement at almost no cost. What’s best is that when they disagree on which direction the experiments should take, they can part ways and try their different ideas by forking the protocols. Some such experiments have been:

A globally shared virtual computer, Ethereum.
Thousands of privately issued application-specific currencies.
A cheap, fast, and scalable payment network.
Many globally available financial products such as decentralized exchanges.

Think about how hard it would have been for anyone to create a new stateful global coordination mechanism (such as the USD for global commerce) 10 years ago. It was virtually impossible unless they had a strong physical presence, collaboration agreements with multiple world regions, huge amounts of capital at their disposal, and lots of people to operate those components. Blockchains turned the internet into a digital jurisdiction where we can all run our global coordination experiments by using code and incentives, instead of paper and coercion.

Next Steps in Global Coordination

However, having a global playground for coordination experiments doesn’t come without its challenges. Nation-states and multinational companies have polished their methods over hundreds of years. We’re starting anew. Just like when online content creators had to rethink how to entertain and inform better than mainstream media. Just like when thinkers in the year 1500 reinvented how to express ideas after the Church lost its monopoly on the creation of books. Our challenge today is to achieve global coordination over the internet using a new perspective, not imitating what nation-states and multinationals did.

Some of the pressing challenges that the new playground blockchain technology has created are as follows:

It’s a new paradigm: few people know how to use the technology, and there are few good tools to work with it.
Code runs in a public realm, exposing apps to a global community of very incentivized attackers. This makes improving security practices a top priority.
As happened in the early days of the web or mobile platforms, there are no blockchain-specific UX patterns, jeopardizing user adoption.

As with every new technology, the first ones to use it are the creative individuals whose imagination pushes them to build a better world. For blockchain, this means hackers: those fluent in managing information at scale. Today, the basic components for solving these problems already exist but are not readily available to developers. What’s missing is a polished and integrated experience to bring experimentation to the next level. A sort of “Operating System” for this new playground. A platform that brings network effects to hackers who are building the revolution by sharing tools and code.

This is why at Zeppelin we recently decided to focus on ZeppelinOS. We want to tackle those problems head-on, and we believe we can do it. We’re creating tools for hackers to rethink how a new global society should work. And we believe in the transformative power of giving tools long kept only for a few to the masses to experiment with. We’ve all seen the amazing improvements that opening a space to permissionless innovation brought to biotech, computing, content creation, and the satellite industry. It’s time to take global human coordination projects into our own hands!

Help us build this future together!

Join the discussion on our Telegram Group and follow us on Twitter.
Even better, join the team!
Learn more about ZeppelinOS.

The Global Coordination Machine was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Deconstructing a Solidity Contract — Part VI: The Metadata Hash

Alejandro Santander — Fri, 28 Sep 2018 14:00:27 GMT

By Alejandro Santander in collaboration with Leo Arias

Image from pixbay.com

Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies ✔
Deconstructing a Solidity Contract — Part VI: The Metadata Hash ⬅

Update: This article initially contained mistakes and missed a few important points about the design of the metadata hash, which were pointed out by chriseth from the Solidity team. Thank you! Among the corrections made by Chris is the fact that the structure is known as the “Metadata hash” instead of the “Swarm hash”, and that the Solidity compiler is agnostic in terms of the system used to store contract metadata.

In the last article, we noticed that the runtime bytecode generated by the Solidity compiler appends a strange structure after the function bodies block. You can see this in the deconstruction diagram or in the image below referred to as the “metadata hash”:

Figure 1: The metadata hash can be found in the last few opcodes of the runtime bytecode of a contract.

What exactly are these opcodes doing?

Notice the STOP opcode at instruction 421. You may think that if there’s a STOP opcode there, whatever comes after it is basically unreachable bytecode, right? Well, not exactly: the code could be JUMP-ing over the STOP opcode. However, there are no JUMPDEST instructions after it, so that rules out the possibility of any execution reaching this part of the bytecode via a JUMP.

In fact, if you analyze all the possible execution flows in this contract (which, believe it or not, is what we have done in this series!), you’ll see that indeed this code is totally unreachable.

So why would the Solidity compiler append non-executable code to its generated output? Actually, this isn’t the first time that this happens. We’ve seen it before in Part II of the series, where a constructor’s arguments were appended to the end of the creation bytecode. That code wasn’t supposed to be executed by the EVM either; it was just there as a sort of hack to store the initialization values of a contract for consumption in constructors.

Alright, we still haven’t answered the question of what this block of code is. To do that, let’s walk through the opcodes and try to make some sense out of them, shall we?

The first thing we see is a LOG1. If we look this opcode up in the Yellow Paper or in the Solidity documentation, we can see that LOG0 to LOG4 opcodes are used for logging events in the Ethereum blockchain. Which…makes no sense, since we won’t be executing any of this code...

After that, we can see a PUSH6 of 0x627a7a723058, a SHA3, a couple of INVALIDs, a SWAP10, a DELEGATECALL, etc. Wait, INVALIDs? What does that even mean?! Yup, total nonsense in terms of EVM interpretation. Clearly, looking at these bytes as EVM opcode representation is absolutely pointless. We need to look at this as raw byte data, which as you remember can be found in Remix’s Compile tab > Details panel > Runtime Bytecode section > object property. The LOG1 opcode is really an 0xa1 byte, so the whole block of code at the end of the contract looks like this:

// …a165627a7a723058202c27c1ef4be478b21f663f0d0ecdd1c73638730ffebbff1e3c7a234db7df6fd10029
// END OF CONTRACT

The answer to this riddle can be found in Solidity’s documentation, in the Encoding of the Metadata Hash in the Bytecode section. The documentation is brief, but it gives us exactly what we need. The compiler is hashing the contract’s metadata (which includes information about the contract such as its source code, how it was compiled, etc.) and injecting this hash into the contract’s own bytecode! This metadata can also be seen in Remix: Remix’s Compile tab > Details panel > Metadata section.

This hash can be used in Swarm as a lookup URL to find the contract’s metadata. Swarm is basically a decentralized storage system, similar to IPFS. The idea here is that some platform like Etherscan identifies this structure in the bytecode and provides the location of the bytecode’s metadata within a decentralized storage system. A user can query such metadata and use it as a means to prove that the bytecode being seen is in fact the product of a given Solidity source code, with a certain version and precise configuration of the Solidity compiler in a deterministic manner. This hash is a digital signature of sorts, that ties together a piece of compiled bytecode with its origins. If you wanted to verify that the bytecode is legit, you would have to hash the metadata yourself and verify that you get the same hash.

And that’s not all: the metadata hash can be used by wallet applications to fetch the contract’s metadata, extract its source, recompile it with the compiler settings used originally, verify that the produced bytecode matches the contract’s bytecode, then fetch the contract’s JSON ABI and look at the NATSPEC documentation of the function being called.

This end-to-end authentication path built into bytecode generated by the Solidity compiler can not only be used to provide a user with information of the action about to be performed, but also to validate the legitimacy of such action.

For example, if we look at the CryptoKitties contract in Etherscan, we can see that at the end of the page, Etherscan provides us with the contract’s metadata address in Swarm, which was extracted from the bytecode in the way we’ve just seen: bzzr://a6465fc1ce7ab1a92906ff7206b23d80a21bbd50b85b4bde6a91f8e6b2e3edde. You can look into Swarm’s documentation to better understand this URL scheme.

Let’s go back to our BasicToken’s bytecode and understand how Etherscan (or any other similar utility for that matter) actually finds the hash in the bytecode.

As the Solidity documentation states, an 0xa1 and an 0x65 will be injected into the bytecode. In EVM bytecode, these two hexadecimal values would translate to LOG1 and PUSH6. Now, if we encoded the letter “b” as UTF-8 and wrote the result in hex, we would get 0x62, “z” would be 0x7a, and so on. You can see the whole thing decoded in the following diagram:

Figure 2: The metadata hash decoded as a Swarm URL.

So, any application trying to find the metadata hash in the bytecode would look for this particular pattern of bytes at the end of the contract and extract the URL from it.

Solidity uses an encoding called CBOR, which stores not only the hash but also the specific decentralized storage system and version used. In this case, it’s using version zero of Swarm’s bzzr:// URL scheme, and that’s why the structure contains the chars “b”, “z”, “z”, “r”, “0”. Alternatively, it could use something like “i”, “p”, “f”, “s”, “r”, “0”, indicating that the structure encodes an IPFS URL scheme. This makes it agnostic in terms of which storage system is used. It could be changed in the future, or we could even get to choose which storage system we want the bytecode to reference upon compilation.
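To make the pattern concrete, here is a minimal Python sketch of how a tool might pull the storage scheme and the hash out of the trailing bytes shown earlier. The function name parse_metadata_trailer is made up for this illustration. The sketch assumes the exact single-entry layout described in this article, plus one detail taken from the Solidity documentation: the final two bytes (0x00 0x29 here) hold the big-endian length of the CBOR data that precedes them. A real tool would more likely use a full CBOR decoder.

```python
# Trailing bytes of the runtime bytecode, as quoted earlier in the article.
HASH_HEX = "2c27c1ef4be478b21f663f0d0ecdd1c73638730ffebbff1e3c7a234db7df6fd1"
TRAILER_HEX = "a1" + "65" + "627a7a7230" + "5820" + HASH_HEX + "0029"


def parse_metadata_trailer(runtime_bytecode: bytes) -> dict:
    """Extract the storage scheme and hash from the end of the bytecode.

    Hypothetical helper: it only understands a CBOR map with one entry
    whose key is a short text string and whose value is a byte string.
    """
    # The last two bytes give the length of the CBOR payload before them.
    cbor_len = int.from_bytes(runtime_bytecode[-2:], "big")
    payload = runtime_bytecode[-2 - cbor_len:-2]
    if len(payload) != cbor_len:
        raise ValueError("bytecode is shorter than the declared CBOR length")

    # 0xa1 is a CBOR map holding exactly one key/value pair.
    if payload[0] != 0xA1:
        raise ValueError("expected a one-entry CBOR map (0xa1)")

    # 0x60 + n is a CBOR text string of n bytes; 0x65 means 5 bytes.
    key_len = payload[1] - 0x60
    if not 0 <= key_len <= 23:
        raise ValueError("expected a short CBOR text string as the key")
    key = payload[2:2 + key_len].decode("ascii")

    # 0x58 is a CBOR byte string whose length follows in one byte.
    rest = payload[2 + key_len:]
    if rest[0] != 0x58 or rest[1] != len(rest) - 2:
        raise ValueError("expected a CBOR byte string as the value")

    return {"scheme": key, "hash": rest[2:].hex()}


info = parse_metadata_trailer(bytes.fromhex(TRAILER_HEX))
print(info["scheme"], info["hash"])
```

Because the length is read from the end, the same function works when the trailer is preceded by the rest of the contract’s runtime bytecode.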

To retrieve the metadata file, we would have to connect to the same Swarm network to which the metadata file was uploaded, using something like swarm-gateways.net or setting up a local Swarm node. Right now, this is quite difficult to do, because Swarm is still under heavy development and has not yet stabilized its hashing scheme, something that Solidity is addressing in issue #4092.

The hash itself is the output of a specific hashing algorithm run on the contract’s metadata file, which the Solidity compiler calculates after it has finished all its other tasks (mainly, compiling =D). When we were trying to interpret these bytes as EVM opcodes, some of the hash’s bytes didn’t have a corresponding opcode, and that’s why we were getting INVALIDs. In fact, since a hash is effectively arbitrary data, it may by chance decode to any sequence of opcodes, so each contract’s hash will disassemble differently and look weird when using tools like Remix. What I do is ignore the bytecode when I see a LOG1, followed by a PUSH6 and a few INVALIDs, understanding that what I am seeing is the metadata hash injection that the Solidity compiler makes.

And this concludes our analysis of this bizarre structure found at the end of every contract produced by Solidity.

Better yet, this concludes the entire series. Yay! If you followed along and digested this considerable amount of highly technical, horrendously boring material (at least to most people), then I salute you! I hope that by now you feel right at home when you see EVM bytecode in the wild, and that you add this skill to your toolset when analyzing and developing smart contracts for Ethereum.

Thanks for reading!

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies ✔
Deconstructing a Solidity Contract — Part VI: The Metadata Hash ✔

Deconstructing a Solidity Contract — Part VI: The Metadata Hash was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Transaction Permission Layer Protocol v1.0

0age — Fri, 07 Sep 2018 22:59:35 GMT

Introduction

The Transaction Permission Layer protocol (TPL) is a method for assigning metadata (herein referred to as “attributes”) to Ethereum addresses. These attributes then form the basis for designing systems that enforce permissions when performing certain transactions. Security token transfers are one such transaction, where compliance with various laws and regulations will compel tokens to only permit a transfer once a set of conditions (e.g. identity verification, KYC/AML, or other attributes) have been met. There are also other families of transactions that will benefit from a permissions system, e.g. voting in digital collectives, participating in communities that depend on reputation or resistance to Sybil attacks, and maintaining curated lists or registries.

In our last post, we outlined a potential approach to compliant token offerings that utilizes the TPL protocol to limit the pool of participants until a sufficient level of decentralization has been reached. Other approaches to verifying attributes on participant addresses tend to fall into two broad categories:

general-purpose identity systems, such as ERC-725/ERC-735, ERC-1077/ERC-1078, uPort, and others, and
specialized components of securities-token platforms, such as Harbor, Polymath, Securitize, Meridio, and similar organizations.

Many of the general-purpose identity proposals allow for multiple claimants to issue identical claims on a given address and topic. This means that each token that relies on those claims would have to track and weigh the relative reputation of each entity issuing the relevant claims and metadata. Ideally, token implementors would be able to get back one trusted result without needing to know about its exact source.

On the other hand, platform solutions run the risk of discouraging cross-compatibility and flexibility by creating “walled gardens” constructed to serve a particular use-case (that is, unless they utilize a protocol like TPL under the hood as a bridge). Furthermore, many implementors may find that custom-built identity proposals prove too complex or inefficient when translated to fit their particular needs.

The following proposed architecture is intended to address these challenges and to simplify the task of performing compliant token transfers or other permissioned transactions by providing a singular, trusted source for attribute metadata via a standardized interface. To do so, it implements an easy-to-use, effective, and flexible transaction permission layer: the TPL protocol.

Technical overview

At the core of TPL is the jurisdiction — a single smart contract that links attributes to addresses. It implements an AttributeRegistry interface, where attributes are registered to addresses as a uint256 => uint256 key-value pair, defined as follows:

The four external functions that make up the Attribute Registry interface.

Permissioned tokens and other contracts can then use this interface to identify and confirm attributes recognized by a jurisdiction without needing to take on the additional technical overhead of managing attribute assignment and revocation themselves. (The implementor can then optionally enter into an agreement with a jurisdiction that places the burden of validating and revoking attributes onto the jurisdiction.) By way of example, a simple ERC20 token implementing a shared whitelist only needs to check the result of one call into the registry.

Example of a simple whitelisted transfer function that inherits from an existing ERC20 implementation.

In addition to the AttributeRegistry interface, jurisdictions also implement additional interfaces for setting and removing attributes, monitoring for relevant events, and getting more detailed information on attributes and their verifiers when needed.

There are four basic categories of actors or entities, explained in detail below, that interact with the jurisdiction: an owner, the validators, the pool of participants, and the implementors, such as permissioned tokens.

A broad overview of the categories of entity that interact with a jurisdiction.

Jurisdiction Owner

The creator of a jurisdiction may designate an initial controlling entity of their choosing, including a governance collective (DAO, multi-sig, etc) or a regular, externally-owned account. The controller is primarily responsible for defining attribute types that are recognized by the jurisdiction and for designating validators that can assign attributes of those types to addresses.

The owner may set and declare a variety of optional properties on those attribute types, including a human-readable description of the attribute, restrictions on whether participants can set or remove the attribute (useful for creating blacklists), requirements to pay fees or lock up funds in order to set the attribute (more on that later), and other properties.

Jurisdictions are also fully composable and cross-compatible with one another. To accomplish this, attribute types designate secondary sources, including other jurisdictions or any contract that implements an Attribute Registry interface, that will be queried whenever an attribute has not yet been set locally for a given address. This gives jurisdictions the ability to delegate authority to other jurisdictions on a targeted basis.

As a result, implementors do not have to call into multiple jurisdictions to get different attributes. Instead, they can petition their jurisdiction to add attribute types with the desired secondary sources, or even create their own jurisdiction that references all the sources they need.
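The fallback to secondary sources can be pictured with a small in-memory model. This is only a sketch in Python, not the TPL contracts themselves: the class name Registry and the methods set_attribute and has_attribute are hypothetical names chosen for illustration, and the model assumes a secondary source is queried with the same attribute type ID as the local one.

```python
from typing import Dict, Optional, Tuple


class Registry:
    """Toy stand-in for a contract that exposes an attribute registry."""

    def __init__(self, secondary: Optional[Dict[int, "Registry"]] = None):
        # (account, attribute type) -> value set locally
        self._values: Dict[Tuple[str, int], int] = {}
        # attribute type -> secondary source to ask when nothing is set locally
        self._secondary: Dict[int, "Registry"] = secondary or {}

    def set_attribute(self, account: str, attribute_type: int, value: int) -> None:
        self._values[(account, attribute_type)] = value

    def has_attribute(self, account: str, attribute_type: int) -> bool:
        # A locally set attribute always wins.
        if (account, attribute_type) in self._values:
            return True
        # Otherwise delegate to the secondary source for this type, if any.
        source = self._secondary.get(attribute_type)
        return source is not None and source.has_attribute(account, attribute_type)


# An implementor only ever calls into `local`, yet sees attributes set upstream.
upstream = Registry()
upstream.set_attribute("0xabc", 1, 1)
local = Registry(secondary={1: upstream})
print(local.has_attribute("0xabc", 1))
```

The point of the design is visible in the last two lines: the caller needs to know about one registry only, and delegation is decided per attribute type.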

Validators

The owner of the jurisdiction does not assign attributes directly. Instead, they designate validators, specializing in particular areas, to perform this task. Specific validators are then approved by the owner to issue any number of specific attribute types.

A depiction of validators, attribute types, and approvals for validators to issue attribute types.

Multiple validators can be approved to issue a given attribute type, but only a single attribute of a given type may be assigned to a specific address at a time (though addresses may contain any number of attributes of varying types), ensuring a canonical source of truth of the attribute’s state.

If a validator is removed by the owner, it cannot issue any more attributes and all the attributes it has issued become invalid. If it is reinstated, issued attributes become valid again. The same principle applies to attribute types and to attribute approvals. This highlights a particular design feature that differentiates the TPL protocol: attributes have a high bar to clear in order to remain valid, reinforcing the reliability of claims made by a jurisdiction.
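The “high bar” for validity can be sketched as a lookup that re-checks every precondition at read time instead of at issuance time. The Python model below is a rough illustration under that assumption. The Jurisdiction class and its fields are hypothetical names, not the actual TPL contract storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class Jurisdiction:
    """Toy model: an attribute is valid only while all its preconditions hold."""

    validators: set = field(default_factory=set)
    attribute_types: set = field(default_factory=set)
    approvals: set = field(default_factory=set)  # (validator, attribute type)
    issued: dict = field(default_factory=dict)   # (account, attribute type) -> validator

    def _can_issue(self, validator: str, attribute_type: int) -> bool:
        return (
            validator in self.validators
            and attribute_type in self.attribute_types
            and (validator, attribute_type) in self.approvals
        )

    def issue(self, validator: str, account: str, attribute_type: int) -> None:
        if not self._can_issue(validator, attribute_type):
            raise PermissionError("validator is not approved for this attribute type")
        # Only one attribute of a given type may be assigned to an address.
        if (account, attribute_type) in self.issued:
            raise ValueError("attribute of this type is already assigned")
        self.issued[(account, attribute_type)] = validator

    def has_attribute(self, account: str, attribute_type: int) -> bool:
        validator = self.issued.get((account, attribute_type))
        # Validity is re-derived on every read from the current state.
        return validator is not None and self._can_issue(validator, attribute_type)


j = Jurisdiction(validators={"v1"}, attribute_types={1}, approvals={("v1", 1)})
j.issue("v1", "0xabc", 1)
j.validators.discard("v1")   # removing the validator invalidates its attributes
print(j.has_attribute("0xabc", 1))
j.validators.add("v1")       # reinstating it makes them valid again
print(j.has_attribute("0xabc", 1))
```

Nothing is deleted when the validator is removed, which is why reinstating it restores the attributes without any re-issuance.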

Issuing Attributes

Validators have three ways of issuing attributes to participants. They can:

sign an off-chain attribute approval that the participant can relay to the network themselves using addAttribute,
sign an off-chain attribute approval that designates a third-party address, or operator, that can then relay the attribute to the network on behalf of the address being assigned an attribute using addAttributeFor, and
call into the jurisdiction and set the attribute directly on behalf of the participant using issueAttribute.

In order to issue attributes of a given type, validators must first be approved to issue them by the jurisdiction owner. If the attribute type requires fees or locked funds (called the stake), the validator must either provide the funds themselves or require that the participant or operator do so. They may then specify their own optional requirements on an attribute-by-attribute basis, including the attribute’s assigned value, a validator fee to be paid upon assignment, and any additional required stake.

A collection of addresses, values, and staked amounts for a particular validator & attribute type.

NOTE: Additional features of the TPL protocol outlined below, such as stake, fees, and operator access, are totally optional to use, and can be ignored in a nearly-transparent fashion by a jurisdiction if so desired.

Stake & transaction rebates

If a validator needs to revoke an attribute they have issued, and if that attribute has any funds staked, the validator will receive a transaction rebate from that stake, with the remainder being refunded to the entity that locked up the staked funds. The jurisdiction owner may also revoke any particular attribute and receive a transaction rebate if applicable.

The rebate amount is calculated by multiplying a fixed gas amount (based on a lower bound on transaction gas usage after rebates, currently set to 37,700) by the gas price of the transaction, and is paid out to the externally-owned account that submits the transaction.
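As a quick worked example of that arithmetic, here is a small Python sketch. The function name split_stake is hypothetical, and capping the rebate at the staked amount is an assumption made for this illustration rather than something stated above.

```python
REBATE_GAS = 37_700  # fixed gas amount quoted in this post


def split_stake(stake_wei: int, gas_price_wei: int) -> tuple:
    """Return (rebate to the revoker, remainder refunded to the staker)."""
    # Assumption: the rebate can never exceed what was actually staked.
    rebate = min(REBATE_GAS * gas_price_wei, stake_wei)
    return rebate, stake_wei - rebate


GWEI = 10**9
rebate, remainder = split_stake(stake_wei=10**15, gas_price_wei=10 * GWEI)
print(rebate, remainder)
```

At a gas price of 10 gwei, the rebate is 37,700 × 10 gwei = 0.000377 ether, so a stake of 0.001 ether would leave 0.000623 ether to be refunded.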

Furthermore, a validator may invalidate a signed attribute approval prior to its usage by providing a hash derived from the approval as well as the signature associated with that hash. This option will not pay out a transaction rebate, as the participant has not yet submitted a transaction and locked up funds. However, it should only be required in highly targeted circumstances, as any attribute approval invalidation at scale can be achieved by simply modifying the validator’s signing key.

Signed attribute approvals

While validators can always assign and revoke attributes directly, signing approvals off-chain and letting the participant (or an operator, designated by the validator) submit the transaction has a number of beneficial properties:

Validators do not have to pay the transaction fee themselves, and can even require that the attribute assigner include any required stake or fees right in the submitting transaction.
Participants (or operators) may decide when they want to add the attribute, enhancing privacy and saving on fees when attributes are not ultimately required.
Validators can easily modify the properties of the assigned attributes to reflect changing conditions, or even cater them to the participant in question.

Validators have an associated signing key that can be changed, which will invalidate any unused attribute approvals but leave existing attributes intact. To sign an attribute approval, a validator may call the following with appropriate arguments:

An example function for off-chain signing of attribute approvals (ES6 / web3.js v1.0)

Implementors

The TPL protocol is built with simplicity and efficiency in mind for ERC20, ERC721, ERC777, and ERC1400 tokens — for users and development teams alike. Tokens and other smart contracts that wish to interface with the jurisdiction should do the following:

first, in the token constructor, initialization function, or other setter function, define the external registry interface & contract address of the jurisdiction as well as the attribute IDs of the attributes they’re interested in;
next, in the transfer and transferFrom functions, check for the required attributes or attribute values as applicable before transferring the tokens. Approved operators (for example, tokens exchanged via 0x) do not necessarily need to have attributes assigned to them in order to operate transfers. That being said, tokens may still want to allow for “trap-door” attribute checks for proxies or other pass-through contracts.

Participants who are interested in transacting in projects that use TPL will need to go through the appropriate checks with an approved validator, which can be located via the jurisdiction’s methods or through more traditional channels. They or their designated operator then submit a signed attribute approval from the validator that sets the required attribute. Alternately, the validator may assign the attribute on their behalf.

Developers

All of our work on TPL is totally open-source; to dig into the technical details, run tests, analyze contract efficiency and gas usage metrics, and test-drive the contracts, check out the TPL project repository on GitHub. Opening new issues and submitting pull requests is of course always welcome.

Get Involved

We encourage you to join the discussion on our Telegram Discussion Board, and to get in touch if you are interested in collaborating as an implementor, a validator, or a member of the governance collective in our pilot jurisdiction.

Thanks to Connor Spelliscy, Demi Brener, Santiago Palladino, Alejo Salles, Leo Arias, Nicolas Venturo, AJ Ostrow, Max Mersch, Louis Guthmann, Peter Kieltyka, and Sina Habibian for feedback and conversations which contributed to this post.

Transaction Permission Layer Protocol v1.0 was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Deconstructing a Solidity Contract — Part V: Function Bodies

Alejandro Santander — Thu, 20 Sep 2018 15:23:20 GMT

By Alejandro Santander in collaboration with Leo Arias

Image from www.snappygoat.com

Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies ⬅
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

Hey there! We’ve come a long way, haven’t we? First, we understood the difference between a contract’s creation time and runtime bytecode; next, we understood how the entry point of execution from any call or transaction is routed to specific functions via the function selector; and finally, we saw how incoming transaction data is unpacked for a function to consume, and the data produced by a function is repacked for the user via function wrappers. In this section, we will (at last) look at the actual execution of a function, or what we’ve been calling so far a “function’s body”.

The function body is precisely what the function wrappers detour to, after unpacking the incoming calldata. By the time a function body is executed, the function’s arguments should be sitting comfortably in the stack (or in memory if the data is dynamic), anxious to be used. Let’s see this in action with the balanceOf(address) function. This function should receive an address and return the corresponding uint256 balance of such an address.

Let’s go back to Remix and compile and deploy the contract as we’ve done before, and then call the balanceOf function with the address you used to deploy the contract as the argument. This should return the number 10000, since that is the value the constructor’s code initially assigns to whatever address deploys the contract, and the value we used when deploying it.

Right, now let’s debug the transaction.

The first thing you’ll notice is that the debugger placed us at instruction 252. If you look at the deconstruction diagram, in the wrappers’ blue section, you should see that the balanceOf function wrapper redirects flow at instruction 175 to the JUMPDEST instruction in 251. As we’ve seen multiple times before, Remix places us precisely at the point when a function’s body is about to be executed.

Figure 1. Function wrapper redirects execution to the function body (blue dashed line at instruction 175).

Figure 2. Function body execution, coming from the function’s wrapper (blue dashed line at instruction 251).

Now, if you look at the stack, you’ll notice that its topmost value is the address we called balanceOf with. The wrapper has done its job of unpacking the calldata correctly. So we’re ready to step through instructions 251 to 290, the body of the balanceOf function.

Instruction 252 pushes a 20-byte 0xffffffffffffffffffffffffffffffffffffffff value and uses the AND opcode to “mask” the 32-byte address into its correct type (remember, addresses in Ethereum are 20 bytes long while the stack operates in 32-byte words).
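If the masking step feels abstract, here is a minimal Python sketch of what the AND does to a 32-byte word. The example address and the junk in the upper bytes are made up for illustration; only the 20-byte mask comes from the bytecode.

```python
# 20 bytes of 0xff: the same value the bytecode pushes before the AND.
ADDRESS_MASK = (1 << 160) - 1


def mask_address(word: int) -> int:
    """Keep only the low 20 bytes of a 32-byte stack word (the AND step)."""
    return word & ADDRESS_MASK


# Hypothetical address, used only as sample data.
example_address = 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C

# A 32-byte word whose upper 12 bytes contain junk that is not part of the address.
dirty_word = (0xDEADBEEF << 160) | example_address

clean_word = mask_address(dirty_word)
print(hex(clean_word))
```

Whatever sits in the upper 12 bytes is wiped out, and a word that already holds a clean address passes through unchanged.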

In instructions 274 to 278, the bytecode will upload the address from the stack to memory. It needs it there for the upcoming SHA3 opcode. If you look in the Yellow Paper, the SHA3 opcode has two parameters: the position in memory to calculate the hash from, and the number of bytes to hash.

But why will the code be using a SHA3 opcode? This function wants to read from the balances mapping. More specifically, it wants to read the value mapped to the incoming address. If you recall how a mapping is laid out in storage, the value for a given key lives at the storage position obtained by hashing the key (here, the address) concatenated with the slot of the mapping variable. In this case the slot is 1, because balances is defined as the second variable (totalSupply_ is the first, at slot 0). SHA3 will need both these values in memory to do its magic, and that is precisely what’s happening here.

So, we’ve got the address in memory, but now we need the slot in memory. And that’s what happens next between instructions 279 and 283.

The number 0x01 is stored at memory position 0x20. Now memory holds the address at the first word, memory position 0x00, and the slot at the second word, memory position 0x20. Yay! We’re ready to call SHA3.

And so it’s called between instructions 284 and 287.

By the time SHA3 is called in instruction 287, the stack contains 0x00 (start position for SHA3) and 0x40 (length for SHA3), which is basically telling the EVM to hash whatever is in memory in the first two 32-byte words. Thirty-two bytes in hex is 0x20, so 0x20 + 0x20 equals 0x40.
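To make the memory layout concrete, here is a small Python sketch that rebuilds the 64 bytes SHA3 is about to hash. The mstore helper and the example address are assumptions made for illustration; the positions (0x00 and 0x20), the slot number (1) and the length (0x40) are the ones described above.

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte big-endian word at `offset`, mimicking the EVM's MSTORE."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")


# Hypothetical address, used only as sample data.
example_address = 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C
balances_slot = 1  # balances is the second state variable

memory = bytearray(64)                  # two empty 32-byte words
mstore(memory, 0x00, example_address)   # first word: the mapping key
mstore(memory, 0x20, balances_slot)     # second word: the mapping's slot

# SHA3 is called with start 0x00 and length 0x40, so this is what gets hashed.
preimage = bytes(memory[0x00:0x00 + 0x40])
print(preimage.hex())
```

The sketch deliberately stops before hashing. The EVM's SHA3 opcode computes Keccak-256, which is not the same function as hashlib.sha3_256 in Python's standard library, so reproducing the actual storage position would need a Keccak-256 implementation from elsewhere.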

Now, SHA3 leaves the 32-byte hash in the stack, which is an awfully long hexadecimal number, considerably longer than an Ethereum address. This hash is the location in the contract’s storage where the balance of the address passed to balanceOf is stored. You can visualize this using the Storage completely loaded panel in Remix’s debugger. You should find a matching location in the second storage object.

What’s stored at this location? The number 10000, or 0x2710 in hex. At instruction 288, SLOAD takes the argument of where to read from storage (our hash) and pushes 0x2710 to the stack.

Finally, at instruction 289, SWAP1 resurfaces the function wrapper’s JUMPDEST location (0x70, or 112), and a JUMP in instruction 290 takes us back to the outgoing portion of the function wrapper, which will repack 0x2710 for returning it to the user.

I strongly advise you to go over the same debugging process we just did with balanceOf, this time with the totalSupply and transfer functions. The former is very straightforward, if not trivial, and the latter is considerably more complex but elementally made up of the same building blocks. The secret is understanding how values are read from mappings and written to mappings. There’s really not much more to it.

Now let’s go back to the big picture:

Figure 3. Function bodies after function wrappers.

As we’ve discussed before, the function bodies are all packed together right after the function wrappers. Execution flow jumps to them from the wrappers and returns to the wrappers after performing each function’s instructions.

If you look carefully at the diagram, there’s a chunk of code that comes after the function bodies called the “metadata hash.” This is a very simple structure that we’ll look at next in the final part of the series.

See you there!

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies ✔
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

Deconstructing a Solidity Contract — Part V: Function Bodies was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Deconstructing a Solidity Contract — Part IV: Function Wrappers

Alejandro Santander — Wed, 12 Sep 2018 15:10:34 GMT

By Alejandro Santander in collaboration with Leo Arias.

Image from www.planetpaper.com

Note: This article is part of a series. If you haven’t read the previous article, please have a look at it first. We’re deconstructing the EVM bytecode of a simple Solidity contract.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ⬅
Deconstructing a Solidity Contract — Part V: Function Bodies
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

In the last article, we saw how the function selector acts as a hub or a switch of sorts in our BasicToken.sol contract. It sits at the entry point of a contract and redirects execution to the matched function of the contract the caller wants to run.

Figure 1. Redirection from the function selector, as seen in the deconstruction diagram.

If the totalSupply function is called, execution will be redirected to location 91, balanceOf to 130, and so on.

Now let’s start a new debugging session in Remix as we’ve done before, and call the totalSupply function again. Make sure to always expand the Instructions panel, where the heart of Remix’s debugger is. As we saw before, Remix will place you at instruction 246, where the function’s body is just about to be executed. Last time, we took the Transaction slider from this position back to instruction 0, because we wanted to study the contract’s entry point and how it got to the function’s entry point from there. This time, we’re also going back (sorry!), but to instruction 91 instead of instruction 0, because there’s this thing that Solidity uses to wrap a function’s body. Don’t worry, we’ll get to the function’s body soon enough in the next article. We’re almost there; your patience will be rewarded!

So, step back to instruction 91, which is where the function selector leaves us because the function id matched totalSupply (0x18160ddd). At this point, the stack should only contain the function’s id. Now let’s walk through the code from here.

Figure 2. The non-payable check structure.

Instructions 92 to 103 will basically revert if there is value (i.e., ether) involved in the transaction. Again, this is a very common structure injected by the Solidity compiler whenever a function isn’t payable. We saw this exact same thing being used in the constructor, in Part II of the series, and it was also a non-payable function. This “non-payable check” structure will check if CALLVALUE ISZERO, and if so, will jump to instruction 103 (0x67), skipping the REVERT opcode in instruction 102.

If you called totalSupply on Remix without setting any value, we’ll reach instruction 103. Instruction 104 cleans up a zero that was left over in the stack, and then 112 (hex 0x0070) and 245 (hex 0x00f5) are pushed to the stack. Execution immediately jumps to the latter location: instruction 245. Notice that the jump occurs at instruction 111, and previously it pushed 112, so it wouldn’t be outrageous to imagine that the code is about to go do something, somewhere, and then come back. That is, it will remember where we left off (112), jump, and then return.

Figure 3. Function wrapper jumping into the function body (yellow dashed lines).

Let’s see if that’s actually what happens by stepping to that mysterious 245 location.

Figure 4. totalSupply function body.

If you step through 245 to 250, you’ll see that our theory was indeed correct. This “something” that the code detoured to is the actual function body, whose inner workings are not important to us right now. What’s important to the scope of this article is how the code reaches and leaves this “body-thing”, that is, how it wraps around it. It jumped into the body and out of the body. So, we see that the JUMP at 250 takes us back to 112, as we so cleverly predicted.
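The jump-in, jump-out dance can be sketched in a few lines of Python, using a plain list as the stack. The locations 112 and 245 and the function id are the ones from this walkthrough; the body is heavily simplified, and the SWAP1 right before its final JUMP is an assumption about how it brings the return location back to the top.

```python
stack = [0x18160DDD]  # the function id left behind by the function selector
visited = []          # locations we jump to, in order

# Wrapper: remember where to come back to, then head for the body.
stack.append(112)     # return location inside the wrapper
stack.append(245)     # location of the function body
pc = stack.pop()      # JUMP consumes the top of the stack
visited.append(pc)

# Body (simplified): produce a value, then jump back to the wrapper.
stack.append(10000)                          # the value the body leaves behind
stack[-1], stack[-2] = stack[-2], stack[-1]  # SWAP1: return location back on top
pc = stack.pop()                             # JUMP back to the wrapper
visited.append(pc)

print(visited, stack)
```

Notice that nothing special marks this as a function call. The return location is just another number on the stack, and the body has to leave the stack in the right shape for the final JUMP to find it.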

If this was Age of Empires, we should be hearing the Bronze Age fanfare by now. That’s right, we’re navigating through the bytecode using JUMP and JUMPI like crazy! Go ahead, play it — you deserve it.

Image from en0.forgeofempires.com

BUT! There’s something new this time around in the stack: the number 10000 (hex 0x2710) ✨. The function body we just traversed was kind enough to put it there for us. If you were a smart contract, what would you do now? (Please don’t say, “I’d run away with all the ether”!).

Remember that we’re calling the totalSupply function. Somehow, you need to get that value from the stack into a RETURN opcode, so that it can be returned to the user. And that’s exactly what the code does between instructions 113 and 129, where there’s an actual RETURN opcode at the end.

Figure 5. A uint256 memory returner structure.

It will first read the current free memory pointer (instructions 113 to 116), and then copy the value that the body of the function placed in the stack to that free space (instructions 117 to 119), which ends up storing the number 10000 (hex 0x2710) in memory. See? We’re getting good at this! If you don’t believe me, just step through the opcodes in Remix. It sounds complex, but it’s not.

Finally, the code will figure out the size of the data that needs to be returned. Let’s look at that next.

Figure 6. Memory pointer offset?

It first loads the memory pointer again, and compares it using a subtraction to the previous memory pointer in instructions 120 to 124, most likely in an attempt to calculate the size of the data to be returned. This value seems to be hardcoded anyway in instruction 125, which may seem redundant. It’s probably the result of the optimizer realizing that the return data size can be hardcoded to save some gas and, after applying the optimization, some residual opcodes are left behind.

This is a perfect example of strange bytecode that apparently does nothing relevant or seems redundant. It’s OK to ignore opcodes that don’t seem to be accomplishing anything, and learn to live with them (or “through them” let’s say) and simply move on. As you read more and more bytecode, you’ll start identifying the purpose of these generic, apparently hollow structures in sudden short bursts of enlightenment.

That’s enough esoteric nonsense for now. Let’s get our feet back on the ground.

Figure 7. Returning values to the user.

Instruction 125 will push the number 32 to the stack (hex 0x20) and add it to whatever our magical offset was from our previous generic calculation (which was 0 anyway), then swap the values around to match the order in which RETURN consumes its values… aaaand BOOM, the user has the totalSupply value returned.
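Here is a rough Python model of the whole returner structure, from reading the free memory pointer to handing bytes back to the user. The mstore and mload helpers are made up for illustration, and the initial free memory pointer value of 0x80 is an assumption (it is what recent Solidity compilers store at position 0x40); step through Remix to see the real value for your build.

```python
FREE_MEMORY_POINTER = 0x40  # position where Solidity keeps the free memory pointer


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte big-endian word at `offset`, mimicking MSTORE."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")


def mload(memory: bytearray, offset: int) -> int:
    """Read a 32-byte big-endian word from `offset`, mimicking MLOAD."""
    return int.from_bytes(memory[offset:offset + 32], "big")


memory = bytearray(0x100)
mstore(memory, FREE_MEMORY_POINTER, 0x80)  # assumed initial pointer value

value_from_body = 10000  # what the function body left in the stack

start = mload(memory, FREE_MEMORY_POINTER)           # read the free memory pointer
mstore(memory, start, value_from_body)               # copy the value to free memory
offset = mload(memory, FREE_MEMORY_POINTER) - start  # the "magical offset": 0
size = offset + 0x20                                 # add 32 bytes for one uint256

return_data = bytes(memory[start:start + size])      # what RETURN hands back
print(return_data.hex())
```

The user receives a single 32-byte word with 0x2710 sitting in its last two bytes, which is exactly how a uint256 is encoded for return.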

OK. So, we saw how the code was routed from the function selector, into this wrapper structure that went into the function body, and out of the function body and then dealt with the translation of whatever the function body produced, and packed this data for returning it to the user. Well, shall we look at the other functions and see if we can observe a similar pattern in them as well?

If you’d like to take a break first, though, this would be the perfect time for it. What we’ll do next is simply confirm the structure we’ve just analyzed in the other two functions, with a little eye candy and a bit of magic here and there along the way.

Coffee break!

So, that’s one function; two to go. Let’s look at the balanceOf function next.

It’s strongly advisable that you have a quick look around in the deconstruction diagram, to verify visually what just happened with totalSupply, and to get an idea of what we’re about to do with balanceOf.

The function selector should take us to instruction 130, which is balanceOf’s wrapper, and from there take us into the function’s body and out of it, packaging the return value for the user. However, if you look at the diagram, you’ll notice that the code does jump into the function’s body as expected, but it returns to totalSupply’s wrapper instead of its own. Why?

Figure 8. balanceOf’s blue wrapper jumps back to totalSupply’s yellow wrapper.

A tempting reason for this to happen could be that since totalSupply and balanceOf both return a uint256 value, the chunk of code that grabs a uint256 value from the stack and returns a uint256 via memory is identical, and could be reused. The Solidity compiler could be noticing that part of the code generated for these two wrappers is the same, and deciding to reuse the code to save on gas. Well, it actually does just that, and we wouldn’t be observing this if optimizations were not enabled when we compiled the contract. Let’s call this structure that’s being reused the “wrappers’ uint256 memory returner”. A nice exercise would be to compile the contract without optimizations and verify this yourself ;D

Remix time. Let’s start a new debugging session: call the balanceOf function, passing in the same address that we used to deploy the contract. It should return the number 10000, since the creator of the token initially holds all the goodies. In the Debug area, step back to instruction 142, which is where we land this time once the non-payable check that opens balanceOf’s wrapper (at instruction 130) is behind us.

Figure 9. balanceOf function wrapper.

At instruction 144, 112 (hex 0x0070) is pushed to the stack — which, not surprisingly, is the location of the “uint256 memory returner” structure we just saw. The code is about to jump off to balanceOf’s body, and it’s remembering where to jump back to after the body is executed.

However, the jump into the function body at instruction 175 doesn’t happen right away. Something is going on before it actually makes the jump, between instructions 147 and 172.

Figure 10. The calldata unpacker.

At instruction 147, a hexadecimal number with 40 f’s (20 bytes) is pushed to the stack, and then a 4. CALLDATALOAD is called with the 4 as an argument, which has the effect of reading the first word of data (32 bytes) from our calldata after the function id. If this sounds weird, then I’d recommend that you look at Part III of the series, where we analyze how calldata works. This word is the argument we passed into the function call, which is the address whose balance we want to check in the call to balanceOf. This address is masked with the big 0xffffffffffffffffffffffffffffffffffffffff number to force it into the address type, and then the jump in instruction 175 is made to the function body at instruction 251 (hex 0x00fb), with the address that was read from calldata sitting comfortably in the stack and ready for use by the body.
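The unpacking step can be imitated in Python to see exactly which bytes end up where. This is only a sketch: encode_balance_of and calldataload are hypothetical helpers written for this example, and the address is sample data. The selector 0x70a08231 is the function id of balanceOf(address).

```python
BALANCE_OF_SELECTOR = bytes.fromhex("70a08231")  # function id of balanceOf(address)
ADDRESS_MASK = (1 << 160) - 1                    # the number with 40 f's


def encode_balance_of(address: int) -> bytes:
    """Build calldata: the 4-byte function id followed by one 32-byte argument."""
    return BALANCE_OF_SELECTOR + address.to_bytes(32, "big")


def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes starting at `offset`, zero-padded on the right if short."""
    chunk = calldata[offset:offset + 32]
    return int.from_bytes(chunk.ljust(32, b"\x00"), "big")


# Hypothetical address, used only as sample data.
example_address = 0xCA35B7D915458EF540ADE6068DFE2F44E8FA733C
calldata = encode_balance_of(example_address)

function_id = calldataload(calldata, 0) >> 224        # first 4 bytes of calldata
argument = calldataload(calldata, 4) & ADDRESS_MASK   # first word after the id, masked

print(hex(function_id), hex(argument))
```

Reading at offset 4 is what skips the function id, and the mask guarantees that only 20 bytes of the word reach the function body as the address.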

And so, we witness that a function wrapper’s job is not only to redirect into a function’s body, and package whatever comes back from the body for the user, but also to package stuff coming from the user for the function’s body to consume. The function wrapper’s nature thus reveals itself to us in its full glory!

A function’s wrapper is an intermediary that unpacks the calldata for a function’s body to use, routes execution to it, and then repacks whatever comes back for the user. This wrapper structure is there for all functions that are part of the public interface of a contract in Solidity.

How this packing and unpacking is done is meticulously defined in Ethereum’s Application Binary Interface Specification, which specifies how incoming and outgoing arguments in function calls are encoded.

Now, let’s have a quick look at the 3 function wrappers all together:

Figure 11. The function wrappers following after the function selector.

It’s easy to see, in a smart contract compiled by Solidity, that the big chunk of code that comes after the function selector is the function wrappers, one after the other. And yes, the actual function bodies are the next big chunk of code that comes after the wrappers, and after that there is this small peculiarity called the “metadata hash” that we’ll also see in a future post. But that’s it.

We’re beginning to see a grand architecture in EVM output produced by the Solidity compiler, and it’s slowly becoming less mysterious/chaotic. When analyzing a contract’s bytecode, you will quickly learn to first try to see where you are in terms of this grand structure before actually diving into the bytecode stepping details.

Figure 12. The grand structure: function selector, wrappers and bodies.

As we did in the previous parts of this series, we leave debugging of a call to the transfer function to you. You should see how the wrapper unpacks two values this time — the beneficiary’s _to address, and the _value transferred — sends that to the function’s body, and then grabs the body’s response and packs it back up for the user. Makes sense, right?

In the next part of the series we’ll finally look into the function bodies. Once we do that, there’s not really much else to do… Only a couple of details to cover, and we’re done. This divide-and-conquer strategy is really starting to give us dominion over the problem we set out to tackle at the start of the series, which seemed overwhelming at first but now is starting to become a pattern we can familiarize ourselves with. The next time you see an opcode, you will not be scared. You’ll look it in the eye and fiercely say “Oh yeah? You and what opcode army!” You’ll skim through any bytecode like a true Ethereum ninja.

Let’s look at the function bodies next.

Deconstructing a Solidity Contract — Part I: Introduction ✔
Deconstructing a Solidity Contract — Part II: Creation vs. Runtime ✔
Deconstructing a Solidity Contract — Part III: The Function Selector ✔
Deconstructing a Solidity Contract — Part IV: Function Wrappers ✔
Deconstructing a Solidity Contract — Part V: Function Bodies
Deconstructing a Solidity Contract — Part VI: The Metadata Hash

Deconstructing a Solidity Contract — Part IV: Function Wrappers was originally published in Zeppelin Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.