Throughout the last 2+ years, the ecosystem has been actively looking at various ways and means to help scale decentralised applications, especially on Ethereum. A variety of teams have been involved in this collaborative process, and it has been wonderful to see progress on many fronts.
The need for scaling has been acutely felt, and the solutions being proposed and built fall into one of the following categories:
- Bigger blocks in existing blockchains
- New Layer 1 blockchains including ETH 2.0
- Off-chain scaling solutions
- State channels
For this post, we will concentrate on off-chain scaling solutions.
The general design pattern for how off-chain scaling works is the following:
- Enable transactions to move off-chain
- Allow assets to move from one chain to another and vice versa; from Ethereum to a sidechain for example
- Execute transactions off-chain
- Submit snapshots of off-chain state to Ethereum
- Provide a dispute resolution mechanism on Ethereum in case off-chain txs go rogue (zkRollups bypass this by using validity proofs)
- Assume that the off-chain data is available to interested “challengers” (for some classes of off-chain scaling solutions)
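The steps above can be sketched in a few lines of Python. This is only a toy illustration, not any production design: the names `state_root`, `apply_transfer`, `execute` and `is_fraudulent` are hypothetical, a plain SHA-256 digest stands in for a real Merkle root, and the "challenger" is assumed to have the full transaction data available.

```python
import hashlib
import json


def state_root(balances):
    """Digest of the off-chain state; a stand-in for a real Merkle root."""
    encoded = json.dumps(balances, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def apply_transfer(balances, sender, receiver, amount):
    """Execute one transfer off-chain and return the new state."""
    if amount <= 0 or balances.get(sender, 0) < amount:
        raise ValueError("invalid transfer")
    new_state = dict(balances)
    new_state[sender] -= amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def execute(balances, txs):
    """Run a batch of (sender, receiver, amount) transfers off-chain."""
    state = balances
    for sender, receiver, amount in txs:
        state = apply_transfer(state, sender, receiver, amount)
    return state


def is_fraudulent(prev_state, txs, claimed_root):
    """What a challenger does: re-execute the txs and compare snapshots."""
    try:
        return state_root(execute(prev_state, txs)) != claimed_root
    except ValueError:
        return True  # the batch contains a transfer that cannot be valid


# Assets deposited from the main chain form the starting off-chain state.
deposited = {"alice": 100, "bob": 0}
txs = [("alice", "bob", 30), ("bob", "alice", 5)]

honest_state = execute(deposited, txs)
honest_root = state_root(honest_state)              # snapshot an honest operator submits
forged_root = state_root({"alice": 0, "bob": 100})  # snapshot a rogue operator submits

print(is_fraudulent(deposited, txs, honest_root))  # False
print(is_fraudulent(deposited, txs, forged_root))  # True
```

Note that `is_fraudulent` only works because the challenger holds `txs`. Without that data the dispute cannot be run, which is why the data availability assumption matters.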
Do note that off-chain scaling solutions differ mainly in their design approach: each one makes different choices about how to derive scaling benefits efficiently.
Essentially, every solution needs to make tradeoffs on one or the other parameter – there are no magical solutions. Any solution that claims to resolve everything should be taken with a pinch of salt.
At Matic Network, for example, we decided early on to maintain a hybrid solution: one that offers support for an account-based implementation of Plasma as well as a Proof-of-Stake (PoS) chain backed by validators staking the MATIC token. Developers who wish to use the same contracts on a scalable chain can directly use the Matic chain via PoS security guarantees, but developers can also choose to use Plasma predicates (such as the ones for ERC20 and ERC721 transfers, asset swaps for DEXs and NFT marketplaces, and an increasing number of pre-built predicates) to make use of Plasma security guarantees.
This was done primarily so that developers have one consistent and well known way of developing applications via PoS – i.e. full Solidity support and experience similar to Ethereum, while still having the ability to try out and integrate techniques such as Plasma and OR (optimistic rollups) in certain parts of their application so that they have best-in-class security.
We firmly believe in weighing the tradeoffs inherent in any solution, and then implementing the best solution possible with what is available or known at the current point in time.
Optimistic Rollups and zkRollups are also additions to the library of scaling techniques for Ethereum, but they too are subject to tradeoffs, albeit different ones.
Let’s dig into the details.
We will not be getting too deep into what Plasma is in this article. This is better explained by dedicated resources such as http://plasma.io/, https://ethresear.ch/t/minimal-viable-plasma/426, https://ethresear.ch/t/more-viable-plasma/2160 and https://medium.com/plasma-group/plasma-spec-9d98d0f2fccf.
There are a number of different implementations of Plasma, but all of them, more or less, try to attack the same problems that Plasma or any off-chain solution targets.
Plasma provides a way to conduct transactions off-chain at a much higher rate and a much lower cost. The single-operator model can be used in Plasma because the fraud proof verification mechanism, in conjunction with periodic snapshots/checkpoints of the sidechain state on Ethereum, makes optimistic execution of transactions on the sidechain possible.
One major contrast between Plasma and a regular sidechain is that things like asset ownership and other state can be secured by the Plasma contract on Ethereum, and can therefore survive an attack on the sidechain. In simple terms, if the sidechain goes down or the operator goes rogue, users are still able to get their assets back on the main chain.
There is also a good mechanism to prevent users falsifying their state ownership, and ways and means to dispute that. Overall, the framework is quite good, but does have limitations, yes.
The main limitation, as we have learnt by implementing and iterating, is that there is a real cap on capacity when we need to fall back to the Ethereum main chain in scenarios such as mass exits. Mass exits are events during which the sidechain goes rogue, and the users need to prove ownership of assets/state on the main chain by submitting proofs to the Plasma contracts.
Before getting into the mass exit perspective, let’s also understand the security model of the operator.
If one implements the Plasma operator as a single entity, then the potential for the operator to attempt fraud is higher than if the operator is replaced by a set of incentivized Proof-of-Stake validators. That’s precisely why Matic has implemented a hybrid Plasma+PoS architecture, which mitigates the data availability attack vector to a large extent.
The design is optimistic in approach, which means that most of the time the sidechain will run transactions flawlessly, and the mass-exit scenario occurs only when more than ⅔ (>66%) of validators collude to take the Proof-of-Stake security mechanism down. This attack scenario is rare, and its rarity is generally the fundamental assumption behind any BFT-based PoS blockchain, e.g. Cosmos.
It’s important to note that the mass exit mechanism is a fallback mechanism and comes into the picture only in a worst-case scenario. Of course, griefing attacks are possible, but these are assumptions that we can choose to begin with.
The primary contention about mass exits in Plasma is that if there is a sufficiently large number of users performing mass exit, it can lead to congestion on mainchain (Ethereum) and users may not be able to exit in time. Sure, mass exits can be big in size, but how big?
Let’s use an example for illustration.
- Number of users on the sidechain, u = 100,000
- Gas required for a MoreVP exit, g = 700,000
- y is the fraction of each Ethereum block filled by mass exits alone, 0 < y <= 1
- Total gas needed is u * g = 70,000,000,000. The figures here imply a block gas limit of 8,000,000, so for 100k users you would need roughly 8750/y blocks on Ethereum to execute the mass exit.
Let’s assume 25% of each block is filled: this would take 8750*4 = 35,000 blocks, or 8750 minutes (at the ~15-second block time these figures imply), or ~145 hours, or ~6 days for users to mass-exit the chain.
If we assume 100% of each Ethereum block is filled, similar to calculations seen in comparison articles, this reduces to ~1.5 days. Note that I don’t think that is a good way to calculate it, but let’s keep it here for the sake of completeness. At that rate, a one-week challenge period leaves room for roughly 450k exits, so ~450k users can be supported on a single Plasma chain, and this number can go up once one introduces the notion of multiple Plasma chains.
Of course, this is a sample scenario and there are other factors affecting this as well.
But it helps to visualise the kind of tradeoffs that are present in implementing a real-world Plasma implementation.
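The arithmetic in this example can be reproduced with a short Python sketch. The function names are hypothetical. The 8,000,000 block gas limit and 15-second block time are not stated explicitly in the example, but they are the values its figures imply (100,000 * 700,000 / 8750 and 8750 minutes / 35,000 blocks). The one-week window used for the capacity estimate is an assumption based on the typical challenge period.

```python
import math

SECONDS_PER_DAY = 86_400


def mass_exit_blocks(users, gas_per_exit, block_gas_limit, fill_fraction):
    """Number of Ethereum blocks needed for every user to exit."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    total_gas = users * gas_per_exit
    return math.ceil(total_gas / (block_gas_limit * fill_fraction))


def mass_exit_days(users, gas_per_exit, block_gas_limit, fill_fraction,
                   block_time_s=15):
    """Wall-clock duration of the mass exit, in days."""
    blocks = mass_exit_blocks(users, gas_per_exit, block_gas_limit, fill_fraction)
    return blocks * block_time_s / SECONDS_PER_DAY


def max_users_in_window(window_days, gas_per_exit, block_gas_limit,
                        fill_fraction, block_time_s=15):
    """How many exits fit into a given window (e.g. the challenge period)."""
    blocks = (window_days * SECONDS_PER_DAY) // block_time_s
    usable_gas = int(blocks * block_gas_limit * fill_fraction)
    return usable_gas // gas_per_exit


U, G, GAS_LIMIT = 100_000, 700_000, 8_000_000

print(mass_exit_blocks(U, G, GAS_LIMIT, 0.25))          # 35000
print(round(mass_exit_days(U, G, GAS_LIMIT, 0.25), 2))  # 6.08
print(round(mass_exit_days(U, G, GAS_LIMIT, 1.0), 2))   # 1.52
print(max_users_in_window(7, G, GAS_LIMIT, 1.0))        # 460800
print(max_users_in_window(7, G, GAS_LIMIT, 0.25))       # 115200
```

Under the more conservative 25% assumption, the same one-week window fits roughly 115k exits rather than ~450k, which shows how sensitive the capacity estimate is to y.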
The withdrawal period for moving assets/state back to Ethereum is defined by the challenge period that most Plasma implementations choose to keep as default. This period is generally kept at 1 week, so users just need to watch the chain once per week, and it is long enough that Ethereum block size is not a bottleneck (assuming a certain number of users). There can also be innovative ways, via engineering or the use of governance DAOs, to dynamically increase the withdrawal period should the number of withdrawals see an unprecedented surge.
This is also how long users need to wait before withdrawing their assets from the Plasma chain.
Of course, one can introduce a secondary market here, which allows users to swap their Plasma assets with Ethereum assets with a counterparty at a certain discount, and they can withdraw instantly as well.
The main assumption in a Plasma implementation is of data availability. As long as we have data availability guarantees, it is relatively easy to prove provenance of assets by users.
Users need to sync data from the sidechain in order to submit challenges. That has been the bone of contention for Plasma with single operators, and it is certainly an important problem. For rollups, the data availability layer has been posited as Ethereum itself (specifically, calldata) in most circles, which is both a key insight and a limiting factor. It is definitely clever!
One good example of how it can be solved/mitigated for Plasma is an incentivized Proof-of-Stake validator system, which is an approach Matic Network has been taking.
Fees are relatively straightforward to charge for, cheap and increase linearly with complexity of the transaction.
Plasma can work with a single operator, but can have relatively better data availability properties with a Proof-of-Stake based sidechain.
Most L2 solutions need watcher nodes to make sure that the sidechain is working as intended and to challenge fraud on the main chain. While watcher nodes for ORs (optimistic rollups) and ZKRs (zkRollups) are comparatively easier to implement, one still needs to incentivise them to perform these actions. zkSync by Matter Labs is a good example of how this can happen. Another example is Matic Network, which has a validator layer that is highly incentivised to make sure things work as intended.
Let’s move to Rollups…
Optimistic Rollups (ORs)
Optimistic rollups snapshot the sidechain state on-chain just like a Plasma chain does, with one difference: they also send the transaction data that changed the state on-chain. OR chains rely on the censorship resistance of the Ethereum main chain to keep the transaction data available forever.
Good things about OR are:
- Transaction data is always available on Ethereum
- Exit game is relatively simple due to data always being available
However, it is certainly not a magical solution.
Scalability is bounded by the capacity of the Ethereum blockchain. ‘calldata’ is certainly a cheaper mechanism to store transaction data, but there is a finite capacity that one can expect for data to be stored there.
For ERC20 transactions, the transactions-per-second (TPS) throughput might sound juicy (~450 TPS, and ~2000 TPS with aggregate signatures), but those numbers assume that 100% of every Ethereum block is filled with OR data, and they go down quickly once that assumption is relaxed. Also, once we consider more complex transactions, fraud proofs increase in complexity and are subject to the limitations of the execution environment of the main chain.
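To see how strongly the throughput depends on the share of block space available, here is a rough upper-bound sketch. It is not the calculation behind the TPS figures quoted above. The function name is hypothetical, and the 10,000,000 gas limit, 100 bytes per transaction and 15-second block time are illustrative assumptions. The 16 gas per non-zero calldata byte comes from EIP-2028. Fixed per-batch overhead and execution costs are ignored.

```python
def calldata_bound_tps(block_gas_limit, fill_fraction, bytes_per_tx,
                       gas_per_byte=16, block_time_s=15):
    """Upper bound on rollup TPS when calldata gas is the only cost counted."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    gas_per_tx = bytes_per_tx * gas_per_byte
    txs_per_block = (block_gas_limit * fill_fraction) / gas_per_tx
    return txs_per_block / block_time_s


# Illustrative inputs only (assumptions, not measured values).
full = calldata_bound_tps(10_000_000, 1.0, bytes_per_tx=100)
quarter = calldata_bound_tps(10_000_000, 0.25, bytes_per_tx=100)

print(round(full, 1))     # 416.7
print(round(quarter, 1))  # 104.2
```

The bound scales linearly with the fraction of each block devoted to rollup data, so a quarter of the block space yields a quarter of the throughput.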
Fraud proofs still need to be written for OR, similar to Plasma. Specifying them easily is a challenge for both OR and Plasma, at least for a normal developer. A great way to look at this is from the developer's perspective: if we can create tools that make fraud proofs easier to build for a given approach, that approach will have a gentler learning curve.
The withdrawal period for assets on Ethereum for OR is ~1 week. This can be mitigated by introducing a secondary market for fast exits, which is true for Plasma withdrawals as well.
For sure, data availability for OR is superior to a Plasma chain, because the tx data is stored in Ethereum. However, as already explained, it is bound by how much data can be packed into Ethereum blocks.
An interesting offshoot is that going forward, we can use other solutions such as Bitcoin Cash or erasure-coding based data availability engines, or even a Proof-of-Stake sidechain for ensuring data availability. Even zkR solutions such as StarkPay will need a good data availability solution for the data stored off-chain.
Fees on OR based chains would also be quite high as more data is pushed on-chain.
In OR, the operator equivalent is the aggregator. An aggregator needs to have bonds in order for the system to function correctly. If an aggregator is not processing txs by a user, the user could post transactions to Ethereum directly as well. But the OR mechanism works better if there are multiple bonded aggregators to bypass censorship by a single aggregator.
If we think past the basic proof-of-concept implementation, any production-ready OR implementation will need a robust incentivization mechanism for validators. This is something we are beginning to see in rollups such as zkSync by Matter Labs, with its staked validators, or along the lines of an incentivized Proof-of-Stake network of staked validators.
Coming to zkRollups, without getting into too much detail (you can refer to details here and here), the architecture is very similar to that of OR, but fraud proofs are replaced here by validity proofs.
However, validity proofs are expensive to compute, and the code required to specify the zk circuits is, at the moment, difficult to write and audit correctly. Because of this, transaction fees are likely to be higher as well. Of course, this will improve, as we are seeing incredible advances in zk technology.
Another interesting aspect of using zkRollups is that if one doesn’t have enough user transactions, the operator needs to fill the remaining block capacity with dummy transactions, because the zkSNARK accepts a fixed number of transactions, n. Say n=100 and one has 10 transactions for the current block: the operator then needs to add 90 more txs to the rollup block just to complete the batch, or has to wait for more user transactions, which would delay finality. Padding makes the per-transaction fee ~10x in this scenario, since the cost of the full batch is spread over 10 real transactions instead of 100.
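The padding example can be written down as a small calculation. This is a simplified sketch: the function names are hypothetical, and it assumes the cost of proving and posting a batch is fixed regardless of how many slots hold real transactions, and that this cost is split evenly across the real transactions.

```python
def padding_needed(batch_size, real_txs):
    """Dummy transactions the operator must add to complete the batch."""
    if not 0 < real_txs <= batch_size:
        raise ValueError("real_txs must be between 1 and batch_size")
    return batch_size - real_txs


def per_tx_fee_multiplier(batch_size, real_txs):
    """Per-transaction fee relative to a completely full batch."""
    if not 0 < real_txs <= batch_size:
        raise ValueError("real_txs must be between 1 and batch_size")
    return batch_size / real_txs


print(padding_needed(100, 10))         # 90
print(per_tx_fee_multiplier(100, 10))  # 10.0
```

The alternative to padding is waiting until the batch fills up, which trades the higher fee for delayed finality.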
Ultimately, we can envision zk-based approaches solving a lot of our scaling problems, but at the moment they also come with a significant learning curve for the average developer, in addition to expensive proof computation.
As for Matic Network, we are constantly striving to see how we can bring the most usable and secure means of scaling into our architecture.
Optimistic Rollups are something that we are also excited about, and we are doing an internal Proof-of-Concept on them. We have also been supporting bounties for OR implementations. In fact, our lead protocol dev, Vaibhav, also gave a workshop on ZkRollups at Devcon 5, Osaka last year. We are massively optimistic about a viable Layer 2 on top of Ethereum and believe that there will be more than one way of achieving robust Layer 2 solutions, perhaps each suited to its specific use cases.
While we do not agree that mass exits are solved, we consider Plasma as a really great solution to break the scalability ceiling and consider it worth the effort to keep on researching to solve these issues and bring the scalability everyone wants.
In a similar vein, Rollups are also a great and clever new improvement, and everyone in the industry needs to research them and actually build real-world, production-grade implementations around them.
We are also keenly observing and experimenting with Rollups and once we strongly believe that there is a viable architecture for a production grade Rollup, we will also incorporate the same.
Matic’s ethos is scaling. The Matic approach is to study the best solutions out there, and help bring them to developers around the world. Whether that is Plasma, Rollups or something else, we are excited to study it, discuss the pros and cons, and implement it for developers, keeping superior developer experience and tooling in mind.