second, it would grow by 1 MB per three seconds (1 GB per hour, 8 TB per year). Ethereum
is likely to suffer a similar growth pattern, worsened by the fact that there will be many
applications on top of the Ethereum blockchain instead of just a currency as is the case
with Bitcoin, but ameliorated by the fact that Ethereum full nodes need to store just the
state instead of the entire blockchain history.
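
The growth figures quoted above are round numbers. As a quick check, the short sketch below recomputes them for the stated rate of 1 MB every three seconds, assuming decimal units (1 GB = 1000 MB) and a 365-day year. The function name `growth_rate` is purely illustrative.

```python
def growth_rate(mb_per_block, seconds_per_block):
    """Return (GB per hour, TB per year) for a chain growing at the given rate.

    Assumes decimal units: 1 GB = 1000 MB and 1 TB = 1,000,000 MB.
    """
    mb_per_hour = mb_per_block * 3600 / seconds_per_block
    gb_per_hour = mb_per_hour / 1000
    tb_per_year = mb_per_hour * 24 * 365 / 1_000_000
    return gb_per_hour, tb_per_year


if __name__ == "__main__":
    gb_per_hour, tb_per_year = growth_rate(1, 3)
    print(f"{gb_per_hour:.1f} GB per hour, {tb_per_year:.1f} TB per year")
```

Under these assumptions the rate works out to about 1.2 GB per hour and roughly 10.5 TB per year, the same order of magnitude as the rounded figures in the text.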
The problem with such a large blockchain size is centralization risk. If the blockchain size
increases to, say, 100 TB, then the likely scenario would be that only a very small number
of large businesses would run full nodes, with all regular users using light SPV nodes. In
such a situation, there arises the potential concern that the full nodes could band together
and all agree to cheat in some profitable fashion (e.g., change the block reward or give
themselves BTC). Light nodes would have no way of detecting this immediately. Of
course, at least one honest full node would likely exist, and after a few hours information
about the fraud would trickle out through channels like Reddit, but at that point it would be
too late: it would be up to the ordinary users to organize an effort to blacklist the given
blocks, a massive and likely infeasible coordination problem on a scale similar to that of
pulling off a successful 51% attack. In the case of Bitcoin, this is currently a problem, but
there exists a blockchain modification suggested by Peter Todd which will alleviate this
issue.
In the near term, Ethereum will use two additional strategies to cope with this problem.
First, because the mining algorithms are blockchain-based, every miner, at least, will be
forced to be a full node, which puts a lower bound on the number of full nodes. Second and
more importantly, however, we will include an intermediate state tree root in the
blockchain after processing each transaction. Even if block validation is centralized, as
long as one honest verifying node exists, the centralization problem can be circumvented
via a verification protocol. If a miner publishes an invalid block, then either that block is
badly formatted or the state S[n] is incorrect. Since S[0] is known to be correct, there
must be some first state S[i] that is incorrect where S[i-1] is correct. The verifying
node would provide the index i, along with a "proof of invalidity" consisting of the subset
of Patricia tree nodes needed to process APPLY(S[i-1],TX[i]) -> S[i]. Nodes
would be able to use those Patricia nodes to run that part of the computation, and see that
the S[i] generated does not match the S[i] provided.
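
The verification protocol described above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the protocol itself: the state is a plain dictionary of balances, the "state root" is a SHA-256 hash of the whole serialized state rather than a Patricia tree root, and the proof therefore carries the entire state S[i-1] instead of only the subset of Patricia tree nodes touched by the transaction. All function names (`state_root`, `apply_tx`, `find_first_invalid`, `check_invalidity_proof`) are hypothetical. Note that the Python list `txs` is zero-indexed, so `txs[i - 1]` plays the role of TX[i].

```python
import hashlib
import json


def state_root(state):
    """Toy stand-in for a Patricia tree root: hash of the canonically serialized state."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_tx(state, tx):
    """APPLY(S, TX) -> S' for a simple transfer; an invalid transfer leaves S unchanged."""
    sender, recipient, amount = tx
    new_state = dict(state)
    if amount < 0 or new_state.get(sender, 0) < amount:
        return new_state
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def find_first_invalid(genesis, txs, claimed_roots):
    """Verifying-node side: replay the block and locate the first incorrect state.

    claimed_roots[0] is the root of S[0]; claimed_roots[i] is the root the miner
    published after applying txs[i - 1]. Returns (i, S[i-1]) or None if all match.
    """
    state = genesis
    for i, tx in enumerate(txs, start=1):
        next_state = apply_tx(state, tx)
        if state_root(next_state) != claimed_roots[i]:
            return i, state  # state is S[i-1], the last correct state
        state = next_state
    return None


def check_invalidity_proof(i, pre_state, txs, claimed_roots):
    """Light-node side: return True if the proof shows the published S[i] is wrong."""
    if state_root(pre_state) != claimed_roots[i - 1]:
        return False  # the proof does not start from the committed S[i-1]
    recomputed = state_root(apply_tx(pre_state, txs[i - 1]))
    return recomputed != claimed_roots[i]


if __name__ == "__main__":
    genesis = {"alice": 10, "bob": 0}
    txs = [("alice", "bob", 3), ("bob", "alice", 1)]

    # Honest roots for S[0], S[1], S[2].
    states = [genesis]
    for tx in txs:
        states.append(apply_tx(states[-1], tx))
    roots = [state_root(s) for s in states]

    # A cheating miner publishes a wrong root for S[2].
    bad_roots = list(roots)
    bad_roots[2] = state_root({"alice": 8, "bob": 1000})

    i, pre_state = find_first_invalid(genesis, txs, bad_roots)
    print(i, check_invalidity_proof(i, pre_state, txs, bad_roots))
```

The light node only re-executes a single transaction, which is what keeps the check cheap. In a real design the proof would contain just the tree nodes that transaction reads and writes, each authenticated against the committed root of S[i-1].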
Another, more sophisticated, attack would involve the malicious miners publishing
incomplete blocks, so the full information does not even exist to determine whether or not