Core services - the block log revolution is here!
Dude, where did my block go?
It's official now: you can configure your hived node to get rid of some of the 490+ gigabytes of compressed block log burden. You can even shed all of it, but do you really want to, and what if you change your mind?
The time of no choice is over
Since the beginning of the Hive blockchain (and its predecessor) there has been that huge single file named `block_log`, mandatory for all nodes - a single file now over 490 gigabytes in size, requiring contiguous disk space to match. The block log revolution that comes into force with the 1.27.7rc0 tag brings the following improvements:
- Multiple block log files of one million blocks each can be used instead of the legacy single monolithic file.
- You can keep all of the files or only a number of the most recent ones.
- A complete wipeout of the block log is possible too, leaving you with only the last block, kept in memory.
The pros and cons
Let's examine the new modes in detail:
- Split mode - keeps every block from genesis, hence provides the full functionality of e.g. the block API and allows blockchain replay. At the same time, the 1M-block part files of the block log may be physically distributed across different filesystems using symlinks, which allows you to e.g. keep only the latest part files on fast storage. Good for an API node.
- Pruned mode - a variation of split mode which keeps only several of the latest part files of the block log. Replay is no longer guaranteed. Provides only partial functionality of the block API and others - it handles requests for not-yet-pruned blocks, e.g. serves the latest several months of blocks through `block_api`. Good for a transaction broadcaster.
- Memory-only mode - the ultimate pruning - no block log files at all, only the single latest irreversible block held in memory. Obviously unable to replay. Unable to provide past blocks through the block API and similar.
The summary of block log modes
| mode name | blocks kept | replayable | `block-log-split` value in config.ini |
|---|---|---|---|
| legacy | all | yes | -1 |
| split | all | yes | 9999 (now the default) |
| pruned | last n million | sometimes | n > 0 |
| no-file | last 1 | no | 0 |
Wait a minute, you may say: the split mode number (9999) meets the condition of the pruned one (> 0), so there must be a mistake here. Let me explain in detail then. A positive value of the `block-log-split` option defines how many full millions of the most recent irreversible blocks are to be kept in block log files. This means that when you set it to e.g. 90, all blocks will be kept for the time being, because Hive has a little over 89 million blocks now, so the block log is not yet effectively pruned. After a while, however, when the threshold of 90 million is crossed, the file containing the oldest (first) million blocks will be deleted, and from that moment the block log will be effectively pruned. As you can see, the boundary between split and pruned modes is blurred, but setting the option to the biggest possible number (9999) means that your block log won't be pruned for the next 950+ years.
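To make the table concrete, here is a minimal config.ini sketch; only the `block-log-split` line matters, so pick one of the values below depending on the mode you want:

```ini
# split mode, the current default: all blocks kept, in 1M-block part files
block-log-split = 9999

# legacy mode: one monolithic block_log file
# block-log-split = -1

# pruned mode: keep only the part files covering the last 90 million blocks
# block-log-split = 90

# memory-only mode: no block log files, just the last irreversible block in memory
# block-log-split = 0
```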
Now we're getting to the question of why replay is sometimes available in pruned mode. A full replay (from block #1) requires all blocks to be present in the block log, so it can be performed as long as the block log is not effectively pruned, given the combination of the `block-log-split` value in the configuration and the current head block of the blockchain. Once the oldest part file, containing the initial 1 million blocks, is removed, the block log is effectively pruned and full replay is no longer possible.
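A quick way to check: full replay remains possible exactly as long as the very first part file still exists. A one-liner sketch, assuming the data directory is `~/.hived` (an assumption - adjust the path to your setup):

```bash
# Part 0001 holds blocks 1 to 1,000,000; if it's still present, the log
# is not yet effectively pruned and a full replay is still possible.
ls ~/.hived/blockchain/block_log_part.0001 && echo "full replay still possible"
```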
Tips & tricks
- There are two ways to obtain split block log files from a legacy monolithic one: a) using `block_log_util`'s new `--split` option, or b) running hived configured for a split block log with the legacy monolithic file present in its `blockchain` directory, which triggers the built-in auto-split mechanism. The former is recommended, as it allows you to generate the 490+ GB of split files into an output directory other than the source one (possibly on different disk space); see the command sketch after this list.
- All files of a split/pruned block log, except the head one (the latest, with the highest number in its filename), can be made read-only, as they won't be modified anymore. The head file needs to stay writable, as it's where new blocks are appended.
- A split block log allows you to scatter its part files over several disk spaces and symlink them all into hived's `blockchain` directory. Not only can smaller disk volumes be used, you can even consider placing older parts (i.e. the ones rarely used by hived) onto slower drives; see the sketch after this list.
- The names of split/pruned block log files follow the pattern `block_log_part.????`, where `????` stands for consecutive numbers beginning with `0001`, followed by `0002`, etc. Since each file contains up to a million blocks, `block_log_part.0001` holds blocks numbered 1 to 1 000 000, while `block_log_part.0002` holds blocks 1 000 001 to 2 000 000, and so on. Hived recognizes the block log files by their names, so don't change them, or the block log becomes lost to it.
Links and resources
- Source code version containing the improvements - https://gitlab.syncad.com/hive/hive/-/tags/1.27.7rc0
Awesome feature, especially for witnesses, because they usually run tons of various Hive nodes to serve the community.
(Amidala meme: "Because they are running tons of various Hive nodes to serve the community???")
For example, I can run a few broadcaster nodes on cheap VPS servers, as they no longer need a huge amount of storage. It will also improve my block_log serving service, as it will be much easier to resume downloads, even if you previously had a different source of blocks (the blocks are the same, but because of compression, their on-disk storage can differ between nodes).
Heh. I see I'm not the only one who takes a month to write a post :oP
Technically the information is correct, but it's worth pointing out explicitly that even if you are missing the oldest block_log parts, you can still replay as long as you have a valid snapshot that covers the missing blocks. Personally, though, I'd keep all the parts somewhere, because snapshots easily become outdated.
This is indeed an exciting feature.
In this case, I think we can run a full node locally and, when necessary (such as for upgrades or hard forks), replay from the beginning, then run lightweight nodes on servers (bare metal or VPS). That meets the needs while saving disk space costs.
This is going to be great! Reducing the cost of running nodes will bring in more participants! Great work.