archive - Bzip2 block header: 1AY&SY -
this question bzip2 archive format. bzip2 archive consists of file header, 1 or more blocks , tail structure. blocks should start "1ay&sy", 6 bytes of bcd-encoded digits of pi number, 0x314159265359. according the source of bzip2:
/*-- 6-byte block header, value chosen arbitrarily 0x314159265359 :-). 32 bit value not give strong enough guarantee value not appear chance in compressed datastream. worst-case probability of event, 900k block, 2.0e-3 32 bits, 1.0e-5 40 bits , 4.0e-8 48 bits. compressed file of size 100gb -- 100000 blocks -- 48-bit marker do. nb: normal compression/ decompression *not* rely on these statistical properties. important when trying recover blocks damaged files. --*/
the question is: true, bzip2 archives have blocks start aligned byte boundary? mean archives created reference implementation of bzip2, bzip2-1.0.5+ utility.
i think bzip2 may parse stream not byte stream bit stream (the block encoded huffman, not byte-aligned design).
so, in other words: if grep -c 1ay&sy
greater (huffman may generate 1ay&sy inside block) or equal count of bzip2 blocks in file?
bzip2 looks @ bit stream.
from http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:
anyway, important bits bzip2 file contains 1 or more "streams", byte aligned, each containing 1 (zero?) or more "blocks", not byte aligned, followed end of stream marker (the 6 bytes 0x177245385090 square root of pi binary coded decimal (bcd), 4 byte checksum, , empty bits byte alignment).
the bzip2 wikipedia article alludes bit-block alignment (see file format section), seems inline remember school (had implement algorithm...).
Comments
Post a Comment