memory management - How do we design an on-disk data structure and how do we save it? -

June 15, 2015

How do we design an on-disk data structure? For example, looking at the ext3 inode structure, the attributes have a specific placement. What things are kept in mind in terms of memory usage, alignment, padding, etc.?

Also, is there any special mechanism to write these data structures to disk in terms of page/block alignment, or do we just write them with a regular file write?

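Normally, the handling of this kind of structure is done in C because of its relatively low-level characteristics. For example, when you define a struct you can know in advance how many bytes it will take, and the alignment of each field. C never introduces control fields or other kinds of fields for its own use in a struct (the only extra bytes are alignment padding, which you can predict and check). This allows you to read and write structs directly from and to disk.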
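As a minimal sketch of that point, the following C11 fragment pins down the size and field offsets of a struct at compile time. The struct and its field names are hypothetical and are not taken from ext2 or ext3; they only illustrate that the layout can be known in advance.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical on-disk record (illustrative names, not an ext2 structure).
 * Fields are ordered so that each one is naturally aligned and the
 * compiler has no reason to insert padding. */
struct disk_record {
    uint32_t size;    /* offset 0, 4 bytes */
    uint16_t mode;    /* offset 4, 2 bytes */
    uint16_t links;   /* offset 6, 2 bytes */
    uint64_t mtime;   /* offset 8, 8 bytes */
};

/* Compilation fails if the layout is not the one we expect. */
static_assert(sizeof(struct disk_record) == 16, "unexpected padding");
static_assert(offsetof(struct disk_record, mode) == 4, "mode misplaced");
static_assert(offsetof(struct disk_record, links) == 6, "links misplaced");
static_assert(offsetof(struct disk_record, mtime) == 8, "mtime misplaced");
```

Fixed-width types such as uint32_t are used here instead of int or long because their sizes do not change from one platform to another.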

The link included in your question seems to correspond to the Linux ext2 (second extended) filesystem, save for two things:

It says it corresponds to ext3. However, it does not show any of the ext3-specific structures added to ext2. An understanding of ext2 should be enough for your current purposes. I recommend you read Wikipedia's article on ext2 or similar.
It shows the last entry of the block (pointer) array as a single indirection one, and doesn't mention the double or triple indirection entries. In ext2 (and ext3), the block (pointer) array contains 15 entries. The first 12 are direct, number 13 is single indirection, number 14 is double indirection, and number 15 is triple. This simplification may have been done for the depiction only, or it may be a simplification of the whole work you are doing.
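The 12 + 1 + 1 + 1 layout described above can be sketched as a small C function that tells which kind of entry covers a given logical block of a file. The function name is hypothetical, and the code assumes 4-byte block pointers, so one block holds block_size / 4 of them.

```c
#include <assert.h>
#include <stdint.h>

enum { N_DIRECT = 12 };  /* entries 1..12 of the block array are direct */

/* Returns 0 if the logical block is reached through a direct entry,
 * 1, 2 or 3 for single, double or triple indirection, and -1 if the
 * block lies beyond what the 15 entries can address.
 * Assumption: block pointers are 4 bytes wide. */
int indirection_level(uint64_t logical_block, uint32_t block_size)
{
    assert(block_size >= 4 && block_size % 4 == 0);
    uint64_t p = block_size / 4;          /* pointers per block */

    if (logical_block < N_DIRECT) return 0;
    logical_block -= N_DIRECT;
    if (logical_block < p) return 1;          /* entry 13 */
    logical_block -= p;
    if (logical_block < p * p) return 2;      /* entry 14 */
    logical_block -= p * p;
    if (logical_block < p * p * p) return 3;  /* entry 15 */
    return -1;
}
```

With 1,024-byte blocks, for example, each indirect block holds 256 pointers, so logical blocks 0 to 11 are direct, 12 to 267 go through the single indirection entry, and block 268 is the first one that needs double indirection.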

These structures are the origin of the concept of a file. You can't read them as files because there are no files yet. You need to interact with the "block device driver" representing the hard drive. Like many other Linux objects, devices (and implicitly their drivers) are represented as elements in the filesystem (no, not in your filesystem, which doesn't exist yet, but in the root filesystem the Linux machine has upon startup). You need to open the corresponding filesystem object and send requests to it: ordinary reads and writes transfer sectors, and the ioctl function handles device-specific requests.
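A minimal sketch of that interaction follows, assuming a POSIX system. The helper names sector_size_of and read_sector are hypothetical. On Linux the sector size is queried with the BLKSSZGET ioctl; on a regular file, which can be used to emulate a device, that request fails and the code falls back to 512 bytes.

```c
#define _XOPEN_SOURCE 700   /* for pread */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>       /* BLKSSZGET */
#endif

enum { DEFAULT_SECTOR = 512 };

/* Logical sector size of an open block device. Falls back to 512 when
 * the descriptor is not a block device (e.g. a file emulating one). */
int sector_size_of(int fd)
{
#ifdef __linux__
    int size = 0;
    if (ioctl(fd, BLKSSZGET, &size) == 0 && size > 0)
        return size;
#endif
    (void)fd;
    return DEFAULT_SECTOR;
}

/* Reads sector number `index` into buf, which must hold sector_size
 * bytes. Returns 0 on success, -1 on error or short read. */
int read_sector(int fd, uint64_t index, void *buf, int sector_size)
{
    assert(buf != NULL && sector_size > 0);
    off_t offset = (off_t)(index * (uint64_t)sector_size);
    ssize_t n = pread(fd, buf, (size_t)sector_size, offset);
    return n == sector_size ? 0 : -1;
}
```

A caller would open the device node (or the emulating file) with open(), ask for the sector size once, and then read whole sectors by index. Opening a real device node usually requires elevated privileges.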

In order to understand alignment, it's important to differentiate two things:

The input/output unit (sector), which is the basic amount of bytes you send to or receive from the device. Normally, it's 512 bytes.
The allocation unit (block), which is the basic amount of bytes you assign to files in the filesystem. It may be 1, 2, 4, 8, 16 sectors or any other power of 2 (but the same amount for all blocks in the disk).

Thus, the disk is divided into blocks, which are assigned to files to hold their contents. When a file is created with size 0, it doesn't have any block. When the first byte is written, the first block is assigned. When the second byte is written, no new block is assigned because the first block should still have a lot of space available (say 4,095 bytes). When byte number 4,096 is written, it still fits in the first block, but when byte 4,097 is written, a second block is assigned to the file and the new byte is written in its first position. And so on. On average, half of a block is wasted per file, in its last block.
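The arithmetic in that walkthrough can be sketched in C as follows. The function names are hypothetical; the block size is a parameter, with 4,096 bytes matching the example in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks a file of file_size bytes occupies (rounding up). */
uint64_t blocks_needed(uint64_t file_size, uint32_t block_size)
{
    assert(block_size > 0);
    return (file_size + block_size - 1) / block_size;
}

/* Unused bytes in the last block of the file (0 for an empty file or
 * a file that fills its last block exactly). */
uint32_t wasted_in_last_block(uint64_t file_size, uint32_t block_size)
{
    assert(block_size > 0);
    uint32_t used = (uint32_t)(file_size % block_size);
    return used == 0 ? 0 : block_size - used;
}
```

With 4,096-byte blocks, a 1-byte file needs 1 block and leaves 4,095 bytes unused, a 4,096-byte file still needs 1 block, and a 4,097-byte file needs 2.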

The elementary parts of your on-disk data structures should be of size 512 bytes, so that they can be read and written with no waste at all. Of course, multiple instances of these elementary parts can be stored contiguously, but an elementary piece of information should not span from one sector to the next; it should be contained in one of them. More than a 512-byte alignment, this is a 512-byte size requirement.

However, these sectors are read into RAM and, for efficiency reasons, should not be subject to any manipulation before use. On the other hand, many CPU architectures operate faster on shorts, of size 2 bytes, when their alignment is 2 (i.e. their memory address is a multiple of 2), on ints, of 4 bytes, when their alignment is 4, and so on, up to size 16. If your on-disk data items obey these alignments, and the sectors are read into a memory block with the biggest possible required alignment (16 is safe), the data items will be correctly aligned in memory too.

In summary:

Make the basic elements of your on-disk data structures one sector big (even if repetitions of them are going to be stored contiguously),
align the data items within the sectors/basic elements according to their size, and
read the on-disk data structures (and perhaps all sectors) into 16-byte-aligned memory blocks (or whatever the size of the biggest elementary data item is).
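The three rules above can be sketched together in C11. The struct, its fields and the function name are hypothetical; the point is only that the element is exactly one sector, that its fields are naturally aligned, and that the buffer it is read into is 16-byte aligned.

```c
#include <assert.h>   /* static_assert, assert */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* aligned_alloc, free */

enum { SECTOR_SIZE = 512, MAX_ALIGN = 16 };

/* Hypothetical basic element, exactly one sector big. Each field sits
 * at an offset that is a multiple of its own size. */
struct sector_record {
    uint64_t id;                         /* offset 0  */
    uint32_t length;                     /* offset 8  */
    uint16_t flags;                      /* offset 12 */
    uint16_t checksum;                   /* offset 14 */
    uint8_t  payload[SECTOR_SIZE - 16];  /* fills the rest of the sector */
};
static_assert(sizeof(struct sector_record) == SECTOR_SIZE,
              "element must be exactly one sector");

/* Allocates a 16-byte-aligned buffer for `count` sectors, or returns
 * NULL on failure. Sectors can be read straight into it and used as
 * struct sector_record without copying. The caller frees it. */
void *alloc_sector_buffer(size_t count)
{
    assert(count > 0);
    return aligned_alloc(MAX_ALIGN, count * SECTOR_SIZE);
}
```

A caller would read sectors directly into the returned buffer and then treat it as an array of struct sector_record, with no transformation step in between.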

Finally, to answer the questions expressed in the different comments:

Now that you mention Java, for the first time I understand why the distinction between "on-disk" and "in-memory" data structures is important to you. See below.
In Java you don't have control over object size or data item alignment. If you want to comply with the criteria established above, every time you want to write a Java DS to disk you need to encode it into a byte[512] and write this. Every time you read a DS from disk, you need to read a byte[512] and decode it into the corresponding Java DS. This violates the principle of no data transformation between in-memory and on-disk data structures.
Also, in Java you don't have access to devices. You can emulate block devices through files, though.
You can call "on-disk data structures" the byte[512] data elements read from/written to disk, and the source/resulting disk sectors (emulated or physical) themselves.
- In the same vein, you can call "in-memory data structures" the Java objects that are encoded into byte[512]s and written to disk, or decoded from byte[512]s that have been read from disk.
Correct: being a simple byte[512], the "on-disk" data structures are "pure data".
And yes, the Java objects that are "in-memory" data structures belong to classes, which in turn contain functions, constants, data types/auxiliary classes, etc.
For more details, please indicate whether you are implementing ext2 as publicly specified, or a general model of it for which you define the details.
The inode is organized as a table (with horizontal and vertical components) for aesthetic reasons only. Think of it as unidimensional (horizontal or vertical) data only.
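The encode/decode step described for Java in the answers above can be sketched as follows. It is written in C to stay in the language used in the rest of this answer, and the Java version would follow the same shape with a byte[512]. The struct, its fields, the function names and the little-endian byte order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { RECORD_BYTES = 512 };

/* Hypothetical in-memory data structure. */
struct mem_record {
    uint32_t size;
    uint16_t mode;
};

/* Stores v at p as 4 bytes, least significant byte first. */
static void put_u32le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Reads 4 bytes at p, least significant byte first. */
static uint32_t get_u32le(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* In-memory structure -> 512 bytes of pure data (unused bytes zeroed). */
void encode_record(const struct mem_record *in, uint8_t out[RECORD_BYTES])
{
    assert(in != NULL && out != NULL);
    memset(out, 0, RECORD_BYTES);
    put_u32le(out, in->size);                 /* bytes 0..3 */
    out[4] = (uint8_t)(in->mode & 0xFF);      /* bytes 4..5 */
    out[5] = (uint8_t)(in->mode >> 8);
}

/* 512 bytes of pure data -> in-memory structure. */
void decode_record(const uint8_t in[RECORD_BYTES], struct mem_record *out)
{
    assert(in != NULL && out != NULL);
    out->size = get_u32le(in);
    out->mode = (uint16_t)(in[4] | (in[5] << 8));
}
```

The cost of this approach is the transformation on every read and write; what it buys is an on-disk layout that does not depend on how the language or platform lays out objects in memory.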
