What does object-based mean? How is an object different from a file (which I pre...

tw04 · on March 7, 2020

It means you access your object through a GUID. Think about it like parking your own car vs. a valet. When you park your own car you need to know the address of the garage you parked in, the floor you were on, and the spot you were in. When you valet park, you hand the attendant a ticket and he brings your car back.

With a standard fileshare, you need to walk the filesystem to retrieve your file - this incurs a ton of metadata overhead. It also means when you've got potentially billions of files in a directory, it can be slooooowww. All the metadata requests also make it very chatty - so doing it over a WAN link tends to be extremely painful if it works at all. Newer versions of SMB and NFS have done a lot to batch the metadata requests but they are still protocols meant to happen at extremely low latency inside a datacenter.
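To make the valet analogy concrete, here is a toy sketch (not how any real filesystem or object store is implemented) that counts lookups. The nested dict stands in for a directory tree and the flat dict for an object store; `walk_lookup`, `key_lookup`, and the sample names are made up for illustration.

```python
def walk_lookup(root, path):
    """Resolve a path one component at a time, counting directory lookups."""
    node, lookups = root, 0
    for part in path.strip("/").split("/"):
        node = node[part]  # each step stands in for one metadata request
        lookups += 1
    return node, lookups


def key_lookup(store, key):
    """Fetch a blob from a flat key space: a single lookup, no walking."""
    return store[key], 1


# Self-parking: you have to know garage, floor, and spot.
fs_tree = {"garage": {"floor3": {"spot42": b"car"}}}
# Valet: you only hold a ticket.
flat_store = {"ticket-1234": b"car"}

file_data, file_lookups = walk_lookup(fs_tree, "/garage/floor3/spot42")
obj_data, obj_lookups = key_lookup(flat_store, "ticket-1234")
print(file_lookups, obj_lookups)
```

Over a WAN link each of those per-component lookups can turn into a round trip, which is where the chattiness comes from.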

toolslive · on March 7, 2020

Some object stores work this way (opaque IDs only), but AWS S3, for example, does not. You can list the contents of a bucket, nicely sorted by name, and you can mimic directory structures if you want.
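A rough sketch of how "directories" can be mimicked on top of a flat, sorted key space, in the spirit of the prefix and delimiter options S3 offers when listing. This is an in-memory toy, not the S3 API; `list_keys` and the sample keys are invented for illustration.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) directly under `prefix`, sorted by name."""
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper in the "tree": report only the pseudo-folder.
            common.add(prefix + head + sep)
        else:
            objects.append(key)
    return objects, sorted(common)


keys = [
    "logs/2020/03/07.txt",
    "logs/2020/03/08.txt",
    "logs/readme.txt",
    "photos/cat.jpg",
]

objects, folders = list_keys(keys, prefix="logs/")
print(objects)  # keys directly under logs/
print(folders)  # pseudo-folders under logs/
```

The "folders" exist only as shared key prefixes; nothing is stored for them.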

However, you touched a key point: object stores are all about throughput, not latency. You can store at a GB/s (if you have the pipes), but even checking if an object exists will cost you a few milliseconds.

statictype · on March 7, 2020

Got it. Thanks for explaining.

tgsovlerkhgsel · on March 7, 2020

My guess: no random write access to objects. At best you can append, and often only until the object is finalized; you also cannot read it until it is finalized.

toolslive · on March 7, 2020

It's like a key-value store (or a dictionary), except that the values are objects (big blobs of data). This means you can't update part of an object without rewriting the whole blob. However, most object stores offer metadata operations (move, tag, ...), concatenation of n objects into 1, and partial reads.
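A minimal in-memory sketch of that model, assuming nothing about any particular product: `ToyObjectStore` and its method names are hypothetical, chosen only to mirror the operations listed above.

```python
class ToyObjectStore:
    """Dictionary of immutable blobs with a few object-store style operations."""

    def __init__(self):
        self._blobs = {}
        self._tags = {}

    def put(self, key, data):
        # Writing always replaces the whole blob; there is no in-place patch.
        self._blobs[key] = bytes(data)

    def get(self, key, start=0, end=None):
        # Partial read: return only the requested byte range.
        return self._blobs[key][start:end]

    def concat(self, dest, sources):
        # Combine n existing objects into one new object.
        self._blobs[dest] = b"".join(self._blobs[s] for s in sources)

    def move(self, src, dest):
        # Metadata-style operation: rename a key, carrying its tags along.
        self._blobs[dest] = self._blobs.pop(src)
        if src in self._tags:
            self._tags[dest] = self._tags.pop(src)

    def tag(self, key, **tags):
        self._tags.setdefault(key, {}).update(tags)


store = ToyObjectStore()
store.put("part-1", b"hello ")
store.put("part-2", b"world")
store.concat("greeting", ["part-1", "part-2"])
first_word = store.get("greeting", 0, 5)

# "Updating" the tail means reading the blob and writing a full replacement.
old = store.get("greeting")
store.put("greeting", old[:6] + b"there")
updated = store.get("greeting")
```

Changing five bytes still costs a full rewrite of the value, which is the main practical difference from a file you can seek into and overwrite.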