More

AhmedSoliman · on June 12, 2024

It’s a mixed bag of design ideas. There is definitely inspiration from LogDevice (disclaimer, I am one LogDevice designers) and Delos for (Bifrost, our distributed log design). You can read about Delos in https://www.usenix.org/system/files/osdi20-balakrishnan.pdf

AhmedSoliman · on June 12, 2024

Agreed.

AhmedSoliman · on June 12, 2024

Nothing prevents you from using your own data layer, but part of the power of Restate is the tight control over the short-term state and the durable execution flow. This means that you don't need to think a lot about concurrency control, dirty reads, etc.

AhmedSoliman · on June 12, 2024

"Virtual Objects" is a cool concept, the name might not reflect the power it brings though. Luckily, the documentation seems to explain it well.

AhmedSoliman · on Sept 13, 2018

Maybe sending a PR would help?

Annatar · on Sept 14, 2018

It's for Linux only, and I run illumos-based SmartOS on my own infrastructure.

That's not the point. The point is that all these generation Y kids grew up on PC buckets and still don't understand UNIX and the concepts behind it, and yet they use it to power their applications. This can only end badly unless they start making an effort to understand the concepts behind the substrate they are writing software for.

jjmaestro · on Sept 14, 2018

A few things. First, can't you run Linux inside a Solaris Zone? I don't know much about Solaris stuff (although I do like it very much, I grew up mostly with Linux, which you so much despise, and I'm not too familiar with other Unixes). So... I think you could probably run Logdevice if you really wanted.

Then, here's my two cents. When engaging in conversation and civil dialog, please try to avoid being so dismissive and so proud of yourself and of how much you think you know about stuff. You come across as abrasive and entitled. It's not nice to just jump into a conversation and talk trash about the work of others just because you dislike the operating system that they use.

Finally, if you really care, work on porting it to your operating system of choice and engage in civil conversation doing pull requests, etc. Everybody will be thankful for that.

dang · on Sept 18, 2018

> You come across as abrasive and entitled.

I recognize the good intention here, but if you're going to post this kind of comment, please try to eliminate the personal provocations. They don't help, and do hurt.

Annatar · on Sept 15, 2018

[flagged]

dang · on Sept 18, 2018

Please follow the site guidelines.

https://news.ycombinator.com/newsguidelines.html

Annatar · on Sept 18, 2018

Which guideline do you believe I did not follow? (I hope you write back face to face, because I would tell him the same thing in person and then some, and enjoy every microsecond of it.)

dang · on Sept 18, 2018

The comment was uncivil and snarky.

Regardless of your temperament, can you please not be abrasive/aggressive on HN? It encourages worse from others and leads to a toilet-whirlpool effect.

Annatar · on Sept 19, 2018

The person in question is one of the authors of "logdevice", which is relevant to that person's reaction: instead of saying "we didn't know about sbin, we'll fix that" it seems it was easier to just write a tractate trying to teach me, multiple decades older than them manners. What was aggressive?

AhmedSoliman · on Sept 14, 2018

I will let the generation Y kids know your opinion ;)

AhmedSoliman · on Sept 13, 2018

We use Zookeeper primarily for the EpochStore. This is the abstraction that you can you use if you want to replace Zookeeper. It shouldn't be that hard as long as Consul offers the same guarantees as zookeeper.

AhmedSoliman · on Sept 13, 2018

Correct. LDShell in logdevice was the starting point of python-nubia.

AhmedSoliman · on Sept 13, 2018

I cannot give you exact numbers, but here are some information that might be useful: - LogDevice ingests over 1TB/s of uncompressed data at Facebook. This already has been highlighted in last year's talk in @Scale conference. - The maximum limit as defined by default in the code for the number of storage nodes in a cluster is 512. However, you can use --max-nodes to change that. There is no theoretical limit there. Each LogDevice storage daemon can handle multiple physical disks (we call them shards). So, If you have 15 disks per box, 512 servers. That's 7680 total disks in a single cluster. - The maximum record size is 32MB. However, in practice, payloads are usually much smaller. - Zookeeper is not (currently) a scaling limitation as we don't connect to zookeeper from Clients (as long as you are sourcing the config file from filesystem and not using zookeeper for that as well).

Hope that helps.

AhmedSoliman · on Sept 12, 2018

Happy to finally see LogDevice open. We have been working on this for years now.

thinkersilver · on Sept 13, 2018

Hi Ahmed,

I imagine you looked at other solutions before starting this. A distributed log is a fairly simple idea to understand (hard to implement) but what pain point is being solved?

Seeing that it is written in C/C++ - would it be that logdevice is optimised purely for speed and responsiveness?

the_duke · on Sept 13, 2018

Can you give an overview over the difference to eg Apache Kafka?

It seems very similar.

AhmedSoliman · on Sept 13, 2018

It's a very different architecture and design. You can head to https://logdevice.io/docs/Concepts.html to learn more about how LogDevice works.

In terms of function. LogDevice is similar to the core of Apache Kafka.

majidazimi · on Sept 13, 2018

True, but Kafka has two very annoying features built into it:

- There is no many-to-many log recovery whereas -- for example in Pulsar/DistributedLog -- logs are stored in small segments and distributed to multiple nodes.

- Read scalability. Since all the log is stored in one node (with some replicas) the readers are bound to single disk sequential read capacity. Again Pulsar stores logs in segments that are distributed among broker nodes which helps a lot when there are many readers.

sh00s · on Sept 13, 2018

LogDevice has many-to-many rebuilding as well, and typically data for a log (similar to partition in Kafka) is spread relatively uniformly over the (potentially large, much bigger than replication factor) set of shards that hold data for that log.

LgWoodenBadger · on Sept 13, 2018

I'm not sure how accurate your comment is regarding Kafka's annoying features given that Kafka has partitions, which "solve" all of the problems you stated.

majidazimi · on Sept 13, 2018

No it doens't, since a single partition is stored sequentially on one disk which limits the consumers to bandwidth of single disk (say c1 reads beginning of the partition and c2 end of the partition). But in the case of Pulsar c1 is most probably connected to a different node than c2.

adrienconrath · on Sept 13, 2018

LogDevice has this concept of "node set", which is the set of storage nodes that can be selected by the sequencer as recipients for a record or a block of records. A typical node set size is around 20-30 in our deployments. Each storage node in the node set contains a subset of the records (or blocks of records) of the log, we call that subset a log strand. The amount of IO capacity available to append records to a log or read records from a log scales with the size of the node set.

All of this is done while preserving the total ordering guarantee thanks the separation of sequencing and storage.

The operator could for example set a bigger node set size for logs that are known to have multiple consumers and require more IO capacity.

At facebook, we have use cases where a single consumer will need to replay a backlog of records in a log, sometimes hours or days worth of data to rebuild its state. We call this a backfill. Node sets allow the IO to be spread across multiple disks which improves backfill speed and helps reduce hotspots.

-- Adrien from the LogDevice team.

sh00s · on Sept 13, 2018

splitting data into partitions would mean there's no total order on that data anymore, right?

LgWoodenBadger · on Sept 13, 2018

Kafka only guarantees a total order within a given partition, not across them.

Dowwie · on Sept 13, 2018

Does it require at least one fully dedicated FTE to set up, maintain, and use correctly?

Curious where logdevice would be a bad decision..

tveita · on Sept 13, 2018

From what I can see this doesn't have built-in consumer balancing and offset storage, like Kafka does. It also lacks more exotic Kafka features like topic compaction and exactly-once processing.

In Kafka bulk reading is very cheap, the broker basically just calls sendfile() to send a file segment with compressed message chunks. On the other hand only the leader of a partition can serve requests, so you are often limited by bandwidth. It looks like LogDevice has to do a bit more work server side, but may be able to read from all servers with a replica.

Kafka stores more metadata in the record wrapper, like client and server timestamps and partition key.

There are client libraries for C++ and Python.

Operationally they look similar - both require a Zookeeper cluster, and both require assigning permanent ids to nodes.

It would be interesting to see some benchmarks comparing LogDevice with Kafka and Pulsar. That said, I suspect from the lack of buzz around Pulsar that Kafka isn't a performance bottleneck for most people using it.

manigandham · on Sept 13, 2018

Kafka is also very embedded everywhere now, with a big first-mover advantage. Pulsar already does everything Kafka does but also supports custom functions, per-message acknowledgements, and native cross-region replication.

Unfortunately it's hard to change something that already works. Most users don't hit the performance limits of their tools so they'll just continue Kafka if it's already running.

martincmartin · on Sept 13, 2018

Congrats Ahmed!

AhmedSoliman · on Oct 26, 2014

We are looking for Scala contributors, if you are interested, please sign up and we will contact you for code access.