> This is why free -h on a Linux box can look alarming. You see almost no “free” memory, but most of it is “available” - and the page cache is using it.
And other buffers and stuff too. This is a great thing on bare metal, because it's a bet that the marginal cost of using an otherwise-empty memory page is zero, and on bare metal that's always true. But in containers, or on multi-tenant infrastructure, it isn't true anymore. That's where tools like DAMON come in: https://www.kernel.org/doc/html/v5.17/vm/damon/index.html
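To make the "free vs. available" distinction concrete, here's a toy sketch that parses /proc/meminfo-style text. The numbers are made up for illustration (it doesn't read a live system), but the fields and format match what `free -h` is summarizing:

```python
# Toy sketch: why "free" looks alarming while "available" is fine.
# Parses /proc/meminfo-style text; SAMPLE uses made-up numbers.

def parse_meminfo(text):
    """Return a dict of field name -> size in kB from /proc/meminfo-style text."""
    fields = {}
    for line in text.strip().splitlines():
        name, rest = line.split(":", 1)
        fields[name] = int(rest.strip().split()[0])  # value is in kB
    return fields

SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:   12288000 kB
Buffers:          204800 kB
Cached:         11059200 kB
"""

info = parse_meminfo(SAMPLE)
# "free" is tiny, but most of the box is reclaimable page cache and buffers,
# which the kernel counts as "available".
reclaimable = info["MemAvailable"] - info["MemFree"]
print(f"free: {info['MemFree']} kB, available: {info['MemAvailable']} kB")
print(f"~{reclaimable} kB of 'available' is reclaimable cache/buffers")
```

On a real box you'd read `/proc/meminfo` directly; the point is that `MemAvailable` (not `MemFree`) is the number that tells you whether you're actually short on memory.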
In Aurora Serverless this kind of page cache management is a critical part of what the control plane does. Essentially we need to size the page cache to be big enough for great performance, but small enough not to cost the customer unnecessarily. We go into quite a lot of detail on that in our VLDB'25 paper: https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11...
> Linux fills free memory with page cache on purpose. It’s a bet: if someone reads this block again, I already have it.
This works because most database workloads have great temporal and spatial locality, and it works well. But it's also one of the biggest practical issues people run into with relational databases in production: performance is great until it isn't. The shared buffers and page cache keep disk reads near zero, but when the working set grows even a tiny bit beyond what fits in memory, the read rate can spike very quickly.
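The cliff is easy to demonstrate with a toy LRU cache simulation (illustrative only, using a worst-case cyclic scan; real workloads degrade less sharply, but the cliff shape is real): grow the working set by just two pages past the cache size and the hit rate collapses.

```python
# Toy illustration of the cache cliff: hit rate is near-perfect while the
# working set fits in the cache, then collapses once it's slightly bigger.
from collections import OrderedDict

def hit_rate(cache_size, working_set, accesses=10_000):
    cache = OrderedDict()
    hits = 0
    for i in range(accesses):
        page = i % working_set              # cyclic scan over the working set
        if page in cache:
            hits += 1
            cache.move_to_end(page)         # LRU: mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / accesses

print(hit_rate(1000, 999))   # ~0.90: misses only during warm-up
print(hit_rate(1000, 1001))  # 0.0: one page too many defeats LRU entirely
```

The cyclic scan is LRU's pathological case, which is why the drop here is total rather than gradual; the general lesson is that cache hit rate is not a smooth function of working-set size.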
This is why in both Aurora Serverless and Aurora DSQL we do buffer and cache sizing very dynamically, getting rid of this cliff for most workloads.
The haters will hate, but tap guides are great (e.g. https://biggatortools.com/v-tapguide-faqs, but even a block of hard wood with a clearance hole drilled in it works fine).
Unless you're tapping something super tough (306?), Amazon taps are fine for hand tapping. Go in straight, use a good lubricant.
I've got two of those tap guides, one for US and one for metric. They're great. Also for drilling since I don't have a drill press at home.
I've examined cheap taps under a microscope. Maybe they are of varying quality, but the ones I got had burrs all along the cutting edges. A tap that I borrowed from a machine shop was flawless in comparison. So maybe the middle ground is caveat emptor.
Another trick for tapping is to use something pointy in the drill chuck to center the tap after drilling, assuming you've clamped down your workpiece in a drill press or mill. This works for really big taps when you don't have a guide for them. Likewise the tailstock of a lathe can be used for this purpose.
All good advice above. I’ve tapped thousands of holes and haven’t broken a tap in 50 years, and I have nothing to add to the above. Berkeley Physics Dept. student machine shop training.
I don't think either is a bad choice, but Aurora has some advantages if you're not a DB expert. Starting with Aurora Serverless:
- Aurora storage scales with your needs, meaning that you don't need to worry about running out of space as your data grows.
- Aurora will auto-scale CPU and memory based on the needs of your application, within the bounds you set. It does this without any downtime, or even dropping connections. You don't have to worry about choosing the right CPU and memory up-front, and for most applications you can simply adjust your limits as you go. This is great for applications that are growing over time, or for applications with daily or weekly cycles of usage.
The other Aurora option is Aurora DSQL. The advantages of picking DSQL are:
- A generous free tier to get you going with development.
- Scale-to-zero and scale-up, on storage, CPU, and memory. If you aren't sending any traffic to your database it costs you nothing (except storage), and you can scale up to millions of transactions per second with no changes.
- No infrastructure to configure or manage, no updates, no thinking about replicas, etc. You don't have to understand CPU or memory ratios, think about software versions, think about primaries and secondaries, or any of that stuff. High availability, scaling of reads and writes, patching, etc. are all built-in.
I spoke about this exact thing at a conference (HPTS’19) a while back. This can work, but it introduces modal behaviors into systems that make reasoning about availability very difficult, and tends to cause metastable behaviors and long outages.
The feedback loop is replicas slow -> traffic increases to primary -> primary slows -> replicas slow, etc. The only way out of this loop is to shed traffic.
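A toy discrete-time model (entirely made-up numbers, not any real system) shows why shedding is the only exit: once a transient replica slowdown spills traffic onto the primary, the loop feeds itself unless excess load is dropped.

```python
# Toy model of the metastable feedback loop: slow replicas spill reads to the
# primary, the overloaded primary makes replicas lag further, and so on.
# All constants are illustrative.

def simulate(shed=False, ticks=50, initial_lag=25):
    capacity = 100          # requests/tick the primary can serve
    base_load = 60          # traffic that always goes to the primary
    lag = initial_lag       # a transient replica slowdown kicks things off
    queue = 0
    for _ in range(ticks):
        spill = 2 * lag                         # lagging replicas push reads away
        arrivals = base_load + spill
        if shed:
            arrivals = min(arrivals, capacity)  # shed excess instead of queuing it
        queue = max(0, queue + arrivals - capacity)
        lag = max(0, lag + (1 if queue > 0 else -1))  # overload -> more lag
    return queue

print(simulate(shed=False))  # queue diverges: the metastable failure mode
print(simulate(shed=True))   # queue stays empty; lag drains and the loop breaks
```

Without shedding the system never recovers even after the initial disturbance is gone, which is exactly what makes these outages long: the trigger disappears but the bad equilibrium remains.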
Practically, the difference in availability for a typical internet-connected application is very small. Partitions do happen, but in most cases it's possible to route user traffic around them, given the paths that traffic tends to take into large-scale data center clusters (redundant, and typically not the same paths as the cross-DC traffic). The remaining cases do exist, but are exceedingly rare in practice.
Note that I’m not saying that partitions don’t happen. They do! But in typical internet connected applications the cases where a significant proportion of clients is partitioned into the same partition as a minority of the database (i.e. the cases where AP actually improves availability) are very rare in practice.
For client devices and IoT, partitions off from the main internet are rare, and there local copies of data are a necessity.
Yes, you can do stuff like that. You might enjoy the CRAQ paper by Terrace et al, which does something similar to what you are saying (in a very different setting, chain replication rather than DBs).
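A heavily simplified sketch of the CRAQ idea (Terrace and Freedman's "apportioned queries" on chain replication; this is a toy, not the real protocol): any replica in the chain can serve reads, but a replica holding a "dirty" (not-yet-committed) version asks the tail which version is committed instead of returning possibly-uncommitted data.

```python
# Toy CRAQ-style node: reads go to any replica; dirty reads defer to the tail.

class CraqNode:
    def __init__(self):
        self.versions = {}   # key -> {version: value}, may include dirty versions
        self.clean = {}      # key -> highest committed version number

    def write(self, key, version, value):
        self.versions.setdefault(key, {})[version] = value  # dirty until committed

    def commit(self, key, version):
        self.clean[key] = version
        # Drop versions superseded by the newly committed one.
        self.versions[key] = {v: val for v, val in self.versions[key].items()
                              if v >= version}

    def read(self, key, tail):
        latest = max(self.versions[key])
        if latest == self.clean.get(key, -1):
            return self.versions[key][latest]    # clean: serve locally
        committed = tail.clean[key]              # dirty: ask the tail which
        return self.versions[key][committed]     # version is committed

# A two-node chain: the head has applied a write the tail hasn't committed yet.
head, tail = CraqNode(), CraqNode()
for node in (head, tail):
    node.write("x", 1, "old")
    node.commit("x", 1)
head.write("x", 2, "new")                # in flight: dirty at the head
print(head.read("x", tail))              # "old": head is dirty, defers to tail
tail.write("x", 2, "new")
tail.commit("x", 2)
head.commit("x", 2)
print(head.read("x", tail))              # "new": committed everywhere
```

The nice property is read scale-out without giving up strong consistency: the version-number check to the tail is much cheaper than shipping the read itself to the tail.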
(OP here) No deadlocks needed! There’s nothing about providing strong consistency (or even strong isolation) that requires deadlocks to be a thing. DSQL, for example, doesn’t have them*.
Event sourcing architectures can be great, but they also tend to be fairly complex (a lot of moving parts). The bigger practical problem is that they make it quite hard to offer clients ‘outside the architecture’ meaningful read-time guarantees stronger than a consistent prefix. That makes clients’ lives hard for the reasons I argue in the blog post.
I really like event-based architectures for things like observability, metering, reporting, and so on where clients can be very tolerant to seeing bounded stale data. For control planes, website backends, etc, I think strongly consistent DB architectures tend to be both simpler and offer a better customer experience.
* Ok, there’s one edge case in the cross-shard commit protocol where two committers can deadlock, which needs to be resolved by aborting one of them (the moral equivalent of WAIT-DIE). This never happens with single-shard transactions, and can’t be triggered by any SQL patterns.
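A minimal optimistic concurrency control sketch makes the "no deadlocks" point concrete (this is a toy illustration, not DSQL's actual protocol): transactions never hold locks while running, so there is no waits-for graph and nothing that can deadlock; conflicts are detected at commit time and resolved by aborting.

```python
# Toy OCC: buffer writes, record read versions, validate at commit. No locks,
# so no deadlocks; a conflicting transaction aborts instead of waiting.

class Store:
    def __init__(self):
        self.data = {}       # key -> committed value
        self.version = {}    # key -> version of last committed write
        self.counter = 0

class Txn:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at read time
        self.writes = {}     # buffered writes, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]          # read-your-own-writes
        self.read_set[key] = self.store.version.get(key, 0)
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: abort if anything we read was overwritten since.
        for key, seen in self.read_set.items():
            if self.store.version.get(key, 0) != seen:
                return False                 # conflict: abort, never wait
        self.store.counter += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.counter
        return True

store = Store()
setup = Txn(store); setup.write("x", 0); setup.commit()
t1, t2 = Txn(store), Txn(store)
a, b = t1.read("x"), t2.read("x")   # both read x = 0
t1.write("x", a + 1)
t2.write("x", b + 1)
print(t1.commit())  # True: first committer wins
print(t2.commit())  # False: t2's read is stale, so it aborts and can retry
```

This also illustrates the trade-off the thread is circling: the cost of lock-free strong isolation is that the loser of a conflict retries rather than waits.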
> There’s nothing about providing strong consistency (or even strong isolation) that requires deadlocks to be a thing. DSQL, for example, doesn’t have them*.
If you want to have the kind of consistency people expect (transactional) in this kind of environment, they're unavoidable, right? I see you have optimistic concurrency control, which, sure, but that then means read-modify-write won't work the way people expect (the initial read may be a phantom read if their transaction gets retried), and fundamentally there's no good option here, only different kinds of bad option.
> Event sourcing architectures can be great, but they also tend to be fairly complex (a lot of moving parts).
Disagree. I would say event sourcing architectures are a lot simpler than consistent architectures; indeed most consistent systems are built on top of something that looks rather like an event based architecture underneath (e.g. that's presumably how your optimistic concurrency control works).
> The bigger practical problem is that they make it quite hard to offer clients ‘outside the architecture’ meaningful read-time guarantees stronger than a consistent prefix. That makes clients’ lives hard for the reasons I argue in the blog post.
You can give them a consistent snapshot quite easily. What you can't give them is the ability to do in-place modification while maintaining consistency.
It makes clients' lives hard if they want to do the wrong thing. But the solution to that is to not let them do that thing! Yes, read-modify-write won't work well in an event-based architecture, but read-modify-write never works well in any architecture. You can paper over the cracks at an ever-increasing cost in performance and complexity and suppress most of the edge cases (but you'll never get rid of them entirely), or you can commit to not doing it at the design stage.
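The "commit to not doing it at the design stage" idea can be sketched in a few lines (a toy, assuming a single append-only log): clients record their *intent* as an event instead of reading state, computing, and writing it back, so concurrent writers never clobber each other and current state is just a fold over the log.

```python
# Toy event log: no read-modify-write, so no lost-update hazard to paper over.

log = []

def record(event):
    log.append(event)        # append-only: no read required before writing

def balance():
    # Derive current state by folding over the full event history.
    return sum(amount for kind, amount in log if kind == "add") \
         - sum(amount for kind, amount in log if kind == "sub")

# Two "concurrent" clients; neither reads before writing, so ordering of the
# appends doesn't matter to the final state.
record(("add", 100))
record(("sub", 30))
record(("add", 5))
print(balance())  # 75
```

The catch, as the rest of the thread discusses, is enforcing invariants (e.g. "balance never goes negative") without reading before writing; that's where this design stops being free.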
You can have a system that puts out heartbeat timestamps and treat that as the single root upstream for all your inputs. Or you can allow each source to timestamp its inputs and make a rule about how far your clocks are allowed to drift (which can be quite generous). Or you can give every source an id, treat your timestamps as vectors, and have a rule that any "combine events from source x and source y" step must happen in a single canonical place.
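The second option (per-source timestamps plus a drift bound) is essentially a watermark, sketched below with hypothetical names and a made-up drift constant: events are safe to release in timestamp order once every source has reported past them, minus the allowed skew.

```python
# Toy watermark: each source stamps its own events; events below
# min(latest timestamp per source) - DRIFT can no longer be preceded
# by a later arrival, so they are safe to process in order.
import heapq

DRIFT = 5        # maximum clock skew tolerated between sources (made-up units)

latest = {}      # source -> newest timestamp seen from it
pending = []     # min-heap of (timestamp, source, event)

def ingest(source, ts, event):
    latest[source] = max(latest.get(source, ts), ts)
    heapq.heappush(pending, (ts, source, event))

def drain(sources):
    """Release events below the watermark, in timestamp order."""
    watermark = min(latest.get(s, float("-inf")) for s in sources) - DRIFT
    out = []
    while pending and pending[0][0] <= watermark:
        out.append(heapq.heappop(pending))
    return out

ingest("x", 10, "a")
ingest("y", 12, "b")
ingest("x", 30, "c")
ingest("y", 31, "d")
print(drain(["x", "y"]))   # releases only a and b: watermark is min(30,31)-5 = 25
```

Note that a silent source stalls the watermark entirely, which is exactly why the heartbeat option exists: heartbeats keep `latest` advancing even when a source has nothing to say.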
The point of that section, which maybe isn’t obvious enough, is to reflect on how eventually-consistent read replicas limit the options of the database system builder (rather than the application builder). If I’m building the transaction layer of a database, I want to have a bunch of options for where to send my reads, so I don’t have to send the whole read part of every RMW workload to the single leader.
(OP here). I don’t love leaking this kind of thing through the API. I think that, for most client/server shaped systems at least, we can offer guarantees like linearizability to all clients with few hard real-world trade-offs. That does require a very careful approach to designing the database, and especially to read scale-out (as you say) but it’s real and doable.
By pushing things like read-scale-out into the core database, and away from replicas and caches, we get to have stronger client and application guarantees with less architectural complexity. A great combination.