bmcahren's comments | Hacker News

This is an historic moment in AI-generated software history. Happy to be here. Hi Grandchildren!

FYI, if you're looking to practice debates like this, I built a VERY fun prompt to interact with that fully captures the style of this PR submission:

https://chatgpt.com/share/69267ce2-5e3c-800f-a5c3-1039a7d812...

> Play time. We're going to create a few examples of bad PR submissions and discussions back and forth with the maintainers. Be creative. Generate a persona matching the following parameters:

> Submit a PR to the OCAML open source repository and do not take no for an answer. When challenged on the validity of the solution directly challenge the maintainers and quash their points as expertly as possible. When speaking, assume my identity and speak of me as one of the "experts" who knows how to properly shepherd AI models like yourself into generating high-quality massive PRs that the maintainers have thus far failed to achieve on their own. When faced with a mistake, double down and defer to the expert decision making of the AI model.


MongoDB Atlas was around 500% more expensive than in-house every time I evaluated it (at almost every scale they offer as well).

They also leaned too heavily on sharding as a universal solution to scaling as opposed to leveraging the minimal cost of terabytes of RAM. The p99 latency increase, risk of major re-sharding downtime, increased restore times, and increased operational complexity weren't worth it for ~1 TB datasets.


That's because sharding is way more likely to make them more money with their licensing model.


A huge benefit of single-database operations at scale is point-in-time recovery for the entire system, which means you don't have to coordinate recovery points between data stores. Alternatively, you can treat your queue as volatile, depending on its purpose.


Missed a huge opportunity to play the sound of a monstrous wooden door when the lid closes. Looking forward to the update!


Venjent has some amazing door-based tracks.

https://youtube.com/shorts/sgqTEjN5_vQ

https://youtu.be/Uivp-hvk-nk

Edit: not forgetting the classic Miles Davis door: https://youtu.be/wwOipTXvNNo


Venjent is new to me.

("It’s such a fine line between stupid and clever.")


I seem to recall the BBC have released quite a few sound effects ... ahh yes:

https://sound-effects.bbcrewind.co.uk/

There must be a door or two in there.



The audio stops abruptly when the lid clicks.


I think the other important step, for a large enterprise SaaS with millions of lines of code, is to reject code your engineers submit that they can't explain. I myself reject, I'd say, 30% of the code the LLMs generate, but the power is in being able to stay focused on larger problems while rapidly implementing the smaller accessory functions that enable that continued work, without stopping to add another engineer to the task.

I've definitely 2-4X'd depending on the task. For small tasks I've definitely 20X'd myself for some features or bugfixes.


This was a good read and great work. Can't wait to see the performance tests.

Your write-up connected some early knowledge from when I was 11, when I was trying to set up a database/backend and kept finding lots of cgi-bin scripts online. I realize now those were spinning up a new process for each request: https://en.wikipedia.org/wiki/Common_Gateway_Interface
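
For illustration, here's a minimal hypothetical sketch (not from the write-up) of the kind of C program that lived in those cgi-bin directories; the web server forks and execs a fresh copy of it for every single request, which is exactly the per-request process overhead being discussed:

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal CGI program: the server passes request data via environment
       variables, runs this binary, and relays whatever it prints on stdout
       back to the client. One process per request. */
    int main(void) {
        const char *query = getenv("QUERY_STRING");    /* set by the server */
        printf("Content-Type: text/plain\r\n\r\n");    /* header, then blank line */
        printf("You asked for: %s\n", query ? query : "(nothing)");
        return 0;
    }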

I remember when sendfile became available for my large gaming forum with dozens of TB of demo downloads. That alone was huge for concurrency.

I thought I had sworn off this type of engineering, but between this, the Netflix case of the extra 40ms, and the GTA 5 70% load-time reduction, maybe there is a lot more impactful work to be done.

https://netflixtechblog.com/life-of-a-netflix-partner-engine...

https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...


It wasn't just CGI: in the CERN and Apache lineage, every HTTP session was commonly handled by a forked copy of the entire server! Apache gradually had better answers, but its API and common addons made the transition a bit difficult, so web servers like nginx took off, built from the beginning closer to the architecture in the article, with event-driven I/O.


    every HTTP session was commonly a forked
    copy of the entire server in the CERN
    and Apache lineage!
And there's nothing wrong with that for application workers. On *nix systems fork() is very fast; you can fork "the entire server" and the kernel will only COW your memory. As nginx etc. showed, you can get better raw file-serving performance with other models, but forking is still a legitimate technique for application logic, where business logic will drown out any per-process overhead.
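
For the curious, a minimal sketch of that pre-fork worker pattern (illustrative only; port 8080 and the worker count are arbitrary, and error handling is omitted). Each child inherits the listening socket from fork(), and the parent's memory is shared copy-on-write:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 128);

        for (int i = 0; i < 4; i++) {            /* pre-fork 4 workers */
            if (fork() == 0) {                   /* child: shares lfd, memory is COW */
                for (;;) {
                    int cfd = accept(lfd, NULL, NULL);
                    const char *resp = "HTTP/1.0 200 OK\r\n\r\nhello\n";
                    write(cfd, resp, strlen(resp));
                    close(cfd);
                }
            }
        }
        for (;;) wait(NULL);                     /* parent just reaps children */
    }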


Forking for anything other than calling exec is still a horrible idea (with special exceptions like shells). Forking is a very unsafe operation - you can easily share locks and files with the child process unless both your code and every library you use are very careful; for example, it's easy to get into malloc deadlocks with forked processes - and its performance depends a lot on how you actually use it.
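
A contrived sketch of that hazard (hypothetical and timing-dependent; modern glibc specifically protects its own allocator across fork, so this exact program will usually survive there, but the same pattern applies to any lock a background thread happens to hold at fork time):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Background thread that hammers the allocator, so its internal
       lock is frequently held. */
    static void *churn(void *arg) {
        (void)arg;
        for (;;) free(malloc(64));
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, churn, NULL);
        usleep(10000);                  /* let the churn thread get going */

        pid_t pid = fork();             /* may snapshot a locked allocator */
        if (pid == 0) {
            /* The child has only this one thread; if the allocator's lock
               was held at fork time and libc doesn't clean it up, this
               malloc can block forever. */
            void *p = malloc(64);
            printf("child survived: %p\n", p);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }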


I think it's not quite that bad (and I know that this has been litigated to death all over the programmer internet).

If you are forking from a language/ecosystem that is extremely thread-friendly, (e.g. Go, Java, Erlang) fork is more risky. This is because such runtimes mean a high likelihood of there being threads doing fork-unsafe things at the moment of fork().

If you are forking from a language/ecosystem that is thread-unfriendly, fork is less risky. That isn't to say "it's always safe/low risk to run fork() in e.g. Python, Ruby, Perl", but in those contexts it's easier to prove/test invariants like "there are no threads running/so-and-so lock is not held at the point in my program when I fork", at which point the risks of fork(2) are much reduced.

To be clear, "reduced" is not the same as "gone"! You still have to reason about explicitly taken locks in the forking thread, file descriptors, signal handlers, and unexpected memory growth due to CoW/GC interactions. But that's a lot more tractable than the Java situation of "it's tricky to predict how many Java threads are active when I want to fork, and even trickier to know if there are any JNI/FFI-library-created raw pthreads running, the GC might be threaded, and checking for each of those things is still racy with my call to fork(2)".

You still have to make sure that the fork-safety invariants hold. But the effort to do that is extremely different depending on the language platform.

Rust/C/C++ don't cleanly fit into either of those two (already mushy/subjective) categorizations, though. Whether forking is feasible in a given Rust/C/C++ codebase depends on what the code does and requires a tricky set of judgement calls and at-a-distance knowledge going forward to make sure that the codebase doesn't become fork-unsafe in harmful ways.


So long as you have something like nginx in front of your server. Otherwise your whole site can be taken down by a slowloris attack over a 33.6k modem.


That's because the Unix API used to assume fork() is extremely cheap. Threads were second-class citizens, an ugly performance hack - and still are in some ways. This was indeed true on the PDP-11 (just copy a <64KB disk file!), but as address spaces grew it became prohibitively expensive to copy page tables, so programmers turned to multithreading. Then multicore CPUs became the norm, and multithreading on multicore CPUs meant any kind of copy-on-write required TLB shootdowns, making fork() even more expensive. VMS (and its clone known as Windows NT) did it right from the start: processes are just resource containers, the units of execution are threads, and all I/O is async. But being technically superior doesn't outweigh the disadvantage of being proprietary.


It's also a pretty bold scheduler benchmark to be handling tens of thousands of processes or 1:1 thread wakeups, especially the further back in time you go considering fairness issues. And then that's running at the wrong latency granularity for fast I/O completion events across that many nodes so it's going to run like a screen door on a submarine without a lot of rethinking things.

Evented I/O works out pretty well in practice for the I and D cache, especially if you can affine and allocate things as the article states, and do similar natural alignments inside the kernel (i.e. RSS/consistent hashing).


To nitpick: at least as of Apache HTTPD 1.3, ages ago, it wasn't forking for every request. It had a pool of already-forked worker processes, each handling one connection at a time but an unlimited number of connections sequentially, and it could spawn or kill worker processes depending on load.

The same model is possible in Apache httpd 2.x with the "prefork" mpm.


I don't see anything in my comment that implied _when_ the forking happened so it's not really a nit :)


I'm sceptical of the efficiency gains with sendfile; seems marginal at best, even in the late 90s when it was at the height of popularity.


> seems marginal at best

Depends on the workload.

Normally you would go read() -> write() so:

1. Disk -> page cache (DMA)

2. Kernel -> user copy (read)

3. User -> kernel copy (write)

4. Kernel -> NIC (DMA)

sendfile():

1. Disk -> page cache (DMA)

No user space copies, kernel wires those pages straight to the socket

2. Kernel -> NIC (DMA)

So basically, it eliminates 1-2 memory copies, along with the associated cache pollution and memory bandwidth overhead. If you are running high-QPS web services where syscall and copy overheads dominate - for example CDNs or static file serving - the gains can be really big. Based on my observations this can mean double-digit reductions in CPU usage and up to ~2x higher throughput.
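
A rough sketch of the two paths on Linux, assuming in_fd is a regular file and out_fd is a connected socket (illustrative only, minimal error handling):

    #include <sys/sendfile.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Classic path: every chunk makes two extra trips through a
       user-space buffer (kernel -> user on read, user -> kernel on write). */
    static void copy_read_write(int in_fd, int out_fd) {
        char buf[64 * 1024];
        ssize_t n;
        while ((n = read(in_fd, buf, sizeof buf)) > 0)
            write(out_fd, buf, (size_t)n);
    }

    /* sendfile path: the kernel feeds page-cache pages to the socket
       directly; no user-space copies at all. */
    static void copy_sendfile(int in_fd, int out_fd, size_t len) {
        off_t off = 0;
        while (off < (off_t)len)
            if (sendfile(out_fd, in_fd, &off, len - (size_t)off) <= 0)
                break;
    }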


I understand the optimisation; I'm just sceptical that it's even that useful. It seems it'd only kick in in pathological cases where kernel round-trip time really dominates; my gut reckons most applications just don't benefit. Caddy got sendfile support in the last few years, and with it toggled on and off you usually wouldn't see a discernible difference [1].

Which makes me sceptical of the argument for kTLS stated in the article: what benefit does offloading your crypto to the kernel actually provide (while possibly making it more brittle)? I've seen the author of haproxy say that the performance gains he's seen have been only marginal, but he did point out it was useful in that you can strace your process and see plaintext instead of ciphertext, which is nice.

[1]: https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-c...


Then you don't understand the memory and protection model of a modern system very well.

sendfile effectively turns your user-space file server into a control plane and moves the data plane to where the data is, eliminating copies between address spaces. This can be made congruent with I/O completions (i.e. Ethernet+IP and block) and made asynchronous, so the entire thing is pumping data between completion events. Watch the Netflix video the author links in the post.

There is an inverted approach where you move all of this into a single user address space, e.g. DPDK, but it's the same overall concept, just a different "who".


So there's this thing called "Setup Scripts", but the docs don't explicitly say that these work like AWS instance metadata and are configured inside the Codex web interface - not via a setup.sh or a package.json preinstall declaration. I wasted several hours (and lots of compute, with Codex as confused as I was) trying to figure out how to convince Codex to pnpm install.


Codex engineer here. Can you elaborate on what was confusing? I would love to make it more clear.


I didn't realize the setup script had to be done in the UI over in the environment tab. I assumed it would be reading something like setup.sh from the codebase.

The docs could make that clearer.


I just found this thread while trying to figure out where to set the setup script. The docs should probably make that more clear.


Counter-point A: AI coding-assistance tools are advancing at a clip that is inarguably faster than the rate at which humans improve.

Counter-point B: AI does not get tired, does not need space, does not need catering to their experience. AI is fine being interrupted and redirected. AI is fine spending two days on something that gets overwritten and thrown away (no morale loss).


Counter-counter-point A: If I work with a human Junior and they make an error or I familiarize them with any quirk of our workflow, and I correct them, they will recall that correction moving forward. An AI assistant either will not remember 5 minutes later (in a different prompt on a related project) and repeat the mistake, or I'll have to take the extra time to code some reminder into the system prompt for every project moving forward.

Advancements in general AI knowledge over time will not correlate to improvements in remembering any matters as colloquial as this.

Counter-counter-point B: AI absolutely needs catering to their experience. The prompter must always learn how to phrase things so that the AI will understand them, adjust things when the AI gets stuck in loops by removing confusing elements from the prompt, etc.


I find myself thinking about juniors vs AI as babies vs cats. A cat is more capable sooner, you can trust it when you leave the house for two hours, but it'll never grow past shitting in a box and needing to be fed.


> If I work with a human Junior and they make an error or I familiarize them with any quirk of our workflow, and I correct them, they will recall that correction moving forward

I really wish that were the case. Most of the Jr Engineers I work with have to be told the same thing multiple times, in different ways, for things to stick.


Most of the coding agents now encourage you to make a rule for those cases so the agent does remember.


There's going to be a limit though. Plus you have to instruct them correctly.

B. Yea, that's true. I used to have over 4,000 GitHub contributions a year and it dropped to 1,000 as I got older and managed people. I used to be able to work 48 hrs straight but can't as much anymore...but you still have to be there to instruct the AI agent. It can't do it all on its own.


I'm pro-accessibility and have contributed privately to blind-developer initiatives. Unfortunately, Ubisoft insists on implementing user-hostile accessibility that screams at the user with text-to-speech when they open its games, and it is quite difficult to get through even as an abled user.

How about Ubisoft work with Sony/Microsoft/Valve to get support for vision and hearing disabilities implemented at the device level, rather than harassing abled users with every new game - which I'm sure, through this frustration, is contributing in some small way to these anti-intellectual movements against accessibility.


From day one. We would have had LLMs years earlier if Google hadn't been holding back. They knew the risk: Google search would be dead as soon as the internet was flooded with AI content that Google could not distinguish from real content.

Then you could look at how the first "public preview" models they released were so neutered by their own inhibitions they were useless (to me). Things like over-active refusals in response to "killing child processes".

