
I spent US$16,700 last month. I built an autoscaling K8s cluster for distributed compilation/caching on a large C++ project. I also heavily modified the build system to use a forked version of `siso` compatible with our environment.

That means we can go from 17 minutes on 32 cores to 5 minutes on a few hundred. And because compilation is distributed, we don't have to provision each developer with an overpowered build machine they won't be using most of the time.

It could also eliminate our CI backlog because of the autoscaling. Across a few hundred engineers building this codebase, that's probably a few thousand hours of waiting per week.

This took me about two weeks as someone who graduated 9 months ago. Most of the tokens were spent on several-hour-long debugging sessions relating to distributed-systems networking and tracing through gRPC logs, because the system wasn't working until it did.

I think I'd need several years of experience and 6 months as a full time engineer to have accomplished the same thing pre-AI.

Since I work at a semiconductor company near Toronto there's nobody around with the distributed systems experience to mentor me. I did it mostly on my own as a side project because I read a blog post. I literally wouldn't have been able to complete this without AI.

I'm sure the actual solution is terrible compared to what a senior developer with experience would've created. But my company feels like it's getting ROI on the token spend so far even though it's double my salary.


> No one is going to not buy something because it is kosher. But if paying a thousand dollars a year to put a small circle-u symbol on the back of your box can increase sales by 1% among observant Jews, most companies are going to do it.

Contrary to perceived politics, many Muslims will eat kosher food because it's a superset of halal rules (excl. alcohol).

It's a globally consolidated certification run through organizations like the Orthodox Union. Halal, by contrast, is local, with many scammers offering to pencil-whip compliance. So many Muslims prefer kosher over "halal" food to avoid doing due diligence on the certification agency.

To tie this into age-verification, companies and ecosystems will use the strictest method that makes them globally compliant. Consumers will prefer that convenience even in the presence of intense political beliefs.

A bank that uses seamless OS-level age checks everywhere will win against one that asks manually, even in jurisdictions where it isn't required.


I hope everyone's bank knows how old they are, what with all the documentation we have to cough up to keep us safe from Terrorism, the Patriot Act, 9/11, never forget, etc.

If you're doing single-core builds, you will get impressive speedups from unity builds.

This is because C++ compilers spend a lot of time redundantly parsing the same headers included in different .cpp files.

Normally, you get enough gains from compiling each .cpp file in parallel that it outweighs the parsing, but if you're artificially limited in parallelism then unity builds can pay for themselves very quickly as they did in the article.
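If you're on CMake, this is a one-line switch; a minimal sketch (the target name `mylib` is hypothetical), assuming CMake 3.16+:

```cmake
# Merge batches of .cpp files into jumbo TUs so shared headers are
# parsed once per batch instead of once per source file.
set(CMAKE_UNITY_BUILD ON)            # globally, or per target:
set_target_properties(mylib PROPERTIES
  UNITY_BUILD ON
  UNITY_BUILD_BATCH_SIZE 16)         # number of .cpp files per unity TU
```

The batch size is the knob that trades parsing savings against parallelism: one giant TU maximizes the former and kills the latter.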

C++20 modules try to split the difference by parsing each header once into a precompiled module, allowing the compiler to reuse that work across different .cpp files.

Unfortunately, modules mean C++ compilation is no longer embarrassingly parallel, which is why build-system adoption has been slow.


The problem is that, due to how templates work, each compilation unit ends up with its own copy of each templated function, which creates extra work, code bloat, etc.

The compiler also doesn't really inline or optimize functions as well across object boundaries without link-time optimization.

But the linker is single-threaded and notoriously slow; with LTO, I wouldn't be surprised if it took as much time as the whole unity build, and the results are often suboptimal.

Also, C++ syntax is notoriously hard and slow to parse; the Clang frontend takes almost as much time to run as LLVM itself.

So modules would probably help a lot with parallel parsing, but that would help unity builds just as much.


> each compilation unit will end up with its own copy of templated function, which creates extra work, code bloat etc.

Yes, that's what causes the parsing bottleneck. Unity builds don't need to create multiple copies of templated functions.

C++20 modules could fix that because the function is parsed before substitution. TBD whether that optimization works yet; I tried it on Clang 18 and it didn't.

> But the linker is single threaded and notoriously slow

I think most linkers have parallel LTO and `mold` provides actual parallel linking.


> But I still do not understand how one can consider writing to memory the OS owns to be ok.

Your manager tells you to reduce memory usage of the program "or else".


TBH I think a more likely explanation is that they needed to somehow identify separate instances of that data structure, and they thought to store some ID in it so that, when they encountered it next, they could do so without keeping copies of all the data and comparing their data with the system's.

^^ The voice of experience, here.

Or you desperately need to tag some system object and the system provides no legitimate means to do so. That can be invaluable when troubleshooting things, or even just understanding how things work when the system fails to document behavior or unreasonably conceals things.

I've been there and done it, and I offer no apologies. The platform preferred and the requirements demanded by The Powers That Be were not my fault.


"Stable ABI" is a joke in C++ because you can't keep the ABI while changing the implementation of a templated function, which blocks improvements to the standard library.

In C, ABI = API because the declaration of a function contains the name and arguments, which is all the info needed to use it. You can swap out the definition without affecting callers.

That's why Rust allows a stable C-style ABI; the definition of a function declared in C doesn't have to be in C!

But with a C++-style templated function, the caller needs access to the definition to do template substitution. If you change the definition, you need to recompile the calling code, i.e. an ABI break.

If you don't recompile calling code and link with other libraries that are using the new definition, you'll violate the one-definition rule (ODR).

This is bad because duplicate template functions are pruned at link-time for size reasons. So it's a mystery as to what definition you'll get. Your code will break in mysterious ways.

This means the C++ committee can never change the implementation of a standardized templated class or function. The only time they did was a minor optimization to std::string in 2011, and it was such a catastrophe they never did it again.

That is why Rust will not support stable ABIs for any of its features relying on generic types. It is impossible to keep the ABI stable and optimize an implementation.


C++ builds are extremely slow because they are not correct.

I'm doing a migration of a large codebase from local builds to remote execution and I constantly have bugs with mystery shared library dependencies implicitly pulled from the environment.

This is extremely tricky because, if you run an executable without one of its shared libraries, you get "file not found" with no further explanation. Even AI doesn't understand this error.


The dynamic linker can clearly tell you where it looks for files and in which order, and where it finds them if it does.

You can also very easily harden this if you somehow don't want to capture libraries from outside certain paths.

You can even build the compiler in such a way that every binary it produces has a built-in RPATH if you want to force certain locations.
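A few glibc-Linux commands that make the search visible (the `./myapp` binary and the rpath value are illustrative):

```shell
# List which shared objects a binary resolves to; missing ones show
# up as "libfoo.so => not found", which pinpoints the gap.
ldd ./myapp

# Trace the dynamic linker's search, directory by directory.
LD_DEBUG=libs ./myapp 2> linker-trace.txt

# Inspect, or rewrite after the fact, the embedded search path.
readelf -d ./myapp | grep -E 'RPATH|RUNPATH'
patchelf --set-rpath '$ORIGIN/../lib' ./myapp
```

`LD_DEBUG=help <any dynamic binary>` lists the other trace categories (symbols, bindings, versions) if `libs` alone isn't enough.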


That is what I'm doing so I can get distributed builds working. It sucks and has taken me days of work.

It's pretty simple and works reliably as specified.

I can only infer that your lack of familiarity was what made it take so long.

Rebuilding GCC with specs does take forever, and building GCC is in general quite painful, but you could also use patchelf to modify the binary after the fact (which is what a lot of build systems do).


> I can only infer that your lack of familiarity was what made it take so long

Pretty much.

Trying to convert an existing build that doesn't explicitly declare object dependencies is painful. Rust does it properly by default.

For example, I'm discovering our clang toolchain has a transitive dependency on a gcc toolchain.


Clang cannot bootstrap the way GCC can; you need GCC (or another Clang) to build it. You can obviously build it twice so that it is built by itself (bear in mind some of the Clang components already do this, because they have to be built by Clang).

In general though, a clang install will still depend on libstdc++, libgcc, GCC crtbegin.o and binutils (at least on Linux), which is typically why it will refer to a specific GCC install even after being built.

There are of course ways to use clang without any GCC runtime, but that's more involved and non-standard (unless you're on Mac).

And there is also the libc dependency (and sysroot aspects in general); while that is usually considered completely separate from GCC, the filesystem location and how it is found are often tied to how GCC is configured.


You don't on new projects. CMake + Ninja has support for modules on GCC, Clang, and MSVC.

This should be your default stack on any small-to-medium sized C++ project.

Bazel, the default pick for very large codebases, also has support for C++20 modules.


I have yet to see modules in the wild. What I have seen extensively are header-only projects.

It's the fault of build systems. CMake still doesn't officially support `import std`, and undocumented things are done in the ecosystem [1].

But once it works and you set up the new stuff, having started a new C++26 project with modules now, it's kinda awesome. I'm certainly never going back. The big compilers are also retroactively adding `import std` to C++20, so support is widening.

[1] https://gitlab.kitware.com/cmake/cmake/-/work_items/27706


I wanted to ship import std in 4.3 but there are some major disagreements over where the std.o symbols are supposed to come from.

Clang says "we don't need them", GCC says "we'll ship them in libstdc++", and MSVC says "you are supposed to provide them".

I didn't know about that when I was working on finishing import std for CMake and accidentally broke a lot of code in the move to a native implementation of the module manifest format, so everything got reverted and put back into experimental.


That's really interesting info, thanks!

weird to blame build systems for a problem caused by the language

You are of course right. It's just that Modules inherently put a lot of responsibility on the build system. Among those, but not limited to: a "module registry" wasn't standardized and is in the hands of the build system.

Systems like Ninja needed to learn about modules, which took time, and then, a step further up the stack, systems like CMake needed to as well, which also took time. That's my answer to the parent's "why are there so few modules projects": it took time for the ecosystem to catch up.


You're not supposed to distribute the precompiled module file. You are supposed to distribute the source code of the module.

Header-only projects are the best to convert to modules because you can put the implementation of a module in a "private module fragment" in that same file and make it invisible to users.

That prevents the compile-time bloat many header-only dependencies add. It also avoids distributing a `.cpp` file that has to be compiled and linked separately, which is why so many projects are header-only.
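A sketch of that layout (the module name `mylib` and its contents are hypothetical): one file, but importers only see the exported declaration, and edits below `module :private;` don't force them to recompile.

```cpp
// mylib.cppm -- single-file module: interface plus hidden implementation.
export module mylib;

export int answer();   // only this surface is visible to importers

module :private;       // private module fragment: invisible to users

int answer() {
    return 42;
}
```

A consumer just writes `import mylib;` and calls `answer()`; unlike a header-only library, it never re-parses the implementation.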


What I mean is, I have yet to see projects in the wild _use modules at all_.

There are plenty of examples on GitHub, Microsoft has talks on how Office migrated to modules, and Khronos's updated Vulkan tutorials have an optional learning path with modules.

Modules need a lot of tooling. The tool vendors have been working hard on this for years. They have only just now said this is ready for early adopters. Most people are waiting for the early adopters to write the books on what best practices are - this needs a few more years of experience.

if something so simple needs years of experience, it's poorly designed

Modules are not simple. They sound simple only to people who have never dug into them.

I've worked extensively on module/import semantics for multiple products in my life. It is complex. However this complexity is on the implementer and not the user.

If "best practices" need to be refined over years, it is poorly designed. This is not untrodden ground, other languages and ecosystems do sane things.


This was considered during standardization. The feeling among tool developers at the time was it was "close enough" to Fortran modules to be mostly solvable.

This was wrong, mostly because C++ compiler-flag semantics are far more complicated than Fortran's; you live and you learn. The bones of most implementations are identical to Fortran's, though, and we got a ~3-year head start on the work because of that.

Ninja already had the dyndep patch ready to go from Fortran, CMake knew basically how to use scanners in build steps. However, it took longer than expected to get scanner support into the compilers, which then delayed everything downstream. Understanding when BMIs need to be rebuilt is still tricky. Packaging formats needed to be updated to understand module maps, etc, etc.

Each step took a little longer than was initially hoped, and delays snowballed a bit. We'll get there.


Thanks. It's been a long time since I started a C++ project, and I've never set up any build chain in Visual Studio or Xcode other than the default.

How about using Zig to build C++ projects?

I haven't used it.

That being said, while it looks better than CMake, for anything professional I'd need remote execution support before deviating from the industry standard. Zig doesn't have that.

This is because large C++ projects reach a point where they cannot be compiled locally if they use the full language, e.g. multi-hour Chromium builds.


Surely Zig can also be invoked using any CI/CD flow running on a remote machine too.

I'm referring to this:

https://github.com/bazelbuild/remote-apis

Once you get a very large C++ project with several thousand compilation jobs over hundreds of devs, you need to distribute the build across multiple computers and have a shared cache for object files.

Zig doesn't seem to support that.


No, because most major compilers don't support header units, much less standard library header units from C++26.

What'll spur adoption is CMake adopting Clang's two-step compilation model, which increases performance.

At that point every project will migrate overnight for the huge build time impact since it'll avoid redundant preprocessing. Right now, the loss of parallelism ruins adoption too much.


It might be because there's a person in the photo, and France is very strict on photographing people.

https://commons.wikimedia.org/wiki/Commons:Country_specific_...

In terms of the formatting/brevity, Reuters was originally a wire service. They'd cover news in foreign locations and send it by telegraphic wire to local newspapers that would license the content.

Telegraphs charged by the word and didn't have letter case. Cryptic in-band signals like "NO USE FRANCE" are a relic of that time.

Since the link OP posted is to the B2B part of Reuters, I'm assuming they still haven't modernized this system.


It doesn't seem to be about photographing people, other pictures don't feature people and still have the "NO USE FRANCE" tag. It seems like all pictures by Chris Jung have the "NO USE FRANCE" tag.

My best guess is that Chris Jung has some kind of an exclusivity contract for publishing in France. Looking at his website, he publishes in "Paris Match", a French magazine, so it may be related.


That makes more sense.

You folks are amazing, thanks for catching that. My curiosity is soothed!

This is the traditional "innovator's dilemma", where a skilled profession facing an imperfect technological threat declines to adopt it until it is too late.

AI-generated articles are, on balance, inferior, except for people who want simple, low-quality content.

But LLMs are moving up the value chain with Deep Research. They can give explanations tuned to a reader's knowledge/viewpoints and provide interactive content Wikipedia doesn't support. That is a killer app for math/science topics.

Wikipedia will win against a generic corporate encyclopedia on neutrality/oversight, but it'll lose badly on UX, which is what matters.

I think the tipping point will be direct integration of academic sources into ChatGPT/Claude/Gemini and a "WikiLink" type way to discover interesting follow-up topics.

I can't trust AI answers on serious historical or social-science topics because of the first. And generally my chat with an AI ends once I get the answer I need, because I can't get rabbitholed into other topics.


It REALLY depends on how you're using the AI. I get the strong impression a lot of people are still at the "I'll write a few prompts and see what happens" stage, and hoping for an answer from the magical oracle; as opposed to really using the tool. This never fails to disappoint.

I might be slightly wrong, but probably not by a lot, yet. Sure, there's an element of "holding-it-wrong-ism" in my position. But it does actually take practice to get it right, and best practices are badly documented!

That said the situation is changing rapidly: https://news.ycombinator.com/item?id=47547849 "AI bug reports went from junk to legit overnight, says Linux kernel czar"



Most Wikipedia work is taking paywalled academic content and summarizing it in an encyclopedic format.

For programming, agentic AI can find most of what it needs because everything is open access on arXiv, on blogs, or in the codebase itself. That's why it can "magical oracle" answer questions that used to require good prompting.

For most other professional topics, citations are locked behind paywalls. Wikipedia editors get free access to academic libraries, but the readers don't. That's why consumer tools suck.

When the big AI companies integrate with proprietary databases in fields like history or the social sciences, that's when Wikipedia dies as a way to answer questions.


It's not supposed to win on UX; its current UX is maybe too conservative, sure.

of course they banned AI, they could barely allow CSS

