I agree with you on the policy being balanced.

However:

> AI generated code does not substitute human thinking, testing, and clean up/rewrite.

Isn't that the end goal of these tools and companies producing them?

According to the marketing[1], the tools are already "smarter than people in many ways". If that is the case, what are these "ways", and why should we trust a human to do a better job at them? If these "ways" keep expanding, which most proponents of this technology believe will happen, then the end state is that the tools are smarter than people at everything, and we shouldn't trust humans to do anything.

Now, clearly, we're not there yet, but where the line is drawn today is extremely fuzzy, and mostly based on opinion. The wildly different narratives around this tech certainly don't help.

[1]: https://blog.samaltman.com/the-gentle-singularity


Intern generated code does not substitute for tech lead thinking, testing, and clean up/rewrite.

Your GitHub profile is... disturbing. 1,354 commits and 464 pull requests in January so far.

Regardless of how productive those numbers may seem, that amount of code being published so quickly is concerning, to say the least. It couldn't have possibly been reviewed by a human or properly tested.

If this is the future of software development, society is cooked.


It's mostly trying out my orchestration system (https://github.com/mohsen1/claude-code-orchestrator and https://github.com/mohsen1/claude-orchestrator-action) in a repo using GH_PAT.

Stuff like this: https://github.com/mohsen1/claude-code-orchestrator-e2e-test...

Yes, the idea is to really, fully automate software engineering. I don't know if I am going to be successful but I'm on vacation and having fun!

If Opus 4.5/GLM 4.7 can do so much already, I can only imagine what can be done in two years. Might as well adapt to this reality and learn how to leverage this advancement.


You may not like it but this is what a 10x developer looks like. :-)

You may enjoy spaghetti, but will you enjoy 10x spaghetti?

This is a common myth. Debugging unikernels is indeed possible[1][2]. It may not be the type of debugging you're already used to, but then again, unikernels are very different from containers and VMs, so some adjustment is expected.

As for observability, why is that the concern of unikernels? That's something your application should do. You're free to hook it up to any observability stack you want.

[1]: https://nanovms.com/dev/tutorials/debugging-nanos-unikernels...

[2]: https://unikraft.org/docs/internals/debugging
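
To make the observability point concrete, here's a minimal sketch of app-level metrics pushed straight from the process over UDP in the StatsD line format, which works the same whether the code runs in a container, a VM, or a unikernel. The collector address and the handle_request stand-in are made up for illustration:

    import socket
    import time

    STATSD_ADDR = ("10.0.0.5", 8125)  # assumed address of your metrics collector

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit_timing(metric: str, millis: float) -> None:
        # Fire-and-forget StatsD timing datagram; no host-level agent required.
        sock.sendto(f"{metric}:{millis:.1f}|ms".encode(), STATSD_ADDR)

    def handle_request() -> None:
        time.sleep(0.05)  # stand-in for real work

    start = time.monotonic()
    handle_request()
    emit_timing("app.request_duration", (time.monotonic() - start) * 1000)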


Respectfully, neither of these docs strikes me as really sufficient for debugging live running systems in the critical path for paying users. The first seems to be related to the inner development loop and local development; the second is again how to attach gdb to debug something in a controlled environment.

Crash reporting, telemetry, useful queuing/saturation measures, or a Rosetta Stone of "we look at X today in system- and app-level telemetry; in the <unikernel system> world we look at Y (or don't need X for reason Z)" would be more in the spirit of parity.

Systems are often somewhat "hands off" in more change-control-sensitive environments too; these guides presume full access, line-of-sight connectivity, and an expert operator, which are three unsafe assumptions in larger production systems IMO.


> By the time you get to day two, each turn costs tens of thousands of input tokens

This behavior surprised me when I started using LLMs, since it's so counterintuitive.

Why does every interaction require submitting and processing all data in the current session up until that point? Surely there must be a way for the context to be stored server-side, and referenced and augmented by each subsequent interaction. Could this data be compressed in a way to keep the most important bits, and garbage collect everything else? Could there be different compression techniques depending on the type of conversation? Similar to the domain-specific memories and episodic memory mentioned in the article. Could "snapshots" be supported, so that the user can explore branching paths in the session history? Some of this is possible by manually managing context, but it's too cumbersome.

Why are all these relatively simple engineering problems still unsolved?


It's not unsolved, at least not the first part of your question. In fact, it's a feature offered by all the major LLM providers!

- https://platform.openai.com/docs/guides/prompt-caching

- https://platform.claude.com/docs/en/build-with-claude/prompt...

- https://ai.google.dev/gemini-api/docs/caching


Ah, that's good to know, thanks.

But then why is there compounding token usage in the article's trivial solution? Is it just a matter of using the cache correctly?


Cached tokens are cheaper (roughly a 90% discount) but not free.

Also, unlike OpenAI, Anthropic's prompt caching is explicit (you set up to 4 cache "breakpoints"), meaning if you don't implement caching then you don't benefit from it.
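
For anyone curious what that looks like in practice, here's a minimal sketch using Anthropic's Python SDK, based on my reading of their docs; the model name, file path, and variable names are placeholders:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Hypothetical large, stable prefix (docs, code, etc.) you want cached.
    LARGE_STATIC_CONTEXT = open("reference_docs.md").read()

    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name; substitute your own
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LARGE_STATIC_CONTEXT,
                # The breakpoint: everything up to and including this block is
                # eligible for caching, so later turns that reuse the same
                # prefix get the discounted rate. Content after the breakpoint
                # is billed as fresh input every time.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Summarize the key constraints."}],
    )
    print(response.content[0].text)

Note that the docs also mention a minimum prefix size before caching kicks in, so tiny prompts won't benefit.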

That's a very generous way of putting it. Anthropic's prompt caching is actively hostile and very difficult to implement properly.

Dumb question, but is prompt caching available to Claude Code…?

If you're using the API, yes. If you have a subscription, you don't care, as you aren't billed per prompt (you just have a limit).

This is great, but I don't see it being useful for most use cases.

Most high-level charting libraries already support downsampling. Rendering data that is not visible is a waste of CPU cycles anyway. This type of optimization is very common in 3D game engines.
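
As a rough illustration of what that downsampling usually amounts to, here's a sketch of a min/max bucket decimation; the function name and parameters are mine, not any particular library's API:

    def minmax_downsample(xs, ys, max_points):
        # Keep roughly max_points by taking each bucket's min and max.
        # Unlike naive striding, this preserves spikes, which is why charting
        # libraries favor bucket-based schemes (min/max, LTTB, etc.).
        n = len(xs)
        if n <= max_points:
            return list(zip(xs, ys))
        n_buckets = max(1, max_points // 2)  # two points (min, max) per bucket
        out = []
        for b in range(n_buckets):
            start = b * n // n_buckets
            end = (b + 1) * n // n_buckets
            if start >= end:
                continue
            indices = range(start, end)
            lo = min(indices, key=lambda i: ys[i])
            hi = max(indices, key=lambda i: ys[i])
            for i in sorted({lo, hi}):  # keep x-order within the bucket
                out.append((xs[i], ys[i]))
        return out

    # ~2k points is plenty for a 1080p-wide canvas, even from 100k samples.
    import math
    xs = list(range(100_000))
    ys = [math.sin(x / 500) for x in xs]
    points = minmax_downsample(xs, ys, 2_000)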

Also, modern CPUs can handle rendering of even complex 2D graphs quite well. The insanely complex frontend stacks and libraries, a gazillion ads and trackers, etc., are a much larger overhead than rendering some interactive charts in a canvas.

I can see GPU rendering being useful for applications where real-time updates are critical, and you're showing dozens of them on screen at once, in e.g. live trading. But then again, such applications won't rely on browsers and web tech anyway.


"Tell me the reasons why this report is stupid" is a loaded prompt. The tool will generate whatever output pattern matches it, including hallucinating it. You can get wildly different output if you prompt it "Tell me the reasons why this report is great".

It's the same as if you searched the web for a specific conclusion. You will get matches for it regardless of how insane it is, leading you to believe it is correct. LLMs take this to another level, since they can generate patterns not previously found in their training data, and the output seems credible on the surface.

Trusting the output of an LLM to determine the veracity of a piece of text is a bafflingly bad idea.


>"Tell me the reasons why this report is stupid" is a loaded prompt.

This is precisely the point. The LLM has to overcome its agreeableness to reject the implied premise that the report is stupid. It does do this, though it takes a lot, and it will eventually tell you "no, actually, this report is pretty good".

Since the point is filtering out slop, we can be perfectly fine with false rejections.

The process would look like "look at all the reports, generate a list of why each of them is stupid, and then give me a list of the ten most worthy of human attention", and it would do a half-decent job at it. It could also pre-populate judgments to make the reviewer's life easier, so they could very quickly glance at it to decide if it's worthy of a deeper look.
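
A rough sketch of that loop, assuming the OpenAI Python SDK; the model name, prompts, and function names here are illustrative only:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    MODEL = "gpt-4o-mini"  # placeholder model name

    def critique(report: str) -> str:
        # Ask for the weaknesses of a single report, allowing a "no issues" answer.
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": "List the reasons this bug report may be low quality, "
                           "or say 'no major issues' if it holds up:\n\n" + report,
            }],
        )
        return resp.choices[0].message.content

    def triage(reports: list[str], top_n: int = 10) -> str:
        # Pre-populate critiques, then ask which reports deserve human attention.
        critiques = [f"Report {i}:\n{critique(r)}" for i, r in enumerate(reports)]
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": f"Given these critiques, list the {top_n} reports most "
                           "worthy of human attention, with one-line justifications:"
                           "\n\n" + "\n\n".join(critiques),
            }],
        )
        return resp.choices[0].message.content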


The survey was about operational costs and revenue. Water cooler and coffee machine manufacturers don't market their products to be "smarter than people in many ways" and "able to significantly amplify the output of people using them"[1]. If these claims are true, then surely relying on this technology should bring both lower operational costs, since human labor is expensive, and an increase in revenue, since the superhuman intelligence and significantly amplified output of humans using these tools should produce higher quality products and benefits across the board.

There are of course many factors at play here, and a substantial percentage of CEOs report a positive RoI, but the fact that a majority don't shouldn't be dismissed on the basis of this being difficult to measure.

[1]: https://blog.samaltman.com/the-gentle-singularity


Why the doom and gloom? It's going to hurt those who jumped on the bandwagon hoping to cash out. I couldn't care less about them, but I would like for computer hardware to be affordable again, and for the tech job market to go back to normal.

Though I'm more concerned about the effects of the current political climate than about the "AI" bubble popping. If that goes south, nothing will be normal for a long time.


These things are never self-contained; if the subprime crisis had only affected predatory bankers, nobody would have cared...

True, but a better comparison is the dot-com crash. The effects of that were mainly contained to the tech industry and the stock market. People who weren't invested in either barely noticed the crash.

This time around the ramifications might be larger, but they will still mostly be felt by those inside the bubble.

Personally, I would rather experience a slight discomfort from the crash followed by a normalization of hardware prices and the job market, than continue bearing the current insanity.


> better comparison is the dot-com crash. The effects of that were mainly contained to the tech industry and the stock market. People who weren't invested in either barely noticed the crash.

That's because the companies were largely public. In the dotcom era, the exit for startups was to IPO, as early as possible. When someone like pets.com was burning through cash with zero path to profitability, it was public knowledge; everyone could see it.

The AI build-out is largely funded with private credit. It's a black box; we have no idea of the true valuations. The mean time for a company to go public went from 4 years during the dotcom era to 14 years now. The rot can be hidden for a long time, until the big funds go bust, with all the collateral damage that brings.


The C-levels who jumped on the bandwagon are definitely not going to fall on their swords should it go south. They’ll blame the tech, fire some subordinates, blame their customers for “not understanding it”, and their shareholders will eat it up as long as they get a pound of flesh.

> blame their customers for “not understanding it”

See Microsoft's recent "We don't understand how you all are not impressed by AI."

In the case of MS, you're right, Satya isn't going to fall on his own sword. They will just continue to bundle and raise prices, make it impossible to buy anything else (because you still need the other tools), and then pitch that to shareholders as success: "Look how many people buy Copilot (even though it's forcefully bundled into every offering we sell)."


They can't really stop swords falling on them though...

They’ll just move on to the next company that will hire them with a golden parachute.

There’s minimal risk to the decision makers. Meanwhile, every one of us peons is significantly more at risk of losing our jobs, whether we could be effectively replaced with these AI tools or not, because our own C-level execs decided to drink the snake oil that is the current bubble.


Everyone is a decision maker. Don't let your perceived lack of impact discourage you. For instance, I help my community by feeding them. It's small, but powerful.

Umm no, your example is a backwards reading of the data.

From the PwC survey:

> More than half (56%) say their company has seen neither higher revenues nor lower costs from AI, while only one in eight (12%) report both of these positive impacts.

So The Register article title is correct.

> It's a snowball effect that eventually builds bigger and bigger.

That's just wishful thinking based on zero evidence.


How exactly is it a backwards reading of the data? Or are you trying to insult me?

I never said the title is incorrect so I'm not sure what you're trying to prove.

It's not wishful thinking. I actually looked at the trends instead of picking a single data point to reinforce an already decided conclusion. I also read the article and followed the percentages instead of assuming the values were all absolute. This is what I mean by reading the data backwards.


Considering that "AI" providers are now adopting advertising, I wonder how many of them are actually seeing lower costs and higher revenue from dogfooding.

The hype train must go on, and I'm sure all employees are under strict NDAs, so we may never know.


Adopting advertising is almost an inevitability in any tech at this point. I wouldn’t necessarily attribute it to anything they’re seeing in usage; IMO it’s just the standard “we’re leaving money on the table by not” that we’ve seen time and time again.
