BoredomIsFun's comments

> An LLM is a router and completely stateless aside from the context you feed into it.

Not the latest SSM and hybrid attention ones.
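For instance, a state-space layer carries a recurrent hidden state across tokens, so the model is not purely a function of the current context window. A minimal sketch of a linear SSM recurrence (toy dimensions and random parameters, illustrative only, not any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 4, 2                           # toy sizes
A = rng.normal(size=(d_state, d_state)) * 0.1  # state-transition matrix
B = rng.normal(size=(d_state, d_in))           # input projection

h = np.zeros(d_state)                   # persistent hidden state
for x_t in rng.normal(size=(5, d_in)):  # five "tokens"
    h = A @ h + B @ x_t                 # state updated at every step

# h now summarizes the whole sequence seen so far: that is the statefulness
print(h.shape)
```

The point is only that `h` survives from step to step, unlike a pure attention stack that recomputes everything from the context each time.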


Going from a stateless router to a router with a lossy scratchpad is a step up, but I'm still not going to ask it to check my Lisp. That's what linters are for.

A good old illustration: https://www.ml6.eu/en/blog/large-language-models-to-fine-tun...

The it- (instruction-tuned) one is the yellow smiling dot; the pt- (pretrained) one is the rightmost monster head.


> If I offend anyone I will not be apologising for it.

What you said is simply counterfactual, so no reason to be offended.


Asimov is a widespread last name in the ex-USSR, especially Central Asia. I personally know three unrelated Asimovs.

> Local model enthusiasts often assume that running locally is more energy efficient than running in a data center,

It is a well-known 101 truism on /r/LocalLLaMA that local is rarely cheaper, unless you run it batched; then it is indeed massively (roughly 10x) cheaper.
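A back-of-the-envelope way to see it (all figures hypothetical; plug in your own GPU power draw, electricity tariff, and throughput):

```python
# Hypothetical numbers, for illustration only.
power_kw = 0.35          # GPU draw while generating, kW
price_per_kwh = 0.15     # electricity price, USD/kWh
tok_per_s_single = 30    # single-request decoding speed
tok_per_s_batched = 300  # ~10x throughput with batched inference

def usd_per_mtok(tok_per_s):
    """Electricity cost per 1M generated tokens at a given throughput."""
    hours_per_mtok = 1e6 / tok_per_s / 3600
    return hours_per_mtok * power_kw * price_per_kwh

print(round(usd_per_mtok(tok_per_s_single), 2))   # unbatched cost per 1M tokens
print(round(usd_per_mtok(tok_per_s_batched), 2))  # batched: 10x cheaper
```

Since the cost is inversely proportional to throughput, a 10x batching speedup translates directly into a 10x lower cost per token.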

> I think they mean that the DeepSeek API charges are less than it would cost for the electricity to run a local model.

Because it is hosted in China, where energy is cheap. In the ex-USSR, where I live, electricity is inexpensive too, and given that I had to run a small space heater all winter because my central heating was inadequate, running local models came out 100% free.


Hmm... no. These two things are orthogonal. Regardless, the Olmo models are open source.

1. What makes you think it was written by an LLM?

2. Where is that rule? Could you cite it?

3. How do I know you did not use an LLM for your comment?


1. Word choice, phrasing, and sentence structure make it seem likely. Ironically, one has to go on vibes. One gets a feel for the voice and tone used by LLMs after a while. It's also a new account with one comment.

2. "Don't post generated comments or AI-edited comments. HN is for conversation between humans." From https://news.ycombinator.com/newsguidelines.html

3. You don't.


Points 1 and 3 contradict each other. The last thing people need is anti-AI hysteria.

> the API-driven $trillion labs?

here we go: https://huggingface.co/collections/trillionlabs/tri-series


Please post it on /r/LocalLLaMA.


Phi-4-14b with layers duplicated (phi-4-25b) shows increased performance. Phi-4-49b is degraded relative to the 14b.
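A self-merge like that is typically a passthrough stack of layer slices. A sketch of the resulting layer schedule (phi-4-14b has 40 decoder layers; the duplicated span here is an assumption for illustration, not the published recipe):

```python
n_layers = 40               # phi-4-14b decoder layers
dup_start, dup_end = 8, 32  # assumed middle span to duplicate (hypothetical)

# Passthrough self-merge: keep layers [0, dup_end), then replay
# [dup_start, n_layers), so the middle span appears twice in the stack.
schedule = list(range(0, dup_end)) + list(range(dup_start, n_layers))

print(len(schedule))  # depth of the merged model
```

At 64 layers versus 40, the depth grows 1.6x with no new weights, which is roughly how a ~25B-scale self-merge of a 14B model comes about.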

