Holy cow their chatapp demo!!! I for first time thought i mistakenly pasted the ...

qingcharles · 2026-02-20T15:13:28 1771600408

I asked it to design a submarine for my cat and literally the instant my finger touched return the answer was there. And that is factoring in the round-trip time for the data too. Crazy.

The answer wasn't dumb like others are getting. It was pretty comprehensive and useful.

  While the idea of a feline submarine is adorable, please be aware that building a real submarine requires significant expertise, specialized equipment, and resources.

robotpepi · 2026-02-20T17:45:44 1771609544

it's incredible how many people are commenting here without having read the article. they completely lost the point.

smusamashah · 2026-02-20T13:56:07 1771595767

With this speed, you can keep looping and generating code until it passes all tests. If you have tests.

Generate lots of solutions and mix and match. This allows a new way to look at LLMs.

Retr0id · 2026-02-20T14:17:44 1771597064

Not just looping, you could do a parallel graph search of the solution-space until you hit one that works.

xi_studio · 2026-02-20T14:49:00 1771598940

Infinite Monkey Theory just reached its peak

dave1010uk · 2026-02-20T19:06:02 1771614362

You could also parse prompts into an AST, run inference, run evals, then optimise the prompts with something like a genetic algorithm.

turnsout · 2026-02-20T16:31:56 1771605116

Agreed, this is exciting, and has me thinking about completely different orchestrator patterns. You could begin to approach the solution space much more like a traditional optimization strategy such as CMA-ES. Rather than expect the first answer to be correct, you diverge wildly before converging.

Epskampie · 2026-02-20T14:25:08 1771597508

And then it's slow again to finally find a correct answer...

34679 · 2026-02-20T16:48:05 1771606085

It won't find the correct answer. Garbage in, garbage out.

akie · 2026-02-21T11:58:42 1771675122

How about if you run this loop (one year from now) on this kind of hardware but with something like Claude/Kimi K2. How about that? Because that's where it'll go.

MattRix · 2026-02-20T14:00:52 1771596052

This is what people already do with “ralph” loops using the top coding models. It’s slow relative to this, but still very fast compared to hand-coding.

otabdeveloper4 · 2026-02-20T19:57:26 1771617446

This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just results in the same but with more errors.

therealdrag0 · 2026-02-20T22:11:30 1771625490

Do you not have experience with agents solving problems? They already successfully do this. They try different things until they get a solution.

amelius · 2026-02-20T12:40:29 1771591229

OK investors, time to pull out of OpenAI and move all your money to ChatJimmy.

freakynit · 2026-02-20T12:43:44 1771591424

A related argument I raised a few days back on HN:

What's the moat with with these giant data-centers that are being built with 100's of billions of dollars on nvidia chips?

If such chips can be built so easily, and offer this insane level of performance at 10x efficiency, then one thing is 100% sure: more such startups are coming... and with that, an entire new ecosystem.

codebje · 2026-02-20T13:00:06 1771592406

RAM hoarding is, AFAICT, the moat.

freakynit · 2026-02-20T13:10:09 1771593009

lol... true that for now though

Windchaser · 2026-02-20T16:44:29 1771605869

Yeah, just cause Cisco had a huge market lead on telecom in the late '90s, it doesn't mean they kept it.

(And people nowadays: "Who's Cisco?")

wmf · 2026-02-21T01:20:32 1771636832

They did mostly keep it though.

Windchaser · 2026-02-24T18:39:30 1771958370

Sure, but it's taken their stock price about 20 years to recover.

bee_rider · 2026-02-20T13:58:08 1771595888

I think their hope is that they’ll have the “brand name” and expertise to have a good head start when real inference hardware comes out. It does seem very strange, though, to have all these massive infrastructure investment on what is ultimately going to be useless prototyping hardware.

elictronic · 2026-02-20T15:07:12 1771600032

Tools like openclaw start making the models a commodity.

I need some smarts to route my question to the correct model. I wont care which that is. Selling commodities is notorious for slow and steady growth.

jzymbaluk · 2026-02-20T16:48:57 1771606137

You'd still need those giant data centers for training new frontier models. These Taalas chips, if they work, seem to do the job of inference well, but training will still require general purpose GPU compute

amelius · 2026-02-21T11:59:26 1771675166

Yeah but you need even bigger factories to fabricate those inference chips, so what is the point?

bonoboTP · 2026-02-20T21:01:03 1771621263

Next up: wire up a specialized chip to run the training loop of a specific architecture.

mlboss · 2026-02-20T17:50:05 1771609805

If I am not mistaken this chip was build specifically for the llama 8b model. Nvidia chips are general purpose.

wmf · 2026-02-20T19:46:32 1771616792

Nvidia bought all the capacity so their competitors can't be manufactured at scale.

raincole · 2026-02-20T13:12:09 1771593129

You mean Nvidia?

rstuart4133 · 2026-02-21T05:27:23 1771651643

> It was literally in a blink of an eye.!!

It's not even close. It takes the eye 100mm .. 400ms to blink. This think takes under 30ms to process a small query - about 10 times faster.

zwaps · 2026-02-20T12:34:20 1771590860

I got 16.000 tokens per second ahaha

gwd · 2026-02-20T12:37:34 1771591054

I dunno, it pretty quickly got stuck; the "attach file" didn't seem to work, and when I asked "can you see the attachment" it replied to my first message rather than my question.

scosman · 2026-02-20T12:54:52 1771592092

It’s llama 3.1 8B. No vision, not smart. It’s just a technical demo.

anthonypasq · 2026-02-20T15:23:53 1771601033

why is everyone seemingly incapable of understanding this? waht is going on here? Its like ai doomers consistently have the foresight of a rat. yeah no shit it sucks its running llama 3 8b, but theyre completely incapable of extrapolation.

freakynit · 2026-02-20T12:41:35 1771591295

Hmm.. I had tried simple chat converation without file attachments.

PlatoIsADisease · 2026-02-20T14:25:35 1771597535

Well it got all 10 incorrect when I asked for top 10 catchphrases from a character in Plato's books. It confused the baddie for Socrates.

Rudybega · 2026-02-21T02:32:58 1771641178

Well yeah, they're running a small, outdated, older model. That's not really the point. This approach can be used for better, larger, newer models.

bsenftner · 2026-02-20T12:36:41 1771591001

I get nothing, no replies to anything.

freakynit · 2026-02-20T12:40:55 1771591255

Maybe hn and reddit crowd have overloaded them lol

elliotbnvl · 2026-02-20T12:27:45 1771590465

That… what…

b0ner_t0ner · 2026-02-20T13:28:34 1771594114

I asked, “What are the newest restaurants in New York City?”

Jimmy replied with, “2022 and 2023 openings:”

0_0

freakynit · 2026-02-20T13:38:50 1771594730

Well, technically it's answer is correct when you consider it's knowledge cutoff date... it just gave you a generic always right answer :)

xi_studio · 2026-02-20T14:52:54 1771599174

chatjimmy's trained on LLama 3.1

jvidalv · 2026-02-20T13:36:32 1771594592

Is super fast but also super inaccurate, I would say not even gpt-3 levels.

roywiggins · 2026-02-20T18:05:39 1771610739

That's because it's llama3 8b.

empath75 · 2026-02-20T13:59:46 1771595986

There are a lot of people here that are completely missing the point. What is it called where you look at a point of time and judge an idea without seemingly being able to imagine 5 seconds into the future.

Alifatisk · 2026-02-20T17:06:05 1771607165

“static evaluation”

Etheryte · 2026-02-20T12:44:24 1771591464

It is incredibly fast, on that I agree, but even simple queries I tried got very inaccurate answers. Which makes sense, it's essentially a trade off of how much time you give it to "think", but if it's fast to the point where it has no accuracy, I'm not sure I see the appeal.

andrewdea · 2026-02-20T13:27:56 1771594076

the hardwired model is Llama 3.1 8B, which is a lightweight model from two years ago. Unlike other models, it doesn't use "reasoning:" the time between question and answer is spent predicting the next tokens. It doesn't run faster because it uses less time to "think," It runs faster because its weights are hardwired into the chip rather than loaded from memory. A larger model running on a larger hardwired chip would run about as fast and get far more accurate results. That's what this proof of concept shows

Etheryte · 2026-02-20T13:52:20 1771595540

I see, that's very cool, that's the context I was missing, thanks a lot for explaining.

Sabinus · 2026-02-21T02:46:43 1771642003

I don't mean to be rude, but did you read the article before commenting?

Etheryte · 2026-02-22T10:25:10 1771755910

I'm commenting on the link to their demo, not on the article.

kaashif · 2026-02-20T12:46:02 1771591562

If it's incredibly fast at a 2022 state of the art level of accuracy, then surely it's only a matter of time until it's incredibly fast at a 2026 level of accuracy.

PrimaryExplorer · 2026-02-20T12:51:33 1771591893

yeah this is mindblowing speed. imagine this with opus 4.6 or gpt 5.2. probably coming soon

scotty79 · 2026-02-20T13:16:13 1771593373

I'd be happy if they can run GLM 5 like that. It's amazing at coding.

Gud · 2026-02-20T12:53:51 1771592031

Why do you assume this?

I can produce total jibberish even faster, doesn’t mean I produce Einstein level thought if I slow down

Closi · 2026-02-20T17:54:42 1771610082

Better models already exist, this is just proving you can dramatically increase inference speeds / reduce inference costs.

It isn't about model capability - it's about inference hardware. Same smarts, faster.

andy12_ · 2026-02-20T13:33:24 1771594404

Not what he said.

scotty79 · 2026-02-20T13:15:34 1771593334

I think it might be pretty good for translation. Especially when fed with small chunks of the content at a time so it doesn't lose track on longer texts.

rvz · 2026-02-20T14:00:09 1771596009

Fast, but stupid.

   Me: "How many r's in strawberry?"

   Jimmy: There are 2 r's in "strawberry".

   Generated in 0.001s • 17,825 tok/s

The question is not about how fast it is. The real question(s) are:

   1. How is this worth it over diffusion LLMs (No mention of diffusion LLMs at all in this thread)

(This also assumes that diffusion LLMs will get faster)

   2. Will Talaas also work with reasoning models, especially those that are beyond 100B parameters and with the output being correct? 

   3. How long will it take to create newer models to be turned into silicon? (This industry moves faster than Talaas.)

   4. How does this work when one needs to fine-tune the model, but still benefit from the speed advantages?

mike_hearn · 2026-02-20T17:59:15 1771610355

The blog answers all those questions. It says they're working on fabbing a reasoning model this summer. It also says how long they think they need to fab new models, and that the chips support LoRAs and tweaking context window size.

I don't get these posts about ChatJimmy's intelligence. It's a heavily quantized Llama 3, using a custom quantization scheme because that was state of the art when they started. They claim they can update quickly (so I wonder why they didn't wait a few more months tbh and fab a newer model). Llama 3 wasn't very smart but so what, a lot of LLM use cases don't need smart, they need fast and cheap.

Also apparently they can run DeepSeek R1 also, and they have benchmarks for that. New models only require a couple of new masks so they're flexible.

fennecbutt · 2026-02-22T02:11:33 1771726293

The counting rs in strawberry problem was a example of people not understanding how the models work but I guess good to show the limitations of the current architectures.

But thing is, those architectures haven't improved a whole lot. Now when it answers that correctly it's either in training data or by virtue of "count letters" or code sandbox tools.

simlevesque · 2026-02-20T17:14:13 1771607653

LLMs can't count. They need tool use to answer these questions accurately.

CamperBob2 · 2026-02-21T06:17:30 1771654650

That particular one can't count without using external tools. Others can, and do.