Hacker News | new | past | comments | ask | show | jobs | submit | halflings's comments

> Youtube charges $10 per month and doesn't produce a single video

It is different from Netflix (which pays production costs upfront), but there is of course a revenue share, and the bulk of the revenue for creators actually comes from sponsorships (which YT doesn't take a cut of).


LLMs generate their output one token at a time. When you first learn this, it sounds like a huge performance bottleneck, since we are used to highly parallelized systems.

However, a large part of what makes LLMs feel so magical comes from this bottleneck.
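To make the serial nature concrete, here is a toy sketch of the decode loop. `toy_next_token` is a hypothetical stand-in for a real model's forward pass (an actual LLM runs the whole prefix through the network and samples from the logits); the names and the lookup table are purely illustrative:

```python
def toy_next_token(tokens):
    # Stand-in for a real forward pass: just a hard-coded lookup on the
    # last token. A real model would condition on the entire prefix.
    table = {"Hello": ",", ",": " world", " world": "!", "!": None}
    return table.get(tokens[-1])

def generate(prompt, max_tokens=10):
    # The serial bottleneck: each token depends on all tokens before it,
    # so this loop cannot be parallelized across output positions.
    tokens = [prompt]
    while len(tokens) < max_tokens:
        nxt = toy_next_token(tokens)
        if nxt is None:  # end-of-sequence
            break
        tokens.append(nxt)
        yield nxt  # streaming each token as it's produced is why the UI
                   # can start "talking" before the full answer exists

text = "Hello" + "".join(generate("Hello"))  # → "Hello, world!"
```

The `yield` is the point: because generation is inherently one-token-at-a-time, streaming partial output to the user costs nothing extra, and that's what produces the "typing" effect.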


The main thing I noticed in the video is that they have heavily sped up all the code generation sections... it seems to be at 5x speed or more (because people got used to how fast and good Sonnet, and especially Gemini 3.0 Flash, are).


Deploying from Antigravity is as easy as, say, connecting the Firebase MCP [1] and asking it to "deploy my app to firebase".

[1] https://firebase.google.com/docs/ai-assistance/mcp-server


+1, reading through the post, the PR updating the documentation... thanks for being transparent, but also don't be so hard on yourself!

That was a very niche error, which you promptly corrected; no need to be so apologetic about it! And thanks for all the hard work making Python faster!


"The models perform differently when called via the API vs in the Gemini UI."

This shouldn't be surprising: the model != the product. The same way GPT-4o via the API behaves differently from the ChatGPT product, even when ChatGPT is using GPT-4o.


I would also add that search has already moved elsewhere.

Fewer and fewer people are using search engines to shop; e.g. Amazon makes >$57B a year from search ads, and look at Temu and Shein, which are mostly glorified product search platforms.

No one is searching for "funny videos" when you can just open Instagram or TikTok.

The only truly unique thing search engines can still do is queries that are not directly commercial (e.g. education, information seeking, etc.), and competition there is insanely intense (ChatGPT, Perplexity, etc.).


That's not true; there are some search categories that currently only Google gets right.

As an aside, ChatGPT is horrendous at filtering web search results.


Honestly, I haven't used either ChatGPT or Perplexity seriously. They haven't performed particularly well when I tested them, and (dun-dun-dun) in my usage Gemini has been growing on me. Another odd thing at the moment: Google search has somehow become better at giving me the results I'm looking for, while DDG is giving a lot of annoying crap.


Any particular results/examples we can look into? Feel free to email me if preferred (see profile).


Basically, when I search for the API of a specific function or for package docs on DDG, I end up with page after page of people blogging about using them, and the actual docs don't show up. So I add "!g" and the same crap is there, but the link to the reference will be somewhere on the first page of results (although Google usually links to an old, stale version of the docs).


Do you have specific examples of this behavior that I can look into? Also, curious if you've tried our Assist function (comes up automatically for some searches or click Assist under search box) or duck.ai for stuff like that?


That's what the chart says, yes: 14.1GB of VRAM usage for the 27B model.


That's the VRAM required just to load the model weights.

To actually use a model, you need a context window. Realistically, you'll want a 20GB GPU or larger, depending on how many tokens you need.
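As a rough back-of-the-envelope: the extra memory is dominated by the KV cache, which grows linearly with context length. A minimal sketch of the arithmetic (the model dimensions below are illustrative placeholders, not the actual 27B config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Per token, each layer stores one key and one value vector per KV head:
    # 2 (K and V) * layers * kv_heads * head_dim * seq_len * precision.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative dimensions (NOT the real 27B config), fp16 precision, 8k context:
cache = kv_cache_bytes(n_layers=46, n_kv_heads=16, head_dim=128, seq_len=8192)
print(f"{cache / 2**30:.1f} GiB")  # ~2.9 GiB on top of the weights
```

This is why techniques like grouped-query attention (fewer KV heads) and KV-cache quantization matter so much: they shrink exactly this term.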


I didn't realize that the context would require so much memory. Is this the KV cache? It would seem like a big advantage if this memory requirement could be reduced.


> I cannot be the first person to think about such possibilities

Differentiable Rendering [1] is the closest thing to what you are describing. And yes, people have been working on this for the same reason you outline, it is more data/compute efficient and hence should generalize better.

[1] https://blog.qarnot.com/article/an-overview-of-differentiabl...

But also:

> While cool, this also seems utterly wasteful. Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.

A bit of the same debate as people calling LLMs a "blurry JPEG of the web" and hence useless.

Yes, this is a statistical approximation to an analytical problem... but that's a very reductive framing of what is going on. Finding the symbolic/analytical solution here would require constraining the problem greatly: not everything on screen has a differentiable representation; complex simulations, for example, might involve some kind of custom internal loop/simulation.

You waste compute, but you get a solution that can simply be trained on billions of unlabeled (synthetic) examples and then generalizes to previously unseen prompts/environments.
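For intuition on the differentiable-rendering side, here's a minimal 1-D toy (not any particular library's API): a soft "rasterizer" whose pixel intensities vary smoothly with the scene parameter, so the position of a blob can be recovered from an "observed" image by plain gradient descent through the renderer.

```python
import numpy as np

WIDTH, SIGMA = 64, 3.0

def render(center):
    # Soft 1-D "rasterizer": a Gaussian blob. Unlike a hard on/off pixel
    # test, intensity varies smoothly with `center`, so the image is
    # differentiable with respect to the scene parameter.
    xs = np.arange(WIDTH)
    return np.exp(-0.5 * ((xs - center) / SIGMA) ** 2)

def grad(center, target):
    # Analytic d(loss)/d(center) for loss = sum((render(center) - target)^2).
    xs = np.arange(WIDTH)
    img = render(center)
    d_img = img * (xs - center) / SIGMA**2  # d render / d center
    return 2.0 * float(((img - target) * d_img).sum())

target = render(40.0)   # "observed" image
center = 35.0           # initial guess for the blob position
for _ in range(200):    # gradient descent through the renderer
    center -= 1.0 * grad(center, target)
# center has now converged to ~40.0
```

A hard rasterizer (pixel on iff `round(x) == pixel`) would have zero gradient almost everywhere, which is exactly the constraint the comment above points at: not everything on screen has a differentiable representation, and the soft approximation is what buys you trainability.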


Training code is only useful to people in academia, and the closest thing to "code you can modify" is the open weights.

People are framing this as if there were an open-source hierarchy, with "actual" open source requiring all training code to be shared. This is not obvious to me: I don't ask people who share open-source libraries to also share the tools they used to develop them, nor to share all the design documents/architecture discussions behind the software. It's sufficient that I can take the end result and reshape it in any way I desire.

This is coming from an LLM practitioner who fine-tunes models for a living, and this constant debate about open source vs open weights seems like a huge distraction compared to the impact of open-sourcing something like Llama... this is truly a Linux-like moment (at a much smaller scale of course, for now at least).


I dunno: if an open-source project required, say, a proprietary compiler, that would diminish its open-source-ness. But I agree it's not totally comparable, since the weights are not particularly analogous to machine code. We probably need a new term. Open weights.


There are many "compilers"; you can download The Pile yourself.

