
I periodically try to run these models on my MBP M3 Max 128G (which I bought with a mind to running local AI). I have a certain deep-research question (in a field that is deeply familiar to me) that I ask when I want to gauge a model's knowledge.

So far Opus 4.6 and Gemini Pro are very satisfactory, producing great answers fairly fast. Gemini is very fast at 30-50 seconds; Opus is very detailed and comes in at about 2-3 minutes.

Today I ran the question against a local qwen3.5:35b-a3b - it puffed away for 45 (!) minutes, produced a very generic answer with errors, and made my laptop sound like it was going to take off at any moment.

Wonder what I'm doing wrong? How am I supposed to use this for any agentic coding on a large enough codebase? It would take days (and a 3M Peltor X5A) to produce anything useful.


> Wonder what I'm doing wrong?

You're comparing ~100B-parameter open models running on a consumer laptop vs. private models with, at the very least, 1T parameters running on racks of bleeding-edge professional GPUs.

Local agentic coding is closer to "shit me the boilerplate for an Android app" than "deep research questions", especially on your machine.


Well, Opus and Gemini are probably running on multiple H200 equivalents, maybe multiple hundreds of thousands of dollars of inference equipment. Local models are inherently inferior; even the best Mac that money can buy will never hold a candle to latest-generation Nvidia inference hardware, and the local models, even the largest, are still not quite at the frontier. The ones you can plausibly run on a laptop are the ones where "plausible" really means "45 minutes and making my laptop sound like it's going to take off at any moment". Like they said, you're getting Sonnet 4.5 performance, which is two generations old; speaking from experience, Opus 4.6 is night and day compared to Sonnet 4.5.

> Well Opus and Gemini are probably running on multiple H200 equivalents, maybe multiple hundreds of thousands of dollars of inference equipment.

But if you've got that kind of equipment, you aren't using it to support a single user. It gets the best utilization by running very large batches with massive parallelism across GPUs, so that's what you do. Still, there is such a thing as a useful middle ground that may not give you the absolute best performance but will be found broadly acceptable and still be quite viable for a home lab.


Running local AI models on a laptop is a weird choice. The Mini, and especially the Studio form factor, has better cooling, lower prices for comparable specs, and a much higher ceiling in performance and memory capacity.

I can never see the point, though. Performance isn't anywhere near Opus, and even that gets confused following instructions or making tool calls in demanding scenarios. Open-weights models are just light years behind.

I really, really want open-weights models to be great, but I've been disappointed with them. I don't even run them locally; I try them from providers, and they're never as good as even the current Sonnet.


They're great for some product use cases where you don't need frontier models.

Yeah, for sure, I just don't have many of those. For example, the only use I have for Haiku is summarizing webpages, or for Sonnet, coding something after Opus has produced a very detailed plan.

Maybe I should try local models for home automation; Qwen must be great at that.


They're like six months behind on most benchmarks, and people already claimed coding was solved six months ago, so which is it? The current version is the baseline that solves everything, but as soon as the new version is out, it becomes utter trash and barely usable?

That's very large models at full quantization, though. Stuff that will crawl even on a decent homelab, despite being largely MoE-based (reducing the number of active parameters) and even quantization-aware (reducing their size).

That's just a straw man. Each frontier model version is better than the previous one, and I use it for harder and harder things, so I have very little use for a version that's six months behind. Maybe they're great for simple scripts, but for a personal assistant bot, even Opus 4.6 isn't as good as I'd like.

Sonnet 4.5 level isn't Opus 4.6 level, simple as

Well, you can't run Gemini Pro or Opus 4.6 locally, so are you comparing a locally run model to cloud platforms?

Can you try asking Sonnet 4.5 the same question, since that is what this model is claimed to be on par with?

Every now and then I will google "books like Hyperion", read something, and conclude that it was nothing like Hyperion. Wonderful books, wonderful writer. A loss.

Health insurance? Remote role taxed where? Would it even count for Portuguese PR?

Spent €10k on visa consultants? What did they do, exactly?

I also wouldn't fully assume that Portugal is more stable than Turkey. The EU is not what it was 10 years ago.

...

I would stay in Turkey.


Thanks for the advice! The position is taxed in Portugal. It's a US company with a local base (paying 1100 monthly, lol), so everything is legally compliant regarding taxes and residency. I spent 10k EUR in total, including my stays in Lisbon, just to handle the paperwork (it took six months for them to process my residency card).

That is valuable insight about stability. You're saying Portugal, while in a better position, isn't significantly more stable than Turkey in the long run. As for the EU, I agree that it probably won't improve much over the long term.


I guess it says something about OAuth when you implement it "at scale" and still have multiple misconceptions (all very common though).

Most importantly, OAuth is an authorization framework, OIDC is an authentication extension built on top.

Refresh tokens are part of authorization, not authentication.

The HTTP header is "Authorization: Bearer ...", not "Authentication".

There's no such thing as "HMAC encryption"; it's a message authentication code. RSA in OAuth is also typically used for signing, not encryption. Not much "encryption" encryption going on in OAuth overall, TBH.

Nonces and client IDs are not "salts", but OK, that's nitpicking :)
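To make the header point concrete, here's a minimal sketch of attaching a bearer token per RFC 6750 (the URL and token are placeholders, not anything from the article):

```python
import urllib.request

# OAuth 2.0's bearer scheme (RFC 6750) uses the "Authorization" header,
# not "Authentication". Token and endpoint below are hypothetical.
access_token = "example-access-token"
req = urllib.request.Request("https://api.example.com/resource")
req.add_header("Authorization", f"Bearer {access_token}")

# The request is only built here, not sent.
print(req.get_header("Authorization"))  # Bearer example-access-token
```

The resource server then validates the token; nothing in this exchange is encrypted by OAuth itself, which is why TLS is mandatory for bearer tokens.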


Baby steps, my guy, baby steps. Yes, I don't even mention OIDC, but I think the way I explained it was the middle-school version we can all understand (even if there are some minor mistakes in nomenclature).

The point I was trying to make at 2am is that it's not scary or super-advanced stuff, and that you can get away with something OAuth-like (as so many do). But yes, OAuth is authorization, OIDC is authentication. The refresh token is part of authorization, but it makes sense for people who have never done it to think of it as a "post-login marker".


We moved into a new flat with really bad lighting, and I decided to buy those "AmazeFun" (or whatever the generically named CN brand was) "smart" LED ceiling lights. Bought one for each of four rooms.

Installed, tested them with the app, everything works, great!

Got out the remotes, since pulling out the phone to use the app every time you want to turn on the light in a room is a bit much for me. Pressed Power, and boom, the whole house powered on. Dimmer, light temperature, everything syncs between all four lights. Power off turns them all off.

Wrote to "AmazeFun" support; turns out it's "normal behavior". Right.


FWIW, get bulbs that run something like WLED. You can pair them with ESP-NOW remotes, like the WiZ remote.

https://www.athom.tech/blank-1/wled-15w-color-bulb


Headscale is good. We're using it to manage two isolated networks of about 400 devices each. It just works. It's in China, so the official Tailscale DERPs don't work, but enabling the built-in DERP was very easy.
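For anyone curious, enabling the embedded DERP looks roughly like this in headscale's config.yaml (field names follow headscale's example config; verify against your installed version):

```yaml
# Serve DERP relay + STUN from headscale itself instead of relying on
# Tailscale's public relays (which are unreachable from China).
derp:
  server:
    enabled: true
    region_id: 999
    region_code: "headscale"
    region_name: "Headscale Embedded DERP"
    stun_listen_addr: "0.0.0.0:3478"
  # Don't pull Tailscale's default DERP map:
  urls: []
  paths: []
```

Clients then receive this single-region DERP map from headscale and fall back to your relay whenever a direct connection can't be established.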


In a similar vein, but different concept - https://hazeover.com/ - dimming inactive windows.


Social media (FB, Twitter, Instagram...) - never had any, never felt the need.

TikTok etc - I strongly believe these are brain cancer.

Crypto - never had any need or interest.

Smartwatches - never solved any need for me either. Same with tablets.

Apple ecosystem - I have a MacBook, but all the other stuff is IMHO pretty bad, or I don't need it.

Pokemon - no interest.

Home IoT - despite working for many years in (commercial) IoT, home IoT never clicked for me, it's all really clunky and useless, at least in my experience.

VR - we have Quest 3 but rarely play it, it's just not fun somehow after the initial novelty wears off, I much prefer PS5.


I brought my WiFi 7-capable ASUS RT-BE96U to Germany (from China), and I proudly noticed that my average download speed is up to ~105 Mbit from ~95 Mbit with the stock Vodafone router.

"Silicon Valley of Europe", my a*s.


> Big Tech is spending $364B on infrastructure instead of fixing the code

You mean CrowdStrike still crashes? Spotlight still writes 26TB every night? (Which only happened in a beta, AFAIK...) Of course they are fixing the code. Conflating infrastructure spending with code quality is not helpful.

The bitter truth is that complex software will always contain some bugs; it's close to impossible to ship completely, mathematically perfect software. What truly matters is how we react to bugs and the report/fix/update pipeline.

