You may also be getting a worse result for higher cost.
For a medical use case, we tested multiple Anthropic and OpenAI models as well as MedGemma. We were pleasantly surprised when the LLM-as-a-judge scored gpt5-mini as the clear winner. I don't think I would have considered using it for these specific use cases, since I'd assumed more reasoning capability was necessary.
Still waiting on human evaluation to confirm the LLM Judge was correct.
That's interesting. Similarly, we found that for very simple tasks the older Haiku models are worth a look: they're cheaper than the latest Haiku models and often perform equally well.
You obviously know what you’re looking for better than me, but personally I’d want to see a narrative that made sense before accepting that a smaller model somehow just performs better, even if the benchmarks say so. There may be such an explanation, but without one it feels very dicey.
Volume and statistical significance? I'm not sure what kind of narrative I would trust beyond the actual data.
It's the hard part of using LLMs, and where I think many people make mistakes. The only way to really know is to have repeatable, consistent frameworks to validate your hypothesis (or, in my case, to have my hypothesis proved wrong).
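To make "volume and statistical significance" concrete, here is a minimal sketch of one such repeatable check. It assumes each test case yields a per-case judge verdict of "A", "B", or "tie" (hypothetical labels) and applies an exact two-sided sign test, dropping ties. It only tells you whether a judge's preference is unlikely to be chance; it says nothing about whether the judge itself is right, which is what human evaluation is for.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test for paired A-vs-B comparisons (ties excluded)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare(verdicts):
    """Tally hypothetical per-case judge verdicts: 'A', 'B', or 'tie'."""
    wins = sum(v == "A" for v in verdicts)
    losses = sum(v == "B" for v in verdicts)
    ties = len(verdicts) - wins - losses
    return wins, losses, ties, sign_test_p_value(wins, losses)
```

Rerunning the same case set after every model or prompt change, and keeping the verdicts, is what makes the comparison repeatable rather than anecdotal.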
> when Obama deported 3M+ illegals, the sanctuary cities helped and the media didn't cover it like it was Nazi Germany
Perhaps it has to do with the method used?
I don't think most people are in favor of illegal immigration. That would be an extremist view. Most people are probably somewhere in the middle and I don't think that band is too wide.
It was different because the sanctuary cities turned over lists of the students whose parents weren't citizens. It made it a lot easier. Also no media coverage. You won't know if the tactics were any different because democrat cities actually complied with the law. Again, it goes to democrats being hypocrites and selectively caring/enforcing issues when they think people are paying attention.
>I don't think most people are in favor of illegal immigration. That would be an extremist view. Most people are probably somewhere in the middle and I don't think that band is too wide.
Yep, I live in California and we require immigration for a lot of jobs and industries. I'm not against immigration... in a past life I worked in immigration law. I am against this selective enforcement of the law depending on who is the President. You can't run a country like this.
You seem so fixated on communities not helping ICE to your satisfaction that you're blinded by ICE's atrocities. I guess your admission that you are an ICE employee or adjacent is enough explanation for that though.
I worked for immigrants (and the companies sponsoring immigrants).
That said, I am a nuanced individual and believe in legal immigration, and I think Biden letting in tens of millions illegally and all these sanctuary cities was eventually going to create some kind of legal conflict. It's obvious that municipal law cannot supersede federal law.
I am super curious about this. I wonder what baseline it needs to meet to pull me away from using ChatGPT or Claude.
My usage of it would be quite different than ChatGPT. I’d be much freer in what I ask it.
I think there’s a real opportunity for something like this. I would have expected Apple to build it, but they just announced they’ll use Gemini.
I think it's worth understanding why, because that's not everyone's experience, and there's a chance a change on your end would make it extremely useful to you.
There's a lesser chance that you're working on a code base that Claude Code just isn't capable of helping with.
I can relate to the author. I have an actively used GitHub repository which doesn't need any more features. It's got 1.3k stars but the commit history has been sparse.
Even I look at the last commit date as a proxy for "is this maintained". I wish GitHub had something similar to "Archived" for projects like these.
The reason for the intermediary is that clicking through sends the previous URL as a Referer header to the next server.
The only real way to avoid leaking specific URLs from the source page to an arbitrary third-party server is to have an intermediary redirect like this.
All the big products put an intermediary in for that reason, though many of them make it a user-visible page that says "you are leaving our product", whereas Google mostly does it as an immediate redirect.
The copy/paste behavior is mostly an unfortunate side effect and not a deliberate feature of it.
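A minimal sketch of the general pattern (not how Google or anyone else actually implements it), written as a stdlib WSGI app for a hypothetical `/url?q=<destination>` endpoint. It serves a tiny interstitial page rather than a bare 302 because, as I understand the Fetch spec, a plain redirect keeps the original page as the request's referrer unless the redirect response sets its own `Referrer-Policy`. Here the policy is sent both as a header and as a `<meta name="referrer">` tag, so a browser that honours either should send no Referer to the destination.

```python
import html
from urllib.parse import parse_qs, urlsplit


def redirect_app(environ, start_response):
    # Hypothetical endpoint: /url?q=<percent-encoded destination URL>
    params = parse_qs(environ.get("QUERY_STRING", ""))
    target = params.get("q", [""])[0]
    parts = urlsplit(target)  # scheme comes back lowercased
    # Refuse anything but absolute http(s) URLs (e.g. javascript:) so the
    # intermediary can't be abused to run script on our own origin.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid destination"]
    safe = html.escape(target, quote=True)
    body = (
        "<!doctype html><html><head>"
        '<meta name="referrer" content="no-referrer">'
        f'<meta http-equiv="refresh" content="0;url={safe}">'
        "</head><body>"
        f'<p>You are leaving this site: <a href="{safe}" rel="noreferrer">{safe}</a></p>'
        "</body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Referrer-Policy", "no-referrer"),  # header and meta tag say the same thing
        ("Cache-Control", "no-store"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experimentation it can be served with `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()`. Dropping the meta refresh and showing only the link gives the "you are leaving our product" variant.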
If you're counting on web standards, you're more optimistic than I am. Unfortunately, nobody uses them consistently or accurately (look at PUT vs POST for create/update as a really good example of this: nobody agrees). It's a shame, too; there's a lot of richness to the web spec. Most people don't even use HEAD to avoid making wasteful REST calls when they already have the data.
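For the HEAD point, a minimal sketch using only `urllib`, assuming the server returns an `ETag` header and that you saved the ETag from your last full GET. `needs_refresh` is a hypothetical helper name. Many APIs also support a conditional GET (`If-None-Match`, answered with `304 Not Modified`), which saves the extra round trip; HEAD is shown here because that's what the comment mentions.

```python
import urllib.request


def needs_refresh(url: str, cached_etag: str, timeout: float = 5.0) -> bool:
    """Return True if the resource at `url` appears to have changed.

    Sends a HEAD request (headers only, no body) and compares the server's
    ETag with the cached one. With no ETag we can't tell, so assume it changed.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        etag = resp.headers.get("ETag")
    return etag is None or etag != cached_etag
```

Only when this returns True do you pay for the full GET.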
> All the big products put an intermediary for that reason
Surely whoever maintains the big products can add headers if they want?
And this is about people who care enough about not showing up in Referer headers to do something about it, rather than people in general not understanding the full spec.
I worked on these big web products before, and the answer then was no: you couldn't trust the header to be honored, and a leak would have been considered a privacy incident, so it was better to just use the redirect and take no risk. You can't trust the user agents, for example.
Not sure if the reliability of the intended mechanism has improved enough that this is just legacy, or if there are entirely new reasons for it in 2026.
Referrer-Policy is a response header, so in this case it would be Google sending it, and the browsers who would be honouring it. You have to hope that the browser makers get it correct... Unless I misunderstood?
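To illustrate what "honouring it" means on the browser side, here is a simplified model, based on my reading of the W3C Referrer Policy spec, of the Referer a compliant browser would send for four of the policy values. `strict-origin-when-cross-origin` is, as far as I know, the default in current major browsers. It assumes URLs carry no embedded credentials and ignores the remaining policies and other edge cases, so treat it as illustration only.

```python
from urllib.parse import urlsplit


def referer_for(source_url: str, dest_url: str, policy: str):
    """Simplified model of the Referer sent from source_url to dest_url."""
    src, dst = urlsplit(source_url), urlsplit(dest_url)
    full = src._replace(fragment="").geturl()  # fragments are never sent
    origin = f"{src.scheme}://{src.netloc}/"   # assumes no user:pass@ in netloc
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme == "http"
    if policy == "no-referrer":
        return None
    if policy == "origin":
        return origin
    if policy == "unsafe-url":
        return full
    if policy == "strict-origin-when-cross-origin":
        if downgrade:
            return None
        return full if same_origin else origin
    raise ValueError(f"policy not modeled in this sketch: {policy}")
```

Under the default, a cross-origin click already leaks only the origin, not the path; the intermediary matters for older or misbehaving clients and for policies like `unsafe-url`, which is the trust problem described above.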
It sees periodic major updates to keep it in line with standards. That's not much more than maintenance mode, but it's more than just keeping the servers running. It seems like someone at Google pays attention to it and keeps it from falling behind, but I suspect the same was true of Google Reader until it wasn't.
>someone at Google pays attention to it and keeps it from falling behind
I feel like it's the same for Google My Maps. They even discontinued the Android app, so you can only use it on the web. It totally feels like there's a single guy keeping the whole system up.