Hacker News | dnautics's comments

> You know there are people at NRO who are dedicated to ship tracking via satellite.

I feel like there must be people at NRO who are dedicated to sub tracking via satellite.


it isn't

Austin is not even that good at "keeping law and order" (marginally better than say SF). It's really just "build more housing".

I mean, they're on thin ice. I have no love for the Trump administration, but it's not clear where the Constitution authorizes Congress to fund VOA or any government propaganda arm, really. You could equally make the argument "it took three generations to unpick and fix the FDR bullshit".

Seems like this would fall under the General Welfare clause. Certainly a lot clearer than the idea that growing your own feed for your livestock means you don’t buy feed on the open market, which affects prices, which means your feed growing qualifies as interstate commerce despite never leaving your land and can therefore be regulated by the federal government.

whataboutism. Both of those are bad.

You’re the one who brought up FDR, my dude. It’s not whataboutism to address a topic explicitly mentioned by the other party.

I personally don’t see the case for saying the Constitution doesn’t authorize government funded media under the General Welfare clause. I can see where there might be room for disagreement, but that clause is pretty broad. Whereas interstate commerce is a lot clearer, and the reasoning in Wickard v. Filburn is pretty transparent bullshit made to reach a desired conclusion. In terms of Congress exceeding its enumerated powers, the latter is vastly worse.


> You’re the one who brought up FDR, my dude

VOA was established by FDR?

> the General Welfare clause.

the general welfare clause is pretty clearly in the header of Article 1, Section 8, with the specific enumerated welfare provisions following it.

anything outside of those provisions (e.g. a non-apportioned income tax, or a law making alcohol illegal: why not just have those be authorized as "general welfare"?) needs a Constitutional amendment, otherwise it's just congress creating a power for itself out of whole cloth.

look, at this point the government pretty clearly breaks every other part of the Constitution except for the pomp and circumstance around the oath of office, so we know where the real priorities lie.


> VOA was established by FDR?

Are you serious? You mentioned FDR by name.

> You could equally make the argument "it took three generations to unpick and fix the FDR bullshit".

You made the comparison. I made my own comparison. Then you accused me of "whataboutism" for following your lead.

I don't have the energy for this bullshit.


read carefully.

"it took three generations to unpick and fix the FDR bullshit [establishing VOA]".


Ah, so "the FDR bullshit" was specifically about VOA. That was not clear. I thought it was a reference to New Deal policies more broadly. Write more carefully.

Actually, in your reply in the other subtree, you added "(among other things)" so it appears I understood it correctly the first time, and it was a reference to New Deal policies more broadly.


yes, and the comment i originally responded to talks about Trump/DOGE in a broad sense, which is also more than just VOA. actually DOGE is unwinding quite a bit of FDR bullshit, it turns out, so my analogy is a nice mirror.

> it took three generations to unpick and fix the FDR bullshit

The difference is that the New Deal is one of, like, 3 reasons the US isn't a complete shithole. Basically everything you like about the US can be traced back to the New Deal.


> Basically everything you like about the US can be traced back to the New Deal.

like, the fucked up "worst of both worlds: socialism and capitalism" employment-tied healthcare system we have, arguably the #1 biggest problem in the contemporary US?

they dont teach history good in this country any more i guess


Tone doesn't come over well in text. What exactly is the "FDR bullshit"?

establishing VOA (among other things)

minor nitpicks:

ETS is not a process that responds to messages, you have to wrap it in a process and do the messages part yourself.

Process dictionary: i am pretty sure that's a process_info BIF that directly queries the VM-internal database, not a secret message that can be trapped, and it doesn't even use the normal message passing system.


> ETS is not a process that responds to messages, you have to wrap it in a process and do the messages part yourself.

I didn't say it's implemented as a process, but it works as if it were, logically. Most terms (except literals and binary references) are still copied, just like when you send a message. You could replace it behind the scenes with a process and it would act the same. Performance-wise it won't be the same, and that's why they are implemented differently, but ETS doesn't allow sharing a process heap, and you don't have to use locks and mutexes to protect access to this "shared" data.

> i am pretty sure that's a process_info bif that directly queries the vm internal database and not a secret message that can be trapped or even uses the normal message passing system.

I specifically meant querying the dictionary of another process, since it's in the context of the "erlang is violating shared nothing" comment. In that case, if we look at https://www.erlang.org/doc/system/ref_man_processes.html#rec... we see that process_info_request is a signal. A process is sent a signal, and then it gets its dictionary entries and replies (note the difference between messages and signals there).


ah sorry, you did indeed write "signal" and somehow my brain read it as "message". I'll stand by my ets comments though because it would be confusing to think of ets as not having to be wrapped in a process (for lifetimes if nothing else)

> I'll stand by my ets comments though because it would be confusing to think of ets as not having to be wrapped in a process (for lifetimes if nothing else)

The point was that "ets" doesn't break isolation and doesn't logically behave any differently than if it were wrapping a process, with every lookup or update behind the scenes sending a message to the process to read its data and returning a response. So the original poster's claim that it somehow defaults to "sharing memory" is simply untrue. They just don't understand how it works, which is fine. I suspect they don't really use it that much and just found whatever they could by Googling it (pun intended, as they are a Technical Program Manager).

Also, here is Robert Virding, one of Erlang's creators, explaining it a lot better: https://stackoverflow.com/a/1483875

> ETS more or less behaves as if the table was in a separate process and requests are messages sent to that process. While it is not implemented with processes, the properties of ETS are modeled like that. It is in fact possible to implement ETS with processes. This means that the side effect properties are consistent with the rest of Erlang.
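A toy sketch of that mental model in Python (my own illustration, with made-up names; this is not how the BEAM actually implements ETS): the table lives behind a single worker, every insert/lookup is a request message through its mailbox, and values are copied on the way in and out, so callers never share mutable state with the table.

```python
import copy
import queue
import threading

# Toy model of "ETS as a table-owning process": one worker thread owns
# the dict; all access goes through its mailbox as request messages.
mailbox = queue.Queue()

def table_owner():
    table = {}  # the owning "process"'s private heap
    while True:
        op, key, value, reply_to = mailbox.get()
        if op == "stop":
            break
        if op == "insert":
            table[key] = copy.deepcopy(value)  # copy in, like a message send
        elif op == "lookup":
            reply_to.put(copy.deepcopy(table.get(key)))  # copy out

threading.Thread(target=table_owner, daemon=True).start()

def insert(key, value):
    mailbox.put(("insert", key, value, None))

def lookup(key):
    reply = queue.Queue()
    mailbox.put(("lookup", key, None, reply))
    return reply.get(timeout=1)

insert("a", [1, 2, 3])
got = lookup("a")
got.append(4)                    # mutating our copy...
assert lookup("a") == [1, 2, 3]  # ...never touches the table's copy
```

The observable semantics (copy-in, copy-out, serialized access, no shared heap) are the same whether the table is a real process or, as in the actual VM, an optimized internal structure.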


in ten years of BEAM i've written a deadlock once, and zero times in prod.

i'd say it's better to default to call instead of pushing people to use cast because it won't lock.


Generally agree; all the problems i’ve had with erlang have been related to full mailboxes, or having one process type handling too many kinds of different messages, etc.

These are manageable, but i really, really stress and soak test my releases (max possible load / redline for 48+ hours) before they go out, and since doing that things have been fairly fine. you can usually spot such issues in your metrics by doing that.


> ETS can be mentally modeled as a process that owns the table (even though the implementation is not)

the API models it that way, so i'd say it's a bit more than just a mental model.


not all races are bugs. here's an example that probably happens in many systems that people just don't notice: sometimes you don't care, and, say, having database setup race against the setup of another service that needs the database means that in 99% of cases you get a faster bootup, and in 1% of cases the database setup is slow, the dependent service gets restarted by your application supervisor, and it connects on the second try.
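A sketch of that benign race in Python (hypothetical helpers standing in for real OTP supervision): when database setup is slow, the dependent service crashes once, the "supervisor" restarts it, and the second attempt succeeds.

```python
def start_database(slow: bool) -> bool:
    # Simulate: setup usually finishes before dependents need it.
    return not slow

def start_dependent_service(db_ready: bool) -> None:
    if not db_ready:
        raise ConnectionError("database not up yet")

def supervisor_boot(slow_db: bool, max_restarts: int = 3) -> int:
    """Return the number of attempts the dependent service needed."""
    db_ready = start_database(slow_db)
    for attempt in range(1, max_restarts + 1):
        try:
            start_dependent_service(db_ready)
            return attempt
        except ConnectionError:
            db_ready = True  # by the restart, the slow setup has finished
    raise RuntimeError("max restarts exceeded")

assert supervisor_boot(slow_db=False) == 1  # 99% case: first try
assert supervisor_boot(slow_db=True) == 2   # 1% case: one supervised retry
```

The race is "free": you skip explicit startup ordering, and the supervisor's restart policy absorbs the rare losing interleaving.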

not just those. huge deposits opened up (and are actively being exploited) in Colorado and Utah in the past few years, and in Minnesota this year

the paper is burying the lede here (i think?)

> The key technical unlock is to restrict lookup heads to head dimension 2, which enables a decoding path where the dominant retrieval/update operations can be computed in log time in the sequence length (for this structured executor regime), rather than by a full prefix-sized attention sweep.

edit: i understand how hullkv works now. very clever.

I don't understand why this strategy is applicable only to "code tokens"

lastly, i'm not sure why wasm is a good target. iirc wasm seems to be really inefficient (not so much in code but in expressivity). i wonder if that curtails the llm's ability to plan higher-order stuff (since it's always forced to think in the small)


> i have a pretty good understanding of how transformers work but this did not make sense to me. also i dont understand why this strategy is applicable only to "code tokens"

Yes, there is a monstrous lack of detail here, and you should be skeptical about most of the article's claims. The language is also IMO non-standard (serious people don't talk about self-attention as lookup tables anymore; that was never a good analogy in the first place), and no good work would use only prose to express this: there would also be a simple equation showing the typical scaled dot-product attention formula, and then e.g. some dimension notation/details indicating which matrix (or inserted projection matrix) got a dimension of two somewhere. Otherwise, the claims are inscrutable (EDIT: see edit below).

There are also no training details or loss function details, both of which would be necessary (and almost certainly highly novel) to make this kind of thing end-to-end trainable, which is another red flag.

EDIT: The key line seems to be around:

    gate, val = ff_in(x).chunk(2, dim=-1)
and related code, plus the line "Notice: d_model = 36 with n_heads = 18 gives exactly 2D per head", but, again, this is very unclear and non-standard.

Treating attention as a lookup operation is popular among computational complexity theorists (e.g. https://arxiv.org/abs/2310.03817 ) because it's easier to work with when you're explicitly constructing a transformer to perform a particular computation, just to demonstrate that transformers can, in theory, perform it. That's also why there are no training details: the weights are computed directly and not trained.
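To make the lookup framing concrete, here is a minimal numpy sketch (my own toy construction, not taken from the paper or the linked work): with one-hot keys and a very sharp softmax, scaled dot-product attention degenerates into an exact table lookup, which is why such hand-constructed weights need no training.

```python
import numpy as np

def attention(q, K, V, beta=1e3):
    # sharp softmax ~ hard max: attention degenerates into a table lookup
    s = beta * (K @ q)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

K = np.eye(4)                             # keys = one-hot "addresses"
V = np.array([[10.], [20.], [30.], [40.]])  # stored "values"
q = np.eye(4)[2]                          # query = address 2
out = attention(q, K, V)
assert abs(out[0] - 30.0) < 1e-6          # retrieves exactly the stored entry
```

Soften beta and the same mechanism becomes the usual differentiable, smeared-out attention; crank it up and you recover the theorists' exact-lookup construction.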

This is a good link and important (albeit niche) qualification.

It is hard to square with the article's claims about differentiability, and with the general lack of clarity / obscurantism about what they are really doing here (they really are just compiling / encoding a simple computer / VM into a slightly-modified transformer, which, while cool, is really not what they make it sound like at all).


> lookup tables anymore, that was never a good analogy in the first place

good analogy or not, weren't hash tables the motivation for the kv tables?


Well, one can never be sure what the real motivation for a lot of DL advances was, as most papers are post-hoc obscurantism / hand-waving, or even just outright nonsense (see: the "internal covariate shift" explanation for batch norm, which arguably couldn't be more wrong: https://arxiv.org/pdf/1805.11604).

When you really get into this stuff, you tend to see the real motivations as either e.g. kernel smoothing (see comments / discussion at https://news.ycombinator.com/item?id=46357675#46359160) or as encoding correlations / feature similarities / multiplicative interactions (see e.g. the broad discussion at https://news.ycombinator.com/item?id=46523887). IMO most insights into LLM architectures and layers tend to come from intuitions about projections, manifolds, dimensionality, smoothing/regularization, overparameterization, matrix conditioning, manifold curvature, etc.

There are almost zero useful understandings or insights to be gained from the lookup-table analogy, and most statistical explanations in papers are also post-hoc and require assumptions (convergence rates, infinite layers, etc) that are never shown to clearly hold for actual models that people use. Obviously these AI models work very well for a lot of tasks, but our understanding of why they do is incredibly poor and simplistic, for the most part.

Of course, this is just IMO, and you can see some people in the linked threads do seem to find the lookup table analogies useful. I doubt such people have spent much time building novel architectures, experimenting with different layers, or training such models.


yeah, now that I think about it, I think the hull-kv will not scale to comprehension beyond ~simple computational tasks.

The buried lede is this: if you have two dimensions and use RoPE and hard-max attention, you could simply store addresses as a given theta. With RoPE and sufficient precision, that pretty easily gets you relative addressing with just one head, and absolute addressing with three (treating BOS as a sink, getting a rotation relative to it with orthogonal unit queries, then using the result to counter-rotate your own relative position with the complex conjugate). With less precision, just add a few more heads with different thetas.
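A small numpy sketch of the relative-addressing claim (my own construction, under the stated assumptions of a 2-D head, hard-max attention, and enough precision): RoPE rotates each position's key by p * theta, so a query rotated by (q - offset) * theta scores highest against exactly the key at position q - offset.

```python
import numpy as np

theta = 0.1  # a single RoPE frequency, small enough that angles stay unique

def rope(p):
    # 2-D rotation by p * theta -- RoPE acting on a single 2-D head
    a = p * theta
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

n = 16
e = np.array([1.0, 0.0])
keys = np.stack([rope(p) @ e for p in range(n)])  # position-encoded keys

# hard-max "attention": a query at position q addresses position q - offset
q, offset = 10, 3
query = rope(q - offset) @ e
scores = keys @ query  # cos((p - (q - offset)) * theta), maximized at p = q - offset
assert scores.argmax() == q - offset
```

With argmax (hard-max) attention the retrieval is exact; with multiple thetas you can disambiguate positions beyond a single frequency's unique range, which is the "add a few more heads" point above.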
