OpenAI has a former NSA director on its board. [1] This connection makes the dilution of the term "PRISM" in search results a potential benefit to NSA interests.
A market maker needs a premium to provide liquidity. If all else is equal, why would they take on execution time risk? This is a universal feature of continuous-trading Central Limit Order Books (CLOBs), not something unique to prediction markets.
I’m referring specifically to the fundamental residual connection backbone that defines the transformer architecture (x_{l+1} = x_l + F(x_l)).
While the sub-modules differ (MHA vs GQA, SwiGLU vs GeLU, Mixture-of-Depths, etc.), the core signal propagation in Llama, Gemini, and Claude relies on that additive residual stream.
My point here is that DeepSeek's mHC challenges that fundamental additive assumption by introducing learnable weighted scaling factors to the residual path itself.
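To make the distinction concrete, here is a minimal sketch in plain Python. It is not DeepSeek's actual mHC formulation; it is only the simplest reading of "learnable scaling factors on the residual path". The names `f`, `alpha`, and `beta` are hypothetical, and `f` is a toy stand-in for an attention or MLP sub-layer.

```python
import math

def f(x, w):
    """Toy stand-in for a sub-layer (attention or MLP): elementwise w * tanh(x)."""
    return [w * math.tanh(v) for v in x]

def additive_residual(x, w):
    # Standard residual update: x_{l+1} = x_l + F(x_l)
    return [a + b for a, b in zip(x, f(x, w))]

def scaled_residual(x, w, alpha, beta):
    # Hypothetical variant: x_{l+1} = alpha * x_l + beta * F(x_l)
    # alpha and beta stand in for learnable scalars (assumption, for illustration only)
    return [alpha * a + beta * b for a, b in zip(x, f(x, w))]

x = [0.5, -1.0, 2.0]
print(additive_residual(x, 0.3))
# With alpha = beta = 1 the scaled form reduces to the plain additive residual.
print(scaled_residual(x, 0.3, alpha=1.0, beta=1.0))
```

The plain additive form is the special case where both factors are fixed at 1, so the identity path is never rescaled; letting them vary is what changes how the signal propagates through depth.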
I guess I'm asking how we know Gemini and Claude rely on the additive residual stream. We don't know the architecture details of these closed models, do we?
That's a fair point. We don't have the weights or code for the closed models, so we can't be 100% certain.
However, being transformer-based (which their technical reports confirm they are) implies the standard pre-norm/post-norm residual block structure. Without those additive residual connections, training networks of that depth (100+ layers) becomes difficult due to the vanishing gradient problem.
If they had solved deep signal propagation without residual streams, that would likely be a bigger architectural breakthrough than the model itself (akin to Mamba/SSMs). It’s a very high-confidence assumption, but you are right that it is still an assumption.
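A toy scalar illustration of the vanishing gradient argument, under heavy simplifying assumptions: one scalar per layer, F(x) = w * tanh(x), no normalization. Real transformers use matrices and layer norm, so this only shows the direction of the effect. The function name and parameters are made up for this sketch.

```python
import math

def end_to_end_gradient(depth, w, x0, residual):
    """Return d x_depth / d x_0 for a scalar toy network with F(x) = w * tanh(x)."""
    x, grad = x0, 1.0
    for _ in range(depth):
        local = w * (1.0 - math.tanh(x) ** 2)  # F'(x)
        if residual:
            grad *= 1.0 + local           # derivative of x + F(x)
            x = x + w * math.tanh(x)
        else:
            grad *= local                 # derivative of F(x) alone
            x = w * math.tanh(x)
    return grad

plain = end_to_end_gradient(100, 0.5, 1.0, residual=False)
skip = end_to_end_gradient(100, 0.5, 1.0, residual=True)
print(plain, skip)
```

Without the skip path, every layer multiplies the gradient by a factor of at most 0.5, so after 100 layers it is effectively zero. With the skip path, each factor is 1 + F'(x), which is at least 1 here, so the gradient survives.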
I had a similar experience when I found out that Claude Code can use ssh to connect to a remote server and diagnose sysadmin issues there. It feels really empowering.
This argument falls apart when you look at Rust and Cargo. uv is literally trying to be "Python's Cargo." The entire blueprint came from a flagship FOSS project.
Rust's development used a structured, community RFC process—endless planning by your definition. The result was a famously well-designed toolchain that the entire community praises. FOSS didn't hold it back; it made it good.
So no, commercial backing isn't the only way to ship something good. FOSS is more than capable of shipping great software when done right.
China is not the birthplace of so-called '996'. Long before the tech scene in China, plenty of investment banks in HK were doing it, especially to junior analysts. Calling 996 a China thing is just orientalism: everything bad is Chinese, everything good is Western.
At least the recent popularity of 996 originated in China, and I believe most Chinese people would agree with that. Besides, even if it started in Hong Kong, saying it originated in China is still technically correct.
China is the birthplace of the term 996. Of course it's not the birthplace of people being coerced into unhealthy work hours - that's been around for thousands of years.
There is probably little to nothing specifically Chinese about workaholism as a concept, but the word is definitely Chinese (as in the language). The dialect continua of East Asian languages are contained within borders; in other words, each language expanded to dominate the full extent of its continuum and stagnated at major geographical features before entering the modern era.
[1]: https://openai.com/index/openai-appoints-retired-us-army-gen...