I'm not from that generation so that's a bit hard for me to understand. Even if you used a closed-source C compiler, wouldn't you still have been able to look at the header file, which would have been pretty self-explanatory?
And surely if you bought a C compiler, you would have gotten a manual or two with it? Documentation from the pre-Internet age tended to be much better than today.
Yeah - but you had to be a good enough programmer to really understand the headers; the 'bootstrapping' problem was real :-) Especially if you didn't live in a metropolitan/college area. My local library was really short on programming books - especially anything 'in depth'. Also, 'C' was considered a "professional's language" back then, so bookstores/libraries were more likely to have books on BASIC than 'C'.
Surely it's more of a spectrum? From a CPU, to a TPU, to a chip that hardwires softmax attention but lets you store arbitrary weights, to one that hardwires the weights directly.
The first professional commercial 4K camera came out over 23 years ago, and the first smartphones and camcorders capable of 4K video arrived back in 2013.
The MacBook Neo has a 2.5x higher multi-core Geekbench score than the i7-4960X, the top consumer CPU of 2013 (which could handle 4K video editing in h264), and 5x higher single-core performance. Plus, I'm 99% sure the MacBook Neo has a dedicated video-decoding ASIC anyway.
Fine-tuning still makes sense for cost/latency-sensitive applications. Massive context windows drastically slow down generation, and modern models' performance and instruction following ability relies heavily on a reasoning step that can consume orders of magnitude more tokens than the actual response (depending on the application), while a fine-tuned model can skip/significantly reduce that step.
Using the large model to generate synthetic data offline with the techniques you mentioned, then fine-tuning the small model on it, is an underrated technique.
You don't need 8GB of RAM or less to have memory issues. Cursor + Claude Code + Slack + Discord + Spotify + a few Docker containers + YouTube and a few browser tabs is enough to overwhelm a MacBook with 24GB of RAM.
Right now on my machine, five whole Docker containers, including two DBs and three dev servers, are taking up less RAM than Cursor, a glorified text editor.
And have you looked at RAM prices lately? It's possible that 8GB is all some people can afford.
Wtf? Once it was AI. Then the models started passing the Turing test and calling themselves AI, so we started using "AGI" to mean "truly intelligent machines". Now, per the definition you quoted, apparently even GPT-3 is AGI, so we have to use "ASI" to mean "intelligent, but artificial"?
I think I'll just keep using AI, and explain to anyone who uses that term that there is no "I" in today's LLMs, so they shouldn't use it for some years at least. And that when they legitimately can, we will have a big problem.
LLMs are artificial-intelligence illusion engines: they only "reason" insofar as there is an already-made answer in their training data that they can retrieve and, at best, tweak. Take them somewhere with no training data, give them the new axioms of your specific problem, and watch them fail, delivering incorrect gibberish as a confident answer. Humans at any level of intelligence wouldn't behave like that.
TensorFlow is largely dead; it's been years since I've seen a new repo use it. Go with JAX if you want a PyTorch alternative that can have better performance in certain scenarios.
You can actually generate surprisingly coherent text with minimal finetuning of BERT, by reinterpreting it as a diffusion model: https://nathan.rs/posts/roberta-diffusion/
I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters is only “small” in the sense that a baby whale is small.
E.g.
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));