
Maybe make a metric like "count the letter $L in the word $word" - if you want to game it, you can choose words that tokenize into 1-2 tokens where each token contains multiple copies of $L. A sketch of the metric is below.
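A minimal sketch of that metric in Python, assuming a hypothetical ask(prompt) helper that wraps whatever model you're testing (the prompt wording is just an illustration):

    import re

    def letter_count_eval(ask, pairs):
        # pairs: list of (word, letter) tuples; ask(prompt) -> model's text reply
        correct = 0
        for word, letter in pairs:
            truth = word.lower().count(letter.lower())
            prompt = (f"How many times does the letter '{letter}' appear "
                      f"in the word '{word}'? Answer with just a number.")
            reply = ask(prompt)
            m = re.search(r"\d+", reply)
            if m and int(m.group()) == truth:
                correct += 1
        return correct / len(pairs)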

And then use something like HellaSwag to measure how much you've lost on general text completion compared to a vanilla LLM where the same total embedding size is dedicated entirely to the ordinary token embedding.
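If you want to score that yourself rather than using an eval harness, HellaSwag-style scoring amounts to picking the ending with the highest length-normalized log-likelihood. A rough sketch assuming a HuggingFace-style causal LM; the model name is a placeholder and the tokenization-boundary handling is deliberately simplified:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # usage (model name is just a placeholder):
    # tok = AutoTokenizer.from_pretrained("gpt2")
    # model = AutoModelForCausalLM.from_pretrained("gpt2")

    def ending_logprob(model, tok, context, ending):
        # mean log-prob per ending token; assumes the ending's tokenization
        # doesn't merge across the context boundary (fine for a rough comparison)
        ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
        ids = tok(context + ending, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts ids[0, 1:]
        targets = ids[0, 1:]
        start = ctx_len - 1  # first position that predicts an ending token
        rows = torch.arange(start, targets.shape[0])
        return logprobs[rows, targets[start:]].mean().item()

    def hellaswag_accuracy(model, tok, examples):
        # examples: dicts with "ctx", "endings" (4 strings), and "label"
        hits = 0
        for ex in examples:
            scores = [ending_logprob(model, tok, ex["ctx"], " " + e)
                      for e in ex["endings"]]
            hits += int(max(range(len(scores)), key=scores.__getitem__) == ex["label"])
        return hits / len(examples)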



Which model would you recommend to try it on? Would you train it from scratch, or finetune an existing one?


You would have to train the new model from scratch, since it would have all-new token embeddings built with whatever character-encoding scheme you come up with. It would probably make sense to also train the vanilla GPT from scratch with the same total embedding size as your control. I would start with https://github.com/karpathy/nanoGPT as a baseline, since you can train a toy (GPT-2-sized) LLM in a couple of days on an A100, which is pretty easy to come by.
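For what a "character encoding scheme" could look like, here is one illustrative option (my assumption, not something specified above): reserve part of each token's embedding for letter counts of that token's text and learn the rest as usual, as a drop-in replacement for nanoGPT's token embedding table:

    import string
    import torch
    import torch.nn as nn

    class CharAwareEmbedding(nn.Module):
        def __init__(self, token_strings, n_embd, char_dim=32):
            super().__init__()
            vocab_size = len(token_strings)
            self.learned = nn.Embedding(vocab_size, n_embd - char_dim)
            # fixed table of per-letter counts for each token's text
            counts = torch.zeros(vocab_size, 26)
            for i, t in enumerate(token_strings):
                for c in t.lower():
                    if c in string.ascii_lowercase:
                        counts[i, ord(c) - ord('a')] += 1
            self.register_buffer("char_counts", counts)
            self.char_proj = nn.Linear(26, char_dim)

        def forward(self, idx):
            # idx: (batch, seq) token ids -> (batch, seq, n_embd) embeddings
            char_part = self.char_proj(self.char_counts[idx])
            return torch.cat([self.learned(idx), char_part], dim=-1)

The same n_embd for the control model then gives an apples-to-apples comparison on HellaSwag-style completion.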




