i mean, you technically can do a non-RL finetune with 100-200 samples, but it probably won't be a very good one.