I'm not sure, hence the question. AFAIK temperature only comes into play at inference time, once the model's output distribution is already known, but I don't know if there are other places where random numbers are involved.
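For what it's worth, here's a minimal sketch of what I mean by temperature only mattering at sampling time (plain numpy, the function name is mine):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng(0)):
    # Temperature just rescales the logits before the softmax, so it changes
    # which token gets drawn but has nothing to do with how the model was trained.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```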
E.g. you tend to randomly shuffle your corpus before training on it. If you use dropout (https://en.wikipedia.org/wiki/Dilution_(neural_networks)) you also use randomness. You might also randomly perturb your training data. There are lots of other sources of randomness you might want to look into.
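If it helps, a rough sketch of what two of those look like in practice (PyTorch assumed, all names are mine; adapt to whatever framework you're using):

```python
import torch
import torch.nn as nn

# Fix the global seed: this covers weight init and the dropout masks.
torch.manual_seed(0)

# Shuffling the corpus is one source of randomness; a dedicated generator
# makes the shuffle order reproducible across runs.
data = torch.randn(100, 8)
g = torch.Generator().manual_seed(0)
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(data),
                                     batch_size=10, shuffle=True, generator=g)

# Dropout is another: in train() mode every forward pass draws a fresh random mask.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
model.train()
for (batch,) in loader:
    out = model(batch)  # randomness from both the shuffle order and the dropout mask
```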