Hacker News

The final layer no longer encodes high-level concepts; at that point the representation is essentially committed to tokens from the vocabulary, so it would be impossible to encode something abstract like "niceness" there. And as long as we don't know exactly at which layers this behaviour emerges, randomly choosing a subset of layers won't work either. So what they did is apply a custom vector at every layer and let PCA figure out which of those vectors are actually necessary. Curiously, looking at the resulting vectors should also tell you more about where and how the model processes these things.
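A minimal sketch of that idea, with everything fabricated: the activations below are random toy data (not from any real model), with a "concept" direction planted only in layers 4–7. One steering vector per layer comes from a difference of means over contrastive examples, and PCA (via SVD) over the stack of per-layer vectors reveals which layers actually carry the concept:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, n_examples = 12, 64, 200

# Toy stand-in for cached hidden states from contrastive prompts
# (e.g. "nice" vs. "not nice" completions). We plant a concept
# direction only in layers 4-7 to mimic a mid-layer behaviour.
concept = rng.normal(size=d_model)
concept /= np.linalg.norm(concept)

acts_pos = rng.normal(size=(n_layers, n_examples, d_model))
acts_neg = rng.normal(size=(n_layers, n_examples, d_model))
for layer in range(4, 8):
    acts_pos[layer] += 3.0 * concept  # concept present only here

# One steering vector per layer: mean activation difference.
steering = acts_pos.mean(axis=1) - acts_neg.mean(axis=1)  # (n_layers, d_model)

# PCA over the per-layer vectors: the top principal component is the
# shared concept direction; each layer's loading on it shows where
# the concept actually lives in the network.
centered = steering - steering.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
loadings = centered @ vt[0]  # per-layer weight on the dominant direction

important = np.argsort(np.abs(loadings))[::-1][:4]
print(sorted(important.tolist()))  # the mid layers should dominate
```

In real use you would steer generation by adding `loadings[layer] * vt[0]` (scaled) to the residual stream at each layer, keeping only the layers with large loadings.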

