While you get good accuracy using techniques like this, it's debatable how useful or robust this general approach is - because you aren't really measuring the quality of the essay, so much as you are finding features that just happen to be predictive of the quality. Certainly, it would seem fairly easy for future students to game, if such a system were deployed.
I'm not able to dig up the name, but there's a named effect in statistics (especially social-science statistics) describing exactly that. When you find a correlate of a desired outcome that has predictive value, and you then set the correlate as a metric, a common result is that a substantial part of the correlation and predictive value quickly disappears, because you've now given people incentives to effectively arbitrage the proxy measure. You've said, "I'm going to treat easy-to-measure property A as a proxy for what-I-really-want property B." Now there is a market incentive to find the cheapest possible way to maximize property A, which often ends up being via loopholes that do not maximize property B. A heuristic explanation is that proxies that are easier to measure than the "real" thing are also easier to optimize than the real thing. At the very least, your original statistics aren't valid anymore, because you measured in a context where people were not explicitly trying to optimize for A, but now they are, so you need to re-measure to check whether this changed the data.
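You can see the effect in a toy simulation. This is just an illustrative sketch (the distributions and the "gaming" model are made up for the example): before A becomes a target, it correlates well with B; once people can cheaply boost A without touching B, the correlation degrades.

```python
import random

random.seed(0)

def corr(xs, ys):
    # Pearson correlation, computed by hand to stay dependency-free
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n = 1000
# B: the quality we actually care about (hidden, expensive to measure)
quality = [random.gauss(0, 1) for _ in range(n)]

# Before A is announced as a metric: the proxy tracks B plus noise
proxy_before = [b + random.gauss(0, 0.5) for b in quality]

# After A becomes the target: people invest varying amounts of cheap
# "gaming" effort that inflates A while leaving B untouched
proxy_after = [a + random.expovariate(0.5) for a in proxy_before]

print(f"corr(A, B) before targeting: {corr(proxy_before, quality):.2f}")
print(f"corr(A, B) after targeting:  {corr(proxy_after, quality):.2f}")
```

The gaming term is deliberately independent of B, which is the whole problem: it adds variance to A that carries no information about B, so any model calibrated on the "before" data overstates A's predictive value afterward.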
Aha, almost it; I was thinking of the very similar Campbell's law, which your mention of Goodhart's law led me to. Somehow no combination of search terms got me to either of those when I was trying to come up with the name, though...