It might be interesting to track average and peak server-side memory usage during these tests. For example, if framework A is only 80% the "speed" of framework B but uses a quarter of the memory, for some users that trade-off might make A the winner.
This is definitely something we want to include in the tests, but it is difficult to monitor in a fair way. Outside of the actual benchmarks, where we kept everything as fair as possible, we ran individual tests to make sure they worked prior to benchmarking, and would routinely have htop running at the same time to get an idea of what that sort of data looks like.
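Eyeballing htop doesn't capture averages or peaks over a run, but automating that is straightforward. A minimal sketch (Linux only, reading `VmRSS` from `/proc/<pid>/status`; the sampling interval and the choice of RSS as the metric are assumptions, not how the benchmarks actually measured anything):

```python
import os
import time

def rss_kib(pid: int) -> int:
    """Read a process's resident set size (KiB) from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    raise ValueError(f"VmRSS not found for pid {pid}")

def sample_memory(pid: int, duration_s: float, interval_s: float = 0.5):
    """Sample a process's RSS periodically; return (average_kib, peak_kib)."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append(rss_kib(pid))
        time.sleep(interval_s)
    return sum(samples) / len(samples), max(samples)

if __name__ == "__main__":
    # Demo: sample this script's own memory for two seconds.
    avg, peak = sample_memory(os.getpid(), duration_s=2.0, interval_s=0.2)
    print(f"avg={avg:.0f} KiB peak={peak} KiB")
```

In practice you'd point this at the framework's server PID while the load generator runs; RSS has the fairness caveats discussed below, since it counts pages a runtime holds but may not need.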
It IS very interesting, and we have definitely discussed ways of measuring it for these benchmarks. Other areas we discussed were CPU utilization, disk utilization, and network saturation.
One idea that jumps to mind is putting the running frameworks in an LXC container (or similar) and monitoring the memory usage of the container. Not sure how accurate that is, but it's one avenue. It still might not be fair, though, because some runtimes let their heap grow large despite not actually needing the memory. You'd want a way to differentiate those from the ones with big heaps that would choke under some memory pressure.
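One thing in favor of the container idea: LXC places each container in a cgroup, so the kernel already accounts memory per container and the monitor just reads a file. A minimal sketch of reading that accounting (the file names cover cgroup v1 and v2; the cgroup path you'd pass for a real container, e.g. one under `.../lxc/`, is an assumption):

```python
from pathlib import Path
from typing import Optional

# Per-cgroup memory accounting files:
# cgroup v2 exposes memory.current; cgroup v1 exposes memory.usage_in_bytes.
CANDIDATES = ("memory.current", "memory.usage_in_bytes")

def cgroup_memory_bytes(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the memory currently charged to a cgroup, in bytes,
    or None if no known accounting file exists at that path."""
    for name in CANDIDATES:
        path = Path(cgroup_dir) / name
        if path.exists():
            return int(path.read_text().strip())
    return None
```

You'd sample this over the run the same way as RSS. The caveat above still applies, though: cgroup usage includes page cache and lazily reclaimed heap, so a large number alone doesn't tell you whether the framework would actually choke under memory pressure.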