I admit that the test stresses some corner cases (or at least some cases that allocator designers consider corner cases). That said, malloc has no choice but to support that use case.
One use case for this pattern is message posting with workers: you queue messages that are later dequeued and processed by a different thread. This is an increasingly common pattern in modern programs. In that pattern, the message is allocated in one thread (let's say the main one), then processed and deallocated by another thread.
If your message allocation is malloc-based, you will stress exactly the same code paths the benchmark is stressing.
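To make the pattern concrete, here is a minimal sketch in C with pthreads. It is not the benchmark's code; the names `struct msg`, `struct queue`, `worker`, and `post_messages` are made up for illustration, and the queue is a plain mutex-protected linked list. The point is only that every `malloc` happens on the posting thread and every `free` happens on the worker thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical message type; a negative payload is used as a stop sentinel. */
struct msg {
    struct msg *next;
    int payload;
};

/* Minimal mutex-protected FIFO (illustrative, not tuned for performance). */
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct msg *head, *tail;
};

static void queue_push(struct queue *q, struct msg *m) {
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

static struct msg *queue_pop(struct queue *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->head)                      /* block until a message arrives */
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct msg *m = q->head;
    q->head = m->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    assert(m != NULL);
    return m;
}

struct worker_ctx { struct queue *q; long sum; };

/* Worker thread: processes each message, then frees memory that a
 * different thread allocated. This is the cross-thread free path. */
static void *worker(void *arg) {
    struct worker_ctx *ctx = arg;
    for (;;) {
        struct msg *m = queue_pop(ctx->q);
        int v = m->payload;
        free(m);
        if (v < 0) break;
        ctx->sum += v;
    }
    return NULL;
}

/* Posts payloads 1..n from the calling thread, followed by a stop
 * sentinel. Returns the sum computed by the worker, or -1 if the
 * worker thread could not be started. */
long post_messages(int n) {
    struct queue q = { .head = NULL, .tail = NULL };
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.nonempty, NULL);
    struct worker_ctx ctx = { &q, 0 };
    pthread_t t;
    long result = -1;
    if (pthread_create(&t, NULL, worker, &ctx) == 0) {
        for (int i = 1; i <= n + 1; i++) {
            struct msg *m = malloc(sizeof *m);   /* allocated on this thread */
            if (!m) abort();
            m->payload = (i <= n) ? i : -1;
            queue_push(&q, m);
        }
        pthread_join(t, NULL);
        result = ctx.sum;
    }
    pthread_cond_destroy(&q.nonempty);
    pthread_mutex_destroy(&q.lock);
    return result;
}
```

Calling `post_messages(n)` from `main` (built with `-pthread`) pushes `n + 1` allocations through the allocator, each freed by a thread other than the one that allocated it.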
You're not wrong that malloc-based message passing causes that load on malloc, but if performance of the message-passing code is important, you'd want to use a ring buffer anyway - cross-CPU or not, malloc is pretty slow.
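For reference, a ring buffer along those lines could look roughly like the sketch below. It assumes exactly one producer thread and one consumer thread, a fixed power-of-two capacity, and small messages copied by value into preallocated slots; `struct ring`, `ring_push`, `ring_pop`, and `RING_CAP` are illustrative names, not taken from any particular library. Nothing is allocated or freed per message, so malloc is never involved on the hot path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 1024   /* assumed capacity; must be a power of two */
static_assert((RING_CAP & (RING_CAP - 1)) == 0, "RING_CAP must be a power of two");

/* Hypothetical fixed-size message, stored by value in the ring. */
struct msg { int payload; };

/* Single-producer / single-consumer ring. head and tail only ever grow;
 * the slot index is the counter masked by RING_CAP - 1. */
struct ring {
    _Atomic size_t head;            /* next slot to read; written by the consumer */
    _Atomic size_t tail;            /* next slot to write; written by the producer */
    struct msg slots[RING_CAP];
};

static void ring_init(struct ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false if the ring is full. */
static bool ring_push(struct ring *r, struct msg m) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* acquire: the consumer must be done reading a slot before we reuse it */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)
        return false;
    r->slots[tail & (RING_CAP - 1)] = m;
    /* release: publish the slot contents before the new tail becomes visible */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
static bool ring_pop(struct ring *r, struct msg *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* acquire: see the slot contents the producer published */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & (RING_CAP - 1)];
    /* release: hand the slot back to the producer */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The trade-off is that the ring is bounded and the message size is fixed, so the producer has to decide what to do when `ring_push` returns false (spin, drop, or fall back to something else). With several producers or consumers this simple scheme is no longer sufficient.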