Hacker News

> We don’t have apples-to-apples benchmarks

We do: https://mlperf.org/

Just run their benchmarks. Submitting your results is a bit more involved, because all published results are verified by independent entities.

If you feel like your AI use case is not well represented by any of the MLPerf benchmarks, open a discussion thread about it, propose a new benchmark, etc.

The set of benchmarks there increases all the time to cover new applications. For example, on top of the MLPerf Training and MLPerf Inference benchmark suites, we now have a new MLPerf HPC suite to capture ML of very large models.



Those benchmarks are absurdly tuned to the hardware. Just look at the result Google gets with BERT on V100s vs the result NVIDIA gets with V100s. It's an interesting measurement of what experts can achieve when they modify their code to run on the hardware they understand well, but it isn't useful beyond that.


> Just look at the result Google gets with BERT on V100s vs the result NVIDIA gets with V100s.

These benchmarks measure the combination of hardware+software to solve a problem.

Google and NVIDIA are using the same hardware, but their software implementation is different.

---

The reason mlperf.org exists is to have a meaningful set of relevant practical ML problems that can be used to compare and improve hardware and software for ML.

For any piece of hardware, you can create an ML benchmark that's irrelevant in practice but performs much better on that hardware than on the competition. That's what we used to have before mlperf.org was a thing.

We shouldn't go back there.


> on top of the MLPerf Training and MLPerf Inference benchmark suites, we now have a new MLPerf HPC suite to capture ML of very large models.

I think the challenge is selecting the tests that best represent typical ML/DL use cases for the M1 and comparing it to an alternative such as the V100 using a common toolchain like TensorFlow. One of the problems I see is that the optimizer/codegen of the toolchain is a key component: the M1 has both a GPU and a Neural Engine, and we don't know which accelerator is targeted, or whether both are. Should we benchmark Create ML on the M1 vs the A14 or A12X? Perhaps it is my ignorance, but I don't think we are at a point where our existing benchmarks can be applied meaningfully to the M1, though I'm sure we will get there soon.
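Whatever toolchain and accelerator end up being compared, the measurement side is the easy part to pin down. Below is a minimal, toolchain-agnostic step-timing sketch (not any MLPerf reference harness); `train_step`, `benchmark`, and the parameter names are hypothetical stand-ins for whatever the framework actually executes on the GPU or Neural Engine:

```python
# Illustrative sketch of a step-timing harness; all names here are
# hypothetical, not part of MLPerf or any framework's API.
import time
import statistics

def benchmark(train_step, batch_size, warmup=3, steps=10):
    """Time `train_step` and return median throughput in samples/sec."""
    for _ in range(warmup):              # let caches/compilation settle
        train_step()
    durations = []
    for _ in range(steps):
        t0 = time.perf_counter()
        train_step()                     # one training step on the device under test
        durations.append(time.perf_counter() - t0)
    return batch_size / statistics.median(durations)

# Dummy step that just sleeps ~10 ms, standing in for real work:
throughput = benchmark(lambda: time.sleep(0.01), batch_size=32)
print(f"{throughput:.0f} samples/sec")
```

Warmup matters here: on the M1 the first steps may include shader or graph compilation, so timing them would understate steady-state throughput.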


> The challenge is selecting the tests that best represent the typical ML/DL use cases for the M1 and comparing it to an alternative such as the V100 using a common toolchain like TensorFlow.

The benchmarks there are actual applications of ML that people use to solve real-world problems. To get a benchmark accepted, you need to argue and convince people that the problem the benchmark solves is one a lot of people need to solve, and that solving it burns enough cycles worldwide to be worth designing ML hardware and software around.

The hardware and software then get developed to make solving these problems fast, which in turn makes real-world applications of ML fast.

Suggesting that the M1 is a solution, and that now we just need to find a good problem it solves well and add that as a benchmark, is the opposite of how mlperf works; hardware vendors suggesting exactly that is the reason mlperf exists. We already have common ML problems that a lot of people need to solve. Either the M1 is good at those or it isn't. If it isn't, it should become better at them. Being better at problems people don't want or need to solve does not help anybody.




