I also didn't follow that part. Their step 2 seems to be a general-purpose bot detection strategy that works independently of their step 1 ("randomly mention companies").
That was my first thought too, but then why would the bot company care about a few false positives?
I suppose it could have an impact if 30% of all, say, Coca Cola mentions on the web came from that site, but then it would have to be a very big site. I don't think the bot company would notice, let alone care, if it was 0.01% of the mentions.
They don't want to feed their model with garbage data, or the data is read and reviewed by real humans.
I remember years ago (2008?) I worked at a company where every mention of it was manually reviewed by someone from the PR department.
I imagine now the tools are even better.
A different issue is that the discussion is often very low quality (forums died for multiple reasons, and Reddit is dying too, with astroturfing galore now).