
> Academic research accelerates innovation, but it requires costly data that is out of reach for most academic teams.

This is true of pretty much any AI research. Look at Puffer[0], which was just on HN a couple of days ago. They're running a free streaming service just to get enough data to train their algorithms, and in fact mention in their FAQ that they would love to use commercial data if they could get it.

Unfortunately, academic and commercial incentives don't really align here. Most commercial entities don't want to share their data because it's valuable to them, and if they let researchers in, they want the output of the research to remain proprietary to their commercial enterprise.

I wonder if there isn't some sort of governance solution to this. Like give companies big tax breaks for sharing their data with researchers, or something like that. Essentially subsidize academia indirectly.

[0] https://puffer.stanford.edu/player/



I've seen semiconductor industry companies collaborate on grant-funding fundamental condensed-matter physics research. If it is a question of interest to all parties, and the work is too blue-sky to be immediately profitable, sometimes they'll fund the work.


> Unfortunately, academic and commercial incentives don't really align here. Most commercial entities don't want to share their data because it's valuable to them, and if they let researchers in, they want the output of the research to remain proprietary to their commercial enterprise... I wonder if there isn't some sort of governance solution to this.

You're commenting on an article in which a commercial entity is sharing their data despite it being valuable to them. Maybe they are the outlier, but I've seen plenty of companies share data, especially in the ML space. Here are some datasets[0]. Maybe you would prefer more, but compared to other fields there is a lot of sharing. A "governance solution" could make things worse. If there were some mandate requiring companies that collect this data to share it in a costly way, it would discourage collection.

[0] https://blog.cambridgespark.com/50-free-machine-learning-dat...


> a commercial entity is sharing their data despite it being valuable to them

What's their incentive to share?


Likely they see self-driving as a complement to their core business of vehicle-passenger matching rather than something they hope to profit from directly in a meaningful manner, in which case "commoditize your complement" applies.

"A classic pattern in technology economics, identified by Joel Spolsky, is layers of the stack attempting to become monopolies while turning other layers into perfectly-competitive markets which are commoditized, in order to harvest most of the consumer surplus."

https://www.gwern.net/Complement


Excellent. Thank you


A few quick possibilities come to mind:

1. If people use your dataset, they can do research into things relevant to you

2. Some people like working for companies that share data/code back with the community, helping hire & retain staff

3. Bits of publicity, either with potential engineers/researchers or with others seeing Lyft in a better light

4. Improvements in the domain, no matter where they come from, may be beneficial to your business


Your ideas pair well with what Austhrow743 said above. Self-driving is a complementary endeavor, so Lyft's core business isn't jeopardized by releasing the data.


> I wonder if there isn't some sort of governance solution to this. Like give companies big tax breaks for sharing their data with researchers, or something like that. Essentially subsidize academia indirectly.

I think that's a really great idea. Not sure how many would take advantage of it but if it could be made to work then it would be really awesome.

It would also be extremely prone to abuse, though. Patenting is already an art of pretending to explain in clear terms what you are doing, while actually describing something as broadly and vaguely as possible. It would be pretty easy for a TON of releases to leave out a few key pieces, making the information impossible to use or simply unhelpful.

You could form industry-specific regulations or even an active agency to prosecute abuses like that, but it would be immediately overwhelmed. The patent office is already heavily gamed by patent trolls, who bank on long odds for small judgements. Now imagine if millions or billions of dollars of taxes were on the line, and major companies were investing significant resources to open source while protecting their IP.

Even if that were all figured out, how would you value open sourcing stuff, even something as simple as data? Do you give breaks by size, importance, proportion of profit or future profit? Cost of the research? How do you guard against overvaluations and abuse of accounting? Even if you had perfectly accurate, annually-updated solutions for all that, companies can still game the system. Lyft has decided this dataset is what they need; if they could get a bigger break by collecting more data, they'd do that. Plus, Facebook and Google release tons of open source stuff. Do they deserve more than, say, pharmaceutical research?[1]

Similar (IIRC Nixon) tax breaks already exist for R&D, and they are a notoriously abused loophole. Simplified but illustrative example: you build your R&D lab in the shape of a factory, do your research for a while, and then suddenly scale back and replace it with machinery. Well, the original building was still deducted from taxes.

Pharma is actually a perfect example. It's a well-known fact that R&D only accounts for 22% of pharma industry revenue (almost equal to advertising at 19%), and only ~30% of that actually goes to new drugs. The rest takes advantage of marketing and the patent system to re-release drugs that are essentially the same. Two thirds of their research is obvious changes that are only protected because they owned the original patent. Those shouldn't be getting the benefit of incentives.


Slightly tangential, but might this be another argument for people "owning" their data while companies "own" the procedures for processing it? If people "owned" their data, it would presumably be much easier for them to give it out for research purposes.


AI (neural networks) is sadly a "winner takes it all" market...

Even if you create an algorithm five times better and faster, you still lack the data to feed it.



