
> * They removed the cloudsearch syntax from the search API.

Didn't know that. That sucks. Is the massive dataset of Reddit posts still on BigQuery?




> They are not owned by Reddit.

You sure about that? I would bet that if Reddit asked Google to take down the data, Google would comply.

If not Reddit, then who owns it?


The individual users own the content they create. The Reddit TOS gives Reddit a broad license to use it but the content itself is owned by the users.


So who owns the dataset aggregated from all users?


More importantly, did the people running analysis on this dataset get explicit permission from the European post authors as required by the GDPR?


I mean, the BigQuery datasets aren't managed by Reddit; they're sourced from Pushshift, so Reddit couldn't simply take them down itself.

Whether Reddit could file a DMCA complaint to force them down is up for debate.


It's showing the error "Unable to find table: fh-bigquery:reddit_comments.2015_05" https://bigquery.cloud.google.com/table/fh-bigquery:reddit_c...


You can just get the canonical dataset directly at data.pushshift.io. That's the origin of the BigQuery-hosted version.


Wow. To be fair, lots of apps and sites scrape viral content from Reddit. I can see why Reddit would want to limit that access, but in practice it's a fool's errand, much like Twitter restricting API access.


I'm not seeing an error; you may have to display the `fh-bigquery` project first.



