The total strikes me as small. That's nearly two decades of contributions from several hundred thousand active members, and a few million total. HN is what would have been a substantial social network prior to Facebook, and (largely on account of its modest size and active moderation) a high-value one.

I did some modelling of how much contributed text data there was on Google+ as that site was shutting down in 2019.

By "text data", I'm excluding both media (images, audio, video), and all the extraneous page throw-weight (HTML scaffolding, CSS, JS).

Given the very low participation rates, and finding that posts on average ran about 120 characters (I strongly suspect that much activity was part of a Twitter-oriented social strategy, though it's possible that SocMed posts just trend short), seven years of history from a few tens of millions of active accounts (out of > 4 billion registered profiles) amounted to only a few GiB.

This has a bearing on a few other aspects:

- The Archive Team (AT, working with, but unaffiliated with, the Internet Archive, IA) was engaged in an archival effort aimed at G+. That had ... mixed success: much content was archived, but one heck of a lot wasn't. Very few comments survive (threads were curtailed to the most recent ten or so), absent search the archive remains fairly useless, and those with "vanity accounts" (based on a selected account name rather than a random hash) prove to be even less accessible. In addition to all of that, by scraping full pages and attempting to present the site as it appeared online, AT/IA are committing to a tremendous increase in stored data requirements whilst missing much of what actually made the site of interest.

- Those interested in storing text contributions of even large populations face very modest storage requirements. If, say, average online time is 45 minutes daily, typing speed is 45 wpm, and only half of online time is spent writing vs. reading, that's roughly 1,000 words/(person*day), or about 6 KiB/(person*day) at roughly 6 bytes per word. That's 6 MiB per 1,000 people, 6 GiB per 1 million, 6 TiB per billion. And ... the true values are almost certainly far lower: I'm pretty certain I've overstated writing time (it's likely closer to 10%), and typing speed (typing on mobile is likely closer to 20-30 wpm, if that). E.g., Facebook sees about 2.45 billion "pieces of content" posted per day, of which half is video. If we assume 120 characters (bytes) per post, that's a surprisingly modest amount, substantially less than 300 GiB/day of text data. (Images, audio, and video will of course inflate that markedly.)

- Non-entered data (e.g., location, video, online interactions, commerce) makes up the bulk of what current data collection / surveillance state & capitalism systems gather.
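
For anyone wanting to check the storage arithmetic above, here's a minimal Python sketch. It's a back-of-envelope calculation, not a measurement: the function names are made up for illustration, and the 6 bytes/word figure is an assumption (about five characters plus a space, one byte each) implied by the 1,000 words ≈ 6 KiB estimate.

```python
# Back-of-envelope text-volume estimate. All helper names are illustrative.

BYTES_PER_WORD = 6  # assumption: ~5 characters plus a space, one byte each


def words_per_person_day(minutes_online=45, wpm=45, writing_share=0.5):
    """Words typed per person per day under the stated assumptions."""
    return minutes_online * writing_share * wpm


def text_bytes_per_day(people, **assumptions):
    """Total bytes of typed text per day for a population of `people`."""
    return people * words_per_person_day(**assumptions) * BYTES_PER_WORD


def human(n_bytes):
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"


# Scale the per-person figure up by factors of 1,000.
for people in (1, 1_000, 1_000_000, 1_000_000_000):
    print(f"{people:>13,} people: {human(text_bytes_per_day(people))}/day")

# Lower variant: ~10% of online time spent writing, ~25 wpm on mobile.
mobile = text_bytes_per_day(1_000_000_000, wpm=25, writing_share=0.1)
print("per billion, mobile-ish assumptions:", human(mobile), "per day")

# Facebook figure: 2.45 billion items/day at an assumed 120 bytes each.
fb_all_items = 2_450_000_000 * 120   # upper bound: every item treated as text
fb_non_video = fb_all_items // 2     # half of the items are video
print("Facebook, all items as text:", human(fb_all_items))
print("Facebook, non-video half:   ", human(fb_non_video))
```

Each step of 1,000 in population moves the result up one binary prefix (KiB, MiB, GiB, TiB), and even treating every Facebook item as a 120-byte text post stays under 300 GiB/day.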


