This is an intriguing addition to Bittorrent; this kind of feature exists in IPFS [1], but of course Bittorrent has a much, much larger install base, so this progressive enhancement could bring similar or near-equivalent features to a wider audience.
This implementation relies on BEP 44 (Storing arbitrary data in the DHT) [2], which, together with the design of the DHT, does have some security implications [3].
Freenet's USKs allow the holder of the private key to add new data, and clients who know the public USK can get notified of updates. In Freenet you can also get old versions of the data for as long as they stay around on the network.
Freenet is remarkably well thought-out and solves a lot of these difficult issues like censorship resistance, deniable encryption, distributed data storage, and mutability; but it doesn't get much press these days.
Indeed, I'm surprised it's taken this long for people to take advantage of the BitTorrent DHT for other p2p stuff.
You don't even need to change the protocol to start using it. It's a ready-made hash-to-peerlist mapping, so you could just use it for peer-discovery/bootstrapping and do everything else out-of-band.
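For instance, an application could derive a rendezvous "infohash" from an app-specific string and announce/look it up like any ordinary torrent; the DHT hands back a peer list and everything else happens out-of-band. A minimal sketch (the app identifier is made up, and the actual announce/get_peers calls would go through whatever DHT library you use):

```python
import hashlib

def rendezvous_infohash(app_id):
    """Derive a 20-byte "infohash" from an application-specific string;
    every peer hashing the same string arrives at the same DHT key."""
    return hashlib.sha1(app_id.encode("utf-8")).digest()

# Hypothetical app identifier - announce/get_peers on this key via any
# DHT implementation, then run your real protocol out-of-band.
key = rendezvous_infohash("my-p2p-app/v1/bootstrap")
print(key.hex())
```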
There already was a BEP to exchange arbitrary stuff, BEP44, for mutable or immutable items. It's taken this long to use this BEP specifically for mutable torrents, though, because it turned out that not many people had a need for mutable torrents.
Is this BEP44 actually implemented in most clients? I was experimenting with p2p some years ago, and I considered extending the protocol, but I figured it would be more practical to work out-of-band, rather than relying on widespread implementation of an extension (which didn't seem that likely to me at the time).
Why is that fair? They're designed for different purposes. IPFS falls far short of being transformatively useful because it lacks this important feature. Without it, it'll be a kind of gimmicky alternative to HTTP, "auto-archiving HTTP", that never gets used for anything real.
Content discovery is not part of HTTP or of the architecture of the web. It's a feature of the current landscape, built on top of existing non-ideal APIs and brute force (scanning).
Good web search (represented by Google) appeared around 1998, years after the general availability of the web, when the corpus of web pages was already large. To this day, search is powered by ad revenue.
I don't see how IPFS is significantly different in this regard.
It isn't 1995, this is exactly my point. If we're trying to design new systems, we shouldn't design them with the exact same problems. What's the point of "decentralizing" if it just means another Google?
The approach of yacy.net may be partly applicable to IPFS.
Donating resources to a "traditional" scanning search engine is also probably doable. But unlike the web, IPFS lacks dense linking and thus "citation ranking" (PageRank-like). Measuring relevance is harder.
I don't disagree that having content discovery would be nice, but IPFS makes a pretty good case [1] about what it brings to the table over HTTP. My point was that HTTP also has "zero out-of-the-box support" for content discovery, yet once search engines came along, we were fine.
I don't really believe the same network can carry its own metadata attached directly to the content. In an ideal world it would be great, but in practice how would that work? You can copy anyone's metadata onto the file you published, so how can anyone tell the difference? (Unless you put some kind of signed ranking system on top of that, but that's just another layer waiting to be abused - like Kazaa's star ratings.)
It's basically the reason search engines stopped trusting meta tags on the websites.
This is an okay idea, but it doesn't really solve the big problem facing BT right now: content discovery.
Mutable Items are good for updating content, but that doesn't matter if nobody can find it.
There needs to be a BEP for a way to host, serve and search metadata about torrents, not just their info-hashes. This should be priority #1 for BT devs. Fortunately, there is at least some related thinking in this direction: http://www.bittorrent.org/beps/bep_0044.html although I don't think that it really goes far enough, as there is no standardization around how the data for searching will be structured, etc.
Tribler is a research project of Delft University of Technology. (...) Tribler is the first client which continuously improves upon the aging BitTorrent protocol from 2001 and addresses its flaws. We expanded it with, amongst others, streaming from magnet links, keyword search for content, channels and reputation-management. All these features are implemented in a completely distributed manner, not relying on any centralized component. Still, Tribler manages to remain fully backwards compatible with BitTorrent.
I worked on an anonymous P2P research project many many years ago now, and we were shocked when we saw how much funding they received compared to us (3 mil EUR!) - but I'm super happy to see that their project is still alive and kicking! I guess the money was put to good use.
Torrent sites could publish their database dumps via this extension, and consumers would automatically download their updated index. A torrent can be anything; it can also be a list of other torrents ;) So it does help with content discovery.
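A minimal sketch of such an index: the publisher ships a plain file listing names and infohashes (hypothetical values here), publishes that file as the mutable torrent, and consumers turn its entries into magnet links for any client:

```python
import json
from urllib.parse import quote

# Hypothetical index; in practice the entries would come from a site's DB dump.
index = {
    "updated": "2016-01-01",
    "entries": [
        {"name": "example dataset v1", "infohash": "a" * 40},
        {"name": "example dataset v2", "infohash": "b" * 40},
    ],
}

def to_magnets(idx):
    """Turn index entries into magnet links any client can consume."""
    return ["magnet:?xt=urn:btih:%s&dn=%s" % (e["infohash"], quote(e["name"]))
            for e in idx["entries"]]

# The index file itself is what gets published as the mutable torrent;
# consumers re-download it whenever the publisher signs a new version.
with open("index.json", "w") as f:
    json.dump(index, f)
```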
You still need to have the entry point - that's exactly where the censorship is occurring. Try to find a copy of the KickAssTorrents database on Google right now and tell me how easy it is.
The approach you're talking about is essentially what http://bitcannon.io/ is trying to do - but that requires running a separate app, a local MongoDB server, downloading many gigabytes of database data, and half an hour to import the torrent DB into your local instance, and that STILL requires you to be able to acquire the Torrent database or related infohash, which isn't easy because of the very censorship we're trying to sidestep.
Quite frankly, that's a really dumb solution to this censorship and it's naive to say that it's useful. This will never be used in the fashion you're describing and it does nothing for anti-censorship.
It makes so much more sense just to have the DHT be actually useful in surfacing content.
If KickAss decides to share their dump using their public key, it becomes much harder to censor because of how the DHT system works. Authorities would need to shut down all the nodes that republish that public key on the DHT, which is incredibly hard to do. Much harder than asking ISPs to shut down a domain name in DNS.
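For reference, BEP 44 derives the DHT target of a mutable item from the public key itself, which is why the lookup key stays stable while the signed payload behind it changes. A sketch of that derivation (the key and salt below are made-up placeholders):

```python
import hashlib

def mutable_target(pubkey, salt=b""):
    """BEP 44: the DHT target of a mutable item is SHA-1(pubkey + salt),
    so the lookup key is stable while the signed value behind it changes."""
    assert len(pubkey) == 32  # ed25519 public key
    return hashlib.sha1(pubkey + salt).digest()

# Made-up key/salt for illustration: a site's dump pointer would live at
# this target, and only the matching private key can publish updates there.
target = mutable_target(b"\x01" * 32, salt=b"db-dump")
```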
The Mainline DHT (the more-supported DHT vs. the Azureus/Vuze DHT) traditionally only contains (IIRC, hashes of) torrent infohashes mapped to peer lists. With BEP 44 (arbitrary data in the DHT) or some other enhancement this may change, but right now even if you traverse the DHT there's no "interesting" metadata about the keys (or, y'know, the actual content of the .torrent file) inside the DHT.
From the paper in the sibling comment (posted by swolchok), a relevant quote:
"We chose to demonstrate our proof of concept only on Vuze due to the significant additional complexity of supporting two DHTs in the crawler. While Mainline does not contain torrent descriptions, its peer lists are keyed by the torrent infohashes themselves, so no additional machinery would be needed to discover infohashes. However, some method would be needed to discover torrent names, such as downloading .torrent files directly from peers using the metadata exchange protocol.
To explore this possibility, we built a prototype .torrent crawler. Since we are operating in the context of the Vuze DHT, our DHT crawler needs to obtain both a torrent description and a peer list before the .torrent crawler can contact the appropriate peers with the correct infohash. However, the .torrent files contain useful additional data, including the full listing of files contained in the torrent"
Kademlia search is also rather useless as it is flooded with false results. Flooding it is of interest not only to malware writers but also as a service to sell to rights holders.
I remember someone setting up a bittorrent search engine based on DHT scraping. It's probably been taken down, but I think it was called DHT Dig, and there's apparently a library with that name.
Something like that would make it easy to create centralised search engines for casual users, and allow power users to run local databases.
There was btdigg at some point (https://btdigg.org/) but for some reason it's down today. The approach is not particularly complicated; there were even papers about it:
Basically all you have to do is sit there: people will come to you asking for content, you'll act as a standard DHT node but note that the infohash exists, and fetch more information (name, number of peers, ...) about it in the background. All you need is a few IPs.
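A toy sketch of that passive approach, assuming you already receive raw KRPC packets from a UDP socket - no networking or error handling, just a minimal bencode decoder plus the info_hash extraction:

```python
def bdecode(data, i=0):
    """Minimal bencode decoder (just enough for KRPC messages).
    Returns (value, next_index)."""
    c = data[i:i+1]
    if c == b"i":                        # integer: i<digits>e
        e = data.index(b"e", i)
        return int(data[i+1:e]), e + 1
    if c == b"l":                        # list: l ... e
        i += 1
        out = []
        while data[i:i+1] != b"e":
            v, i = bdecode(data, i)
            out.append(v)
        return out, i + 1
    if c == b"d":                        # dict: d <key><value>... e
        i += 1
        out = {}
        while data[i:i+1] != b"e":
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    colon = data.index(b":", i)          # byte string: <len>:<bytes>
    n = int(data[i:colon])
    return data[colon+1:colon+1+n], colon + 1 + n

def infohash_from_query(packet):
    """If `packet` is a KRPC get_peers query, return the info_hash it asks for."""
    msg, _ = bdecode(packet)
    if msg.get(b"q") == b"get_peers":
        return msg[b"a"][b"info_hash"]
    return None
```

Every call to `infohash_from_query` on incoming traffic gives you an infohash worth resolving (name, peers, .torrent metadata) in the background.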
I'd imagine that to do more thorough scraping you'd want to rejoin the network repeatedly with different node IDs, otherwise you'd only see the infohashes "near" one point in the keyspace.
Yep, that's the idea: if you join the DHT network with multiple, far-apart IDs, you get more requests coming in. Unfortunately there is BEP 42 (http://www.bittorrent.org/beps/bep_0042.html), which restricts the number of IDs you can have on the same IP (the restriction is there for a good reason, so it's a good thing to follow it).
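For illustration, spreading your observation points could look like this sketch - evenly spaced 160-bit IDs rather than random ones (and, per BEP 42, a compliant crawler would need roughly one IP per vantage point):

```python
def spread_node_ids(count):
    """Evenly spaced 160-bit node IDs, covering the whole keyspace
    instead of clustering near one random point."""
    step = 2 ** 160 // count
    return [(i * step).to_bytes(20, "big") for i in range(count)]

# Eight vantage points; each would need its own IP in practice (BEP 42).
ids = spread_node_ids(8)
```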
Wow, the first thing I think of is that this could be a fantastic vector for malware. Anyone who implements this should make it very clear the underlying data can change.
At least now you can hope that popular torrents with positive comments are probably okay. With this, the author could simply swap in malware after the torrent has become popular.
You don't stumble randomly on such a mutable torrent. If you know git's data model, usual torrents are like commits and these mutable torrents are like branches: if you give someone a commit, they trust it won't move; if you give them a branch name, they expect the commit behind it to change.
Network effects are clearly a major issue, but wrapping the concept in a webpage and/or mobile app should not be a major hurdle - unless I have missed something.
Although the subject matter is different, GitTorrent [1] actually uses the techniques you propose: it uses BEP 44 to store Git revisions in the BitTorrent Mainline DHT, and the Bitcoin blockchain to store cryptographically signed usernames.
You can look at how that implementation was done and what issues they encountered to see what it would take to implement a distributed [something], where that [something] in your case is a social network.
[1] https://ipfs.io/
[2] http://www.bittorrent.org/beps/bep_0044.html
[3] https://gist.github.com/substack/eadd13302d785dc13aac#file-r...