This is an intriguing addition to Bittorrent; this kind of feature exists in IPFS [1], but of course Bittorrent has a much, much larger install base, so this progressive enhancement could bring similar or near-equivalent features to a wider audience.
This implementation relies on BEP 44 (Storing arbitrary data in the DHT) [2], which, together with the design of the DHT, does have some security implications [3].
Freenet's USKs allow the holder of the private key to add new data, and clients who know the public USK can get notified of updates. In Freenet you can also get old versions of the data for as long as they stay around on the network.
Freenet is remarkably well thought-out and solves a lot of these difficult issues like censorship resistance, deniable encryption, distributed data storage, and mutability; but it doesn't get much press these days.
Indeed, I'm surprised it's taken this long for people to take advantage of the BitTorrent DHT for other p2p stuff.
You don't even need to change the protocol to start using it. It's a ready-made hash-to-peerlist mapping, so you could just use it for peer-discovery/bootstrapping and do everything else out-of-band.
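For instance, an application could derive a rendezvous "infohash" from an app-specific string and announce/look it up like any ordinary torrent; the DHT hands back a peer list and everything else happens out-of-band. A minimal sketch (the app identifier is made up, and the actual announce/get_peers calls would go through whatever DHT library you use):

```python
import hashlib

def rendezvous_infohash(app_id):
    """Derive a 20-byte "infohash" from an application-specific string;
    every peer hashing the same string arrives at the same DHT key."""
    return hashlib.sha1(app_id.encode("utf-8")).digest()

# Hypothetical app identifier - announce/get_peers on this key via any
# DHT implementation, then run your real protocol out-of-band.
key = rendezvous_infohash("my-p2p-app/v1/bootstrap")
print(key.hex())
```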
There already was a BEP to exchange arbitrary stuff, BEP44, for mutable or immutable items. It's taken this long to use this BEP specifically for mutable torrents, though, because it turned out that not many people had a need for mutable torrents.
Is this BEP44 actually implemented in most clients? I was experimenting with p2p some years ago, and I considered extending the protocol, but I figured it would be more practical to work out-of-band, rather than relying on widespread implementation of an extension (which didn't seem that likely to me at the time).
Why is that fair? They're designed for different purposes. IPFS falls far short of being transformatively useful because it lacks this important feature. Without it, it'll be a kind of gimmicky alternative to HTTP, "auto-archiving HTTP", that never gets used for anything real.
Content discovery is not part of HTTP or of the architecture of the web. It's a feature of the current landscape, built on top of existing non-ideal APIs and brute force (scanning).
Good web search (represented by Google) appeared around 1998, years after the general availability of the web, when the corpus of web pages was already large. To this day, search is powered by ad revenue.
I don't see how IPFS is significantly different in this regard.
It isn't 1995, this is exactly my point. If we're trying to design new systems, we shouldn't design them with the exact same problems. What's the point of "decentralizing" if it just means another Google?
The approach of yacy.net may be partly applicable to IPFS.
Donating resources to a "traditional" scanning search engine is also probably doable. But unlike the web, IPFS lacks dense linking and thus "citation ranking" (PageRank-like). Measuring relevance is harder.
I don't disagree that having content discovery would be nice, but IPFS makes a pretty good case [1] about what it brings to the table over HTTP. My point was that HTTP also has "zero out-of-the-box support" for content discovery, yet once search engines came along, we were fine.
I don't really believe the same network can carry its own metadata attached directly to the content. In an ideal world it would be great, but in practice how would that work? You can copy anyone's metadata onto the file you published, so how can anyone tell the difference? (Unless you put some kind of signed ranking system on top of that, but that's just another layer waiting to be abused - like Kazaa's star ratings.)
It's basically the reason search engines stopped trusting meta tags on the websites.
This is an okay idea, but it doesn't really solve the big problem facing BT right now: content discovery.
Mutable Items are good for updating content, but that doesn't matter if nobody can find it.
There needs to be a BEP for a way to host, serve and search metadata about torrents, not just their info-hashes. This should be priority #1 for BT devs. Fortunately, there is at least some related thinking in this direction: http://www.bittorrent.org/beps/bep_0044.html although I don't think that it really goes far enough, as there is no standardization around how the data for searching will be structured, etc.
Tribler is a research project of Delft University of Technology. (...) Tribler is the first client which continuously improves upon the aging BitTorrent protocol from 2001 and addresses its flaws. We expanded it with, amongst others, streaming from magnet links, keyword search for content, channels and reputation-management. All these features are implemented in a completely distributed manner, not relying on any centralized component. Still, Tribler manages to remain fully backwards compatible with BitTorrent.
I worked on an anonymous P2P research project many many years ago now, and we were shocked when we saw how much funding they received compared to us (3 mil EUR!) - but I'm super happy to see that their project is still alive and kicking! I guess the money was put to good use.
Torrent sites could publish their database dumps via this extension, and consumers would automatically download their updated index. A torrent can be anything; it can also be a list of other torrents ;) So it does help with content discovery.
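A minimal sketch of such an index: the publisher ships a plain file listing names and infohashes (hypothetical values here), publishes that file as the mutable torrent, and consumers turn its entries into magnet links for any client:

```python
import json
from urllib.parse import quote

# Hypothetical index; in practice the entries would come from a site's DB dump.
index = {
    "updated": "2016-01-01",
    "entries": [
        {"name": "example dataset v1", "infohash": "a" * 40},
        {"name": "example dataset v2", "infohash": "b" * 40},
    ],
}

def to_magnets(idx):
    """Turn index entries into magnet links any client can consume."""
    return ["magnet:?xt=urn:btih:%s&dn=%s" % (e["infohash"], quote(e["name"]))
            for e in idx["entries"]]

# The index file itself is what gets published as the mutable torrent;
# consumers re-download it whenever the publisher signs a new version.
with open("index.json", "w") as f:
    json.dump(index, f)
```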
You still need to have the entry point - that's exactly where the censorship is occurring. Try to find a copy of the KickAssTorrents database on Google right now and tell me how easy it is.
The approach you're talking about is essentially what http://bitcannon.io/ is trying to do - but that requires running a separate app, a local MongoDB server, downloading many gigabytes of database data, and half an hour to import the torrent DB into your local instance, and that STILL requires you to be able to acquire the Torrent database or related infohash, which isn't easy because of the very censorship we're trying to sidestep.
Quite frankly, that's a really dumb solution to this censorship and it's naive to say that it's useful. This will never be used in the fashion you're describing and it does nothing for anti-censorship.
It makes so much more sense just to have the DHT be actually useful in surfacing content.
If KickAss decides to share their dump using their public key, it becomes much harder to censor because of how the DHT system works. Authorities would need to shut down all the nodes that republish that public key on the DHT, which is incredibly hard to do. Much harder than asking ISPs to shut down a domain name in DNS.
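For reference, BEP 44 derives the DHT target of a mutable item from the public key itself, which is why the lookup key stays stable while the signed payload behind it changes. A sketch of that derivation (the key and salt below are made-up placeholders):

```python
import hashlib

def mutable_target(pubkey, salt=b""):
    """BEP 44: the DHT target of a mutable item is SHA-1(pubkey + salt),
    so the lookup key is stable while the signed value behind it changes."""
    assert len(pubkey) == 32  # ed25519 public key
    return hashlib.sha1(pubkey + salt).digest()

# Made-up key/salt for illustration: a site's dump pointer would live at
# this target, and only the matching private key can publish updates there.
target = mutable_target(b"\x01" * 32, salt=b"db-dump")
```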
The Mainline DHT (the more-supported DHT vs. the Azureus/Vuze DHT) traditionally only contains (IIRC, hashes of) torrent infohashes mapped to peer lists. With BEP 44 (arbitrary data in the DHT) or some other enhancement this may change, but right now even if you traverse the DHT there's no "interesting" metadata about the keys (or, y'know, the actual content of the .torrent file) inside the DHT.
From the paper in the sibling comment (posted by swolchok), a relevant quote:
"We chose to demonstrate our proof of concept only on Vuze due to the significant additional complexity of supporting two DHTs in the crawler. While Mainline does not contain torrent descriptions, its peer lists are keyed by the torrent infohashes themselves, so no additional machinery would be needed to discover infohashes. However, some method would be needed to discover torrent names, such as downloading .torrent files directly from peers using the metadata exchange protocol.
To explore this possibility, we built a prototype .torrent crawler. Since we are operating in the context of the Vuze DHT, our DHT crawler needs to obtain both a torrent description and a peer list before the .torrent crawler can contact the appropriate peers with the correct infohash. However, the .torrent files contain useful additional data, including the full listing of files contained in the torrent"
Kademlia search is also rather useless as it is flooded with false results. Flooding it is of interest not only to malware writers but also as a service to sell to rights holders.
I remember someone setting up a bittorrent search engine based on DHT scraping. It's probably been taken down, but I think it was called DHT Dig, and there's apparently a library with that name.
Something like that would make it easy to create centralised search engines for casual users, and allow power users to run local databases.
There was btdigg at some point (https://btdigg.org/) but for some reason it's down today. The approach is not particularly complicated; there were even papers about it:
Basically all you have to do is sit there: people will come to you asking for content, you'll act as a standard DHT node but note that the infohash exists, and fetch more information (name, number of peers, ...) about it in the background. All you need is a few IPs.
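A toy sketch of that passive approach, assuming you already receive raw KRPC packets from a UDP socket - no networking or error handling, just a minimal bencode decoder plus the info_hash extraction:

```python
def bdecode(data, i=0):
    """Minimal bencode decoder (just enough for KRPC messages).
    Returns (value, next_index)."""
    c = data[i:i+1]
    if c == b"i":                        # integer: i<digits>e
        e = data.index(b"e", i)
        return int(data[i+1:e]), e + 1
    if c == b"l":                        # list: l ... e
        i += 1
        out = []
        while data[i:i+1] != b"e":
            v, i = bdecode(data, i)
            out.append(v)
        return out, i + 1
    if c == b"d":                        # dict: d <key><value>... e
        i += 1
        out = {}
        while data[i:i+1] != b"e":
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    colon = data.index(b":", i)          # byte string: <len>:<bytes>
    n = int(data[i:colon])
    return data[colon+1:colon+1+n], colon + 1 + n

def infohash_from_query(packet):
    """If `packet` is a KRPC get_peers query, return the info_hash it asks for."""
    msg, _ = bdecode(packet)
    if msg.get(b"q") == b"get_peers":
        return msg[b"a"][b"info_hash"]
    return None
```

Every call to `infohash_from_query` on incoming traffic gives you an infohash worth resolving (name, peers, .torrent metadata) in the background.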
I'd imagine that to do more thorough scraping you'd want to rejoin the network repeatedly with different node IDs, otherwise you'd only see the infohashes "near" one point in the keyspace.
Yep, that's the idea: if you join the DHT network with multiple, far-apart IDs, you get more requests coming in. Unfortunately there is BEP 42 (http://www.bittorrent.org/beps/bep_0042.html), which restricts the number of IDs you can have on the same IP (the restriction is there for a good reason, so it's a good thing to follow it).
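For illustration, spreading your observation points could look like this sketch - evenly spaced 160-bit IDs rather than random ones (and, per BEP 42, a compliant crawler would need roughly one IP per vantage point):

```python
def spread_node_ids(count):
    """Evenly spaced 160-bit node IDs, covering the whole keyspace
    instead of clustering near one random point."""
    step = 2 ** 160 // count
    return [(i * step).to_bytes(20, "big") for i in range(count)]

# Eight vantage points; each would need its own IP in practice (BEP 42).
ids = spread_node_ids(8)
```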
Wow, the first thing I think of is that this could be a fantastic vector for malware. Anyone who implements this should make it very clear the underlying data can change.
At least now you can hope that popular torrents with positive comments are probably okay. With this, the author could simply swap in malware after the torrent has become popular.
You don't stumble randomly on such a mutable torrent. If you know git's data model, usual torrents are like commits and these mutable torrents are like branches: if you give someone a commit, they trust it won't move; if you give them a branch name, they expect the commit behind it to change.
Network effects are clearly a major issue, but wrapping the concept in a webpage and/or mobile app should not be a major hurdle - unless I have missed something.
Although the subject matter is different, GitTorrent [1] actually uses the techniques you propose: it uses BEP 44 to store Git revisions in the BitTorrent Mainline DHT, and the Bitcoin blockchain to store cryptographically signed usernames.
You can look at how that implementation was done and what issues they encountered to see what it would take to implement a distributed [something], where that [something] in your case is a social network.
[1] https://ipfs.io/
[2] http://www.bittorrent.org/beps/bep_0044.html
[3] https://gist.github.com/substack/eadd13302d785dc13aac#file-r...