
This is by no means a perfect match for your requirements, but I'll share a CLI tool I built, called Dud[0]. At the least it may spur some ideas.

Dud is meant to be a companion to SCM (e.g. Git) for large files. I was turned off Git LFS after a couple of failed attempts at using it for data science work. DVC[1] is an improvement in many ways, but it has some rough edges and serious performance issues[2].

With Dud I focused on speed and simplicity. To your three points above:

1) Dud can comfortably track datasets in the 100s of GBs. In practice, the bottleneck is your disk I/O speed.

2) Dud checks out binaries as links by default, so it's super fast to switch between commits.

3) Dud includes a means to build data pipelines -- think Makefiles with fewer footguns. Dud can detect when outputs are up to date and skip executing a pipeline stage.
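
To make point 3 concrete, here is a minimal Python sketch of one common way to decide whether a stage can be skipped: record checksums of its inputs and outputs after a run, and compare them before the next one. This is not Dud's actual code or file format; `run_stage`, `is_up_to_date`, and the JSON lock file are hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each path to its checksum, or None if the file is missing."""
    return {str(p): file_checksum(p) if Path(p).is_file() else None for p in paths}


def is_up_to_date(inputs, outputs, lock_file):
    """True if every input and output matches the checksums from the last run."""
    lock = Path(lock_file)
    if not lock.is_file():
        return False  # never run before
    current = snapshot(list(inputs) + list(outputs))
    if None in current.values():
        return False  # something is missing, so the stage must run
    return json.loads(lock.read_text()) == current


def run_stage(inputs, outputs, lock_file, command):
    """Run command() unless the stage is up to date; return True if it ran."""
    if is_up_to_date(inputs, outputs, lock_file):
        return False
    command()
    # Hypothetical lock format: a flat JSON map of path -> checksum.
    Path(lock_file).write_text(json.dumps(snapshot(list(inputs) + list(outputs))))
    return True
```

Comparing content checksums rather than modification times (as Make does) means that touching a file without changing it does not trigger a rebuild, at the cost of hashing every file on each check.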

I hope this helps, and I'd be happy to chat about it.

[0]: https://github.com/kevin-hanselman/dud

[1]: https://dvc.org

[2]: https://github.com/kevin-hanselman/dud#concrete-differences-...



I'd be curious to see if you've tried git-annex. I use it instead of git-lfs when I need to manage big binary blobs. It does the same trick, with a "check out" being a mere symlink.
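
For anyone unfamiliar with the symlink trick, here is a rough Python sketch of the general idea: file contents live in a content-addressed cache, and a "check out" just repoints a symlink at the right blob. This is not git-annex's or Dud's real on-disk layout; `commit_file`, `checkout_file`, and the flat cache directory are hypothetical, for illustration only.

```python
import hashlib
import os
import shutil
from pathlib import Path


def commit_file(path, cache_dir):
    """Move a file into the cache under its SHA-256 and return the checksum."""
    path, cache_dir = Path(path), Path(cache_dir)
    # Reads the whole file into memory; a real tool would hash in chunks.
    checksum = hashlib.sha256(path.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / checksum
    if cached.exists():
        path.unlink()  # identical content is already stored
    else:
        shutil.move(str(path), str(cached))
    return checksum


def checkout_file(path, checksum, cache_dir):
    """Point `path` at the cached blob via a symlink; no data is copied."""
    path = Path(path)
    target = (Path(cache_dir) / checksum).resolve()
    if path.is_symlink() or path.exists():
        path.unlink()
    os.symlink(target, path)
```

Because switching versions only replaces a link, the cost is independent of file size, which is why this approach stays fast even for very large binaries.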


I haven't used it, no. Around the time Git LFS was released, my read from the community was that Git LFS was favored to supersede git-annex, so I focused my time investigating Git LFS. Given that git-annex is still alive and well, I may have discounted it too quickly :) Maybe I'll revisit it in the future. Thanks for sharing!


Neither is favored: git-annex solves problems that Git LFS doesn't even try to address (distributed big files), at the cost of extra complexity.

Git LFS is intended more for a centralized "big repo" workflow, while git-annex's canonical usage is as a personal distributed backup system. Both can stretch into other domains.

In this case, git-annex seems to have a feature that Git LFS lacks and that would be useful to you.



