Location: NYC
Remote: Yes
Willing to relocate: No
Technologies: Linux, Cloud (AWS, Azure, and GCP), Kubernetes, Terraform, Python, Java (among many others)
Résumé/CV: upon request
Email: darose@darose.net
Strong, NYC-based technologist with extensive skills in Linux system administration, DevOps, cloud operations, scalable systems, and large-scale data processing, seeking a new senior-level Cloud Eng / Infra / DevOps role, ideally with a trading/fintech-related company. My primary areas of specialty recently are DevOps and Cloud Computing, but I've also been a software engineer (Python and Java strongest), a team lead, and a VP of engineering. My experience ranges from several start-ups (including two with successful exits) to a quant hedge fund, as well as other trading/financial services firms. If interested, please email with details about the role.
> Git has no such problem. There’s nothing it can’t do.
It can't show me the branch I'm working on, from the commit that began the branch to the last commit. (Or if it can, I have no idea how to do it.)
This isn't a functionality issue, but rather a conceptual one. Git just fundamentally thinks of branches differently than I do. (And, maybe, most humans?) To me, a branch begins ... when I branch it. To git, a branch begins ... at the very first commit ever in the repo, since git considers commits made before the branch point to still be part of the branch. The branch point, then, is just another commit to git - and not even one that git thinks is important enough to highlight. But without any ability to clearly identify the branch point, you can often find yourself looking through a forest of commits that are irrelevant to what you're really searching for.
I spent the better part of an hour the other day trying to figure out how to identify my branch point. (For the record, the best solution I eventually found was: cat .git/refs/heads/<branch name>.) Just my $0.02, of course, but IMO this is absurd - and a big feature gap / user-friendliness issue with git compared to other VCSes.
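For what it's worth, the "branch history since the branch point" view is expressible with git merge-base and a log range. A minimal sketch, assuming a topic branch named feature cut from master (both branch names are placeholders, not from the original post):

```shell
# Nearest common ancestor of master and feature - i.e. the branch point:
base=$(git merge-base master feature)

# Commits made on feature since it diverged from master:
git log "$base"..feature

# Equivalent shorthand: commits reachable from feature but not from master
git log master..feature
```

This assumes master hasn't been merged back into feature; after such a merge the "branch point" becomes genuinely ambiguous, which is arguably the conceptual gap being described here.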
IMO git is just another step in the evolution of VCSes, and not necessarily even one of the better ones. Its concepts, functionality, and feature set are focused primarily on distributed development and multiple people maintaining different source trees ... which is fine for the Linux kernel and other projects that heavily use that model. But many/most projects don't work that way, and for them a centralized VCS is sufficient. I have no doubt that a better VCS will come along and replace git one day.
He is saying that finding <base-branch> is too hard. There is probably some magic with git merge-base or git show-branch, but I don't know them well enough to do it.
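That magic is mostly just git merge-base; a hedged sketch, where base and topic are placeholder branch names standing in for <base-branch> and the branch being inspected:

```shell
# The branch point: nearest common ancestor of the two branches
git merge-base base topic

# Everything on topic since that point, one line per commit
git log --oneline base..topic
```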
Also, that you'll need to pretty much completely reinvent yourself every 5 years or so if you want to stay employable. (I.e., you need to keep becoming an expert in some new tech specialty.)
This really strikes me as more of a marketing piece by Snowflake than a well-researched piece of reporting. The article mostly just quotes one person - Bob Muglia - who is, as they say on Wall Street, "talking his book" - i.e., giving an opinion that is not coincidentally in line with his own financial interests. Sure, Hadoop is getting old, and is quickly being replaced by Spark. But loads of organizations have used, and continue to use, Hadoop/Spark successfully. And the part about Kafka replacing Hadoop/Spark is just silly. They're completely different technologies, used for very different purposes, and many organizations use both side by side.
Disclaimer: I run a technology vendor partially invested in the success of the Hadoop ecosystem.
You are making a very common mistake of conflating the file system, the MapReduce implementation, and the scheduler. You are right about this post and Kafka, though. Let me expand on this point a bit more.
Hadoop isn't what it was in 2004. It's now a complex beast with several decoupled components, which makes it very hard for people outside the space to identify what "Hadoop" even is.
The Hadoop ecosystem is actually very healthy if you look at all the streaming platforms built on top of it (Kafka, Apex, Spark, Flink, Tez, ...).
There are also databases such as HBase, Cassandra, and more recently Kudu, specialized for different workloads. Don't even get me started on all the SQL implementations (again, with their own trade-offs), such as Impala and Hive.
If we step back for a second here and focus on just the compute part: yes, MapReduce is for the most part dead, supplanted by streaming and batch platforms such as Flink and Spark.
The scheduler part (YARN) is competing with Mesos, largely thanks to Spark and Flink being able to run on either, with Mesos being the more flexible of the two. (Most Hadoop distros only use YARN, though.)
Then we also have the distributed-consensus part in ZooKeeper. etcd is up-and-coming in this space, but your Hadoop cluster uses ZooKeeper (both Mesos and Kafka rely on it, for example).
The article also quotes Bobby Johnson, who helped run Facebook's Hadoop cluster, as well as the creator of Kafka (who ran Hadoop clusters at LinkedIn).
For what it's worth, all three of them seemed pretty down on Hadoop.
I think the parent is right though. Side topic, but having seen how PR pieces are crafted, this feels like something that Snowflake put together and then passed on to Datanami with a "we have a blog post we'd like you to publish" type of mail. The claim is somewhat unsubstantiated, but everything about it reeks of trying to drive the reader to discover Snowflake at the start, and to think of it again at the end.
A quick search for hadoop on the Snowflake domain, and for the term hadoop alongside the term snowflake, keeps showing that Snowflake has a definite target in mind: converting Hadoop users, or people evaluating Hadoop, into choosing them instead. They even have a webinar specifically for that segment of people.
Even further searching for Alex Woodie and mentions of Snowflake shows multiple articles featuring the CEO across multiple outlets, including Datanami and EnterpriseTech.
All that is circumstantial, but I'm exercising a healthy bit of skepticism that this piece is pure research done by Alex Woodie. A little more objectively, if I examine the "points" of the article, what I can see is:
- Bob Muglia has never met a happy Hadoop customer. Mentions a couple of things that might replace Hadoop in the future.
- Bob Muglia has only seen a few customers who've tamed Hadoop.
- Some discussions with and about Facebook's experience with Hadoop, painting Hadoop as hard work from the outset.
- More discussions with other tech folk (Kafka and DataTorrent). One is an alternative of sorts, and the other again discusses the pain of Hadoop.
- And then back to Bob Muglia and who his target customers are for Snowflake - "Hadoop refugees" - and his belief that we are in the valley of despair regarding Hadoop.
Which brings us to the final takeaway of the article: ditch Hadoop sooner rather than later, and here are the alternatives - with the main one, pushed from start to end, being Snowflake.
I apologise if this was too far off topic. I think the discussion of Hadoop's validity, or how it's being used, is valid. I also believe it's healthy to call out suspect stuff like this, because the core of the article itself provides little to no critical value.
Dailymotion, the global video-hosting company, is looking to fill multiple roles to help us staff up a green-field project, building out a new ad-tech platform from the ground up. We're hiring for multiple tech positions, including Front-end Engineer, Data Scientist, and Big Data Engineer, as well as more senior roles.