
Well congrats to the team for the exit. But I am really hoping that this will continue the great momentum that Rancher has.

I've come to quite enjoy Rancher products. I think the work they are doing is fantastic and lowers the bar for entry into Kubernetes, especially for on-prem/bare metal. Just deployed 4 production RKE clusters on bare metal, and we're also using K3s.



One more good experience. I created a cluster of dedicated servers (64 cores, 6 TB of SSD storage, 256 GB of RAM, and 1 GPU) using Rancher, for about 250 euros/month. This would cost at least 2k in a cloud such as AWS. There is a post about how I dealt with persistent storage here (https://medium.com/@bratao/state-of-persistent-storage-in-k8...)

It really transformed my company's DevOps. I'm VERY happy. If you can, use Rancher. It is just perfect!


We're in the same camp with a cluster ~2x as large for Squawk[1], and it would cost us many multiples in the cloud (excluding our TURN relays, which aren't k8s). However, the one killer feature that the cloud still has over self-hosted is the state layer. There is nothing that comes close to the turnkey, highly available, point-in-time-recoverable database offerings from the cloud providers. We're running Spilo/Patroni helm charts, and we've really tried to break our setup chaos-monkey style. But I'll admit I'd sleep better leaving it in Amazon's hands (fortunately, with all the money we save, we have multiple synchronous replicas and ship log files every 10 seconds).

[1] Shameless plug - Squawk: Walkie Talkie for Teams - https://www.squawk.to

_EDIT_ I've just read your blog post. We went the other direction and have used the local storage provisioner to create PVCs directly on host storage, and push the replication to the application layer. We run Postgres and Redis (KeyDB) with 3 replicas each, with at least one in synchronous replication (where supported), and ship Postgres WAL logs to S3 every 10 seconds.
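For anyone curious, a minimal sketch of what that local-storage setup looks like (names, sizes, and paths here are illustrative, not our actual manifests):

```yaml
# StorageClass for the local static provisioner: no dynamic
# provisioning, and binding is delayed until the pod is scheduled
# so the PV ends up on the node the pod lands on.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# A local PV pinned to one node. The database replica scheduled
# here uses the node's own disk; HA comes from application-level
# replication (Patroni et al.), not from the storage layer.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-node1
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/pg-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node1"]
```

The trade-off is exactly what the blog post discusses: you get raw local-disk performance, but the volume dies with the node, so you must replicate above the storage layer.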


You can also try databases that are natively distributed with replication and scaling built-in. If you need SQL you have many "newSQL" choices like CockroachDB, Yugabyte, Vitess, TiDB, and others.


Why did you keep your TURN relays out of k8s?


Because we needed geographic distribution so that we don't end up hairpinning our users, and they only run a single service, so the value prop is much lower. We use Route 53 to do geodns across a number of cheap instances around the world (which is also nice - it lets you pick regions with cheap bandwidth but good latency to major metro areas). We currently have TURN relays in Las Vegas, New York, and Amsterdam, and that gives us pretty good coverage (sorry Asia... you're just so damn expensive!).
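For a rough idea, a geolocation record in Route 53's ChangeResourceRecordSets format looks like this (hostname, IP, and identifier are placeholders, not our real setup):

```json
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "turn.example.com",
      "Type": "A",
      "SetIdentifier": "eu-ams",
      "GeoLocation": { "ContinentCode": "EU" },
      "TTL": 60,
      "ResourceRecords": [{ "Value": "203.0.113.10" }]
    }
  }]
}
```

One such record per region, plus a default record for locations that match nothing, and Route 53 answers each resolver with the nearest relay.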

But all of our APIs sit in one k8s cluster across two datacenters (Hetzner, with whom we couldn't be happier).


Really interested in hosting at Hetzner, as their prices are fantastic by comparison to AWS, Azure & GCP.

I'm particularly interested in what an HA Postgres setup might look like. Assuming you are running some kind of database (whether Postgres or otherwise), what are you doing for persistent storage? Are you using Hetzner's cloud block storage volumes? What is performance like?


Interesting! Is that a single K8s control plane across one cluster? We've gone with fully isolated clusters across 2 data centers to protect against a network isolation incident between them causing a split brain/borking etcd.


Yes the control plane is only in one of the data centers. The other only runs admin services like offsite backups, our development infra (gitlab, etc) and CI/CD.

We could definitely do two clusters and probably should, but the secondary data center has so few services that it wasn't really worth the extra work.


Oh cool, interesting. Thanks for the overview


Longhorn synchronously replicates the volume across multiple replicas stored on multiple nodes https://github.com/longhorn/longhorn
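Concretely, the replica count is chosen per StorageClass; a sketch using Longhorn's documented CSI provisioner and parameters (the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"      # synchronous replicas, placed on distinct nodes
  staleReplicaTimeout: "2880" # minutes before a failed replica is cleaned up
```

Any PVC that requests this class gets a volume whose writes are acknowledged by all healthy replicas.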

At first look at the numbers in the colourful table near the end, Piraeus/Linstor/DRBD seems 10x faster than Longhorn 0.8. The article goes into great depth on the (a)synchronous replication options of Piraeus, but doesn't mention that Longhorn always does synchronous replication. I wonder why?

SUSE being all in on Btrfs and Ceph, I wonder if they will allow Yasker https://github.com/longhorn/longhorn/graphs/contributors to continue developing. At KubeCon EU & US 2019 https://youtu.be/hvVnfZf9V6o?t=1659 Sheng Yang explains how he tried to make Longhorn a first-class citizen of Kubernetes storage.


Longhorn serves a very different use case than Btrfs and Ceph, so continued investment makes sense.

Disclaimer: I'm the Rancher Labs CTO


DRBD is really, really hard to use (Ceph as well, though).

Also, performance is extremely dependent on so many factors which are not always a given, i.e. drives, network, etc.

For some stuff even a distributed FS is enough, like GlusterFS.


I should have made it clearer that Longhorn is sync by default. Linstor is also synchronous by default, but you can mess with it to make it async in some situations (in reality you allow it to be out of sync).
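In classic drbd.conf terms, that sync/async choice is the replication protocol; a sketch in DRBD 8.x-style syntax (resource and node names are placeholders):

```
resource r0 {
  net {
    protocol C;   # synchronous: a write completes only once it has
                  # reached the peer's disk
    # protocol A; # asynchronous: completes once it is in the local
                  # TCP send buffer -- the peer may lag behind
  }
  on node1 { address 10.0.0.1:7789; }
  on node2 { address 10.0.0.2:7789; }
}
```

Protocol A is what buys the benchmark speed at the cost of the peer being allowed to fall out of sync.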

I'm really rooting for Longhorn. I'm a sucker for GUIs. But in my tests the performance is not there yet.

However, they opened a new epic ticket to focus on performance, and hopefully they will keep improving Longhorn after the acquisition.


You mentioned somewhere that your servers were hosted with Hetzner - are you using their "cloud volume" block storage? Really curious to know what performance is like with this cloud attached SSD storage!


That's a great depiction of the power of one person with proper knowledge!

Get a little bit of money (in comparison to all those shiny great things), build it, wing it, and provide a huge benefit :)


Agreed, Rancher rocks.


Where did you rent dedicated servers?


Hetzner


Yeah I would say similarly. My team is working with Rancher and found their permissions management to be a solid selling point, among other things. And you can terraform 99% of the things you need.


One thing that continuously irks me about K8S is that the bar is so high. Does it really need to be so complex? Does it really need so much mandatory complexity?

Is that complexity needed or do more complex things actually tend to win in certain markets because nerds like knobs?


Distributed cloud computing is complex, k8s provides a solid abstraction based on decoupled reconciliation loops that work together in a common control plane. One of the most compelling facets of k8s is this declarative and extensible architecture.

The collaboration between Service -> Deployment -> ReplicaSet -> Pod -> Container is a great example of how these reconcilers work together.

Yes, it has a lot of knobs and dials, but you don't need to understand them to get going. Just pick up something like skaffold.dev and you can be productive very quickly.
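The Service -> Deployment -> ReplicaSet -> Pod chain fits in a few lines of declarative YAML (a minimal sketch; names and image are placeholders):

```yaml
# Deployment -> ReplicaSet -> Pod: you declare 3 replicas and the
# Deployment controller reconciles actual state toward that spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports: [{containerPort: 80}]
---
# Service: a stable virtual IP that load-balances across whatever
# Pods currently match the label selector.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: {app: web}
  ports: [{port: 80, targetPort: 80}]
```

Kill a Pod and the ReplicaSet recreates it; the Service keeps routing to whatever is healthy. That loop is the whole architecture in miniature.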


Actually, K8s itself as a standard is not complex/hard. If you are a developer and a user/consumer of K8s, use it! If the cluster is managed by someone else, K8s is great.

It only gets complex when you have to provision & manage your own clusters. That's where Rancher really shines, as it makes it so much simpler to deploy and manage K8s everywhere.


I place provisioning and management of your own clusters in a category I call "installability" or "deployability." It's a fundamental category of UX especially for technical and infrastructure applications.

I once tried to deploy a minimal test instance of OpenStack. Granted this was years ago, but I have been doing Linux since 1993 and I could not get it to run. That's an example of absolutely horrible UX at the deployability level.

K8S is nowhere near that bad but it definitely seems much harder than it needs to be to provision a basic default configuration for a working cluster.


K8S is a lot easier than OpenStack to install, but when comparing something like Rancher to OpenStack, it should be compared to something like OpenStack-Ansible or a vendor version of OpenStack (RIP HPE Helion), which were a lot easier than a bare apt-get install openstack.

K8S has a lot fewer moving parts - a couple of binaries/containers and etcd. The issues start coming up when you go beyond a single control plane node and want an HA API.


Why not instead compare it to a contemporary competitor - Nomad - which has simplicity as a core value? It has _far_ fewer moving parts than Kubernetes.


I was talking about the GP comparing their OpenStack install experience to something like Rancher, which is not an apples to apples comparison.

- On a side note - OpenStack and Kubernetes are not competitors; they are quite complementary collections of applications that both have their place in a modern open source infrastructure.


My experience with it has always been that it is delightfully simple for the task at hand. There's a wide surface area because it covers a wide problem space (distributed computing), but any individual task has always felt simple and very thoughtfully considered for me.


Eh, I’d say k8s with the help of Helm is about as simple as it can get to deploy and manage large clusters of networked applications. The equivalent done using e.g. Ansible playbooks would be far more complex.
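As a sketch of what that looks like in practice (chart path, keys, and values here are hypothetical, not from any specific chart): one values file plus one idempotent command replaces a pile of playbooks.

```yaml
# values-prod.yaml -- environment-specific overrides for a chart
replicaCount: 3
image:
  repository: registry.example.com/api
  tag: "1.4.2"
resources:
  requests: {cpu: 250m, memory: 256Mi}
# Applied (and re-applied on every change) with:
#   helm upgrade --install api ./charts/api -f values-prod.yaml
```

`helm upgrade --install` converges the cluster to the declared state each time, which is the part that's genuinely hard to reproduce with imperative playbooks.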

If the complexity seems too much, it’s probably a sign you don’t need k8s.


You can use Docker Swarm. Mirantis has since backpedaled on its plans to deprecate it. It's a great piece of software if you don't need thousands of containers, but rather low hundreds (and if you don't need additional stuff like Istio, operators, etc.)
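For comparison, a minimal Swarm stack file (service name and image are illustrative), deployed with `docker stack deploy -c docker-compose.yml demo`:

```yaml
version: "3.8"
services:
  web:
    image: nginx:1.25
    deploy:
      replicas: 5            # Swarm keeps 5 tasks running cluster-wide
      update_config:
        parallelism: 2       # rolling update, 2 tasks at a time
        delay: 10s
      restart_policy:
        condition: on-failure
    ports:
      - "8080:80"            # published on every node via the routing mesh
```

Scheduling, rolling updates, and ingress routing in one short file - which is exactly the "low hundreds of containers" sweet spot.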



