Backblaze is such a weird case. On one hand, it became the most trusted personal backup provider on Reddit and HN; on the other, their software is absolute junk, and as some comments in this thread highlight, even their restore can't be trusted.
I've never needed to restore anything, so I can't speak to that, but once one of my devices deleted a file in Syncthing, and I went into Backblaze to see if they had any logs of deletions/file modifications (I had that disabled in Syncthing).
I don't remember the exact details, but I clearly remember feeling like the entire thing was done by a junior engineer straight out of college. Trying to understand the names of some of the variables used there, I stumbled upon a Reddit thread where the person who worked on the client was explaining why things were done the way they were, and it felt like me in my first 3 months of software engineering.
How did Backblaze gain this trust in the first place? Is it because nobody is offering "unlimited" storage at the same price point?
> How did Backblaze gain this trust in the first place? Is it because nobody is offering "unlimited" storage at the same price point?
Yes, the unlimited storage is one factor. Their detailed write-ups about hard drive reliability and their transparency about how they build their racks (I think they essentially open-sourced the design) established a lot of credibility as well. Plus, in my experience, they just worked.
I paid for Backblaze for years but finally cancelled when I junked my last desktop and never got around to installing it on my laptop. I did use their restore functionality a couple of times and it was slow and kind of clunky but it worked.
I’m sad to hear they’ve started dropping stuff from the backup like this. I’ve been contemplating signing back up but most of the stuff I care about is in iCloud or OneDrive so if they aren’t backing that up, it’s pretty useless to me.
I was very surprised when I gave them a go after hearing good stuff about them here and on other techy parts of the internet:
After using their website and their app for a few hours, I pretty much immediately decided not to proceed with them, as the software was clearly not built by a team with strong competency in software development. This was a year ago, so they've had plenty of time to polish it.
You sort of let that kind of stuff pass for a hardware company, but Backblaze is not a hardware company. There's more to backup than just making sure the disks at the data centre are replaced in a timely manner.
When I first started using Backblaze, I also tested several of the competitors at the time (Carbonite, CrashPlan, etc.), and as bad as Backblaze's software was, it was still by far the best of the options.
And from the comments here I don't see any other user-friendly options being proposed; it's all just suggestions to glue together some open-source software with object storage and be your own sysadmin.
That's the CTO. And yes, it has been evident for years that Backblaze doesn't actually know how to write software; they're unfamiliar with many basic things.
I think Opus does, in fact, find the bugs the same way GPT xhigh (or even high) does. It just discards them before presenting them to the user.
Opus is designed to be a lazy, corner-cutting model, and reviews are just one place where this shows. In my orchestration loop, Opus discards many findings from GPT 5.4 xhigh, justifying this as pragmatism. Opus YAGNIs everything; GPT wants you to consider seismic events in your todo list app. Sadly, there's nothing in between.
> A trivial example is that you can improve performance in a very simple way: ask "are you sure?" showing the model what it intends to do, BEFORE doing it. Improves performance by 10%
Put it into the "are you sure?" loop and you'll see the model just keeps oscillating for eternity. If you ask a model to question its output, it will take that as an instruction that it must find fault, even if the output is correct.
Not in my experience. I mean, it happens, but models can check whether their own function calls are reasonable. And that doesn't require dropping the context cache, so it's a lot less expensive than you might initially think.
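For what it's worth, the pattern the quote describes can be sketched in a few lines. Everything below is hypothetical scaffolding (the function names and the stubbed `ask_model` are mine, not any vendor's API); capping the number of self-review rounds is one way to avoid the oscillation the other reply mentions.

```python
def confirm_before_execute(ask_model, pending_call, max_rounds=2):
    """Show the model its own intended call and ask it to confirm or revise.

    `ask_model` is any prompt -> text function (a real chat-completion call
    in practice). Capping `max_rounds` prevents endless second-guessing:
    after a fixed number of self-reviews we accept the latest proposal.
    """
    call = pending_call
    for _ in range(max_rounds):
        verdict = ask_model(
            f"You are about to run: {call!r}. "
            "Reply CONFIRM to proceed, or give a corrected call."
        )
        if verdict.strip() == "CONFIRM":
            return call
        call = verdict  # model proposed a revision; re-check it next round
    return call  # rounds exhausted; accept the latest proposal


# Stub "model" that fixes a typo'd path on its first review pass:
def stub_model(prompt):
    return "CONFIRM" if "/tmp/data.csv" in prompt else "read_file('/tmp/data.csv')"

final = confirm_before_execute(stub_model, "read_file('/tmp/dta.csv')")
print(final)  # → read_file('/tmp/data.csv')
```

The point of the cap is exactly the pragmatism/paranoia trade-off discussed above: one extra look catches cheap mistakes, unbounded looks invite oscillation.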
Necro-posting here, but that's kinda what we're working on! We're focused on creating cloud workspaces for sandboxed coding agents, and it's built to support any agent harness. https://www.amika.dev/
I've been using the OpenAI Agents SDK for a while now and am largely happy with the abstractions - handoffs/sub-agents, tools, guardrails, structured output, etc. Building the infra and observability, and having it scale reliably, was a bigger pain for me. So I do get Anthropic's move into managed agents.
Agreed, but it's a bit nuanced. I'm working on a fairly complex project in a domain where I have no technical experience. The first iteration of the project was complete garbage, mainly because I asked for things to be done and never asked HOW they should be done. The result? Complete, utter garbage. It kinda, sorta worked, but again, I would never use it for anything important.
Then we went through ~10 complete rewrites based on the learnings from previous attempts. As we went through these iterations, I became much more knowledgeable about the domain, because I saw the failure points, read the resulting code, and asked the right questions.
Without AI, I would likely have given up after iteration 2, and certainly would not have done 10 iterations.
So the nuance here is that iterating and throwing away the entire thing is going to become much cheaper, but not without an engineer in the loop asking the right questions.
Note: each iteration went through dual reviews by Codex and Opus at each phase, with every finding fixed and the final review saying everything is perfect, the best thing on earth.
I'm seeing a similar process, but on large teams we still find this output unmaintainable.
The problem is that vanishingly few people actually understand the code; they're asking the agents to do all of the interpretation and reasoning for them.
This code that you've built is only maintainable for as long as you are still around at the company to work on it -- it's essentially a codebase that you're the only domain expert in. That's not a good outcome for companies either.
My prediction is that the companies that learn this lesson are the ones that are going to stick around. LLMs won't be in wide use for features but for throwaway busy-work type problems that eat lots of human resources and can't be ignored.
I left my last company job just before "AI-first engineering" became mainstream, and you confirmed what I was feeling all this time: I have absolutely zero idea how teams actually manage to collaborate on LLM-managed projects. All the projects I'm working on now are my own, and the only reason I could do this is that I had unlimited time and unlimited freedom. There's no chance I would be able to do this in a team setting.
I'm positive that the last company's CEO by now mandates that nobody write a single line of code by hand, and that there's some rigid process everyone has to follow.
I agree and commiserate. In the near term my picture is pretty grim. There are fantastic uses for these tools, but they're being abused.
I was big on correctness, software safety (think medical devices, not memory), and formal proofs anyway, so I think I'm just going to take the pay cut and start selecting for those types of jobs. Your run-of-the-mill SaaS or open-source-plus-commercial companies are all becoming a death march.
The day I start freaking out about my job is the day my non-engineer friend turned vibe coder understands how or why the thing that AI wrote works, or why something doesn't work exactly the way he envisioned and what it takes to get it there.
If it can replace SWEs, then there's no reason why it can't replace say, a lawyer, or any other job for that matter. If it can't, then SWE is fine. If it can - well, we're all fucked either way.
> If it can replace SWEs, then there's no reason why it can't replace say, a lawyer
SWE is unique in that, for part of the job, it's possible to set up automated verification of correct output, so you can train a model to be better at it. I don't think that exists in law or even most other work.
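A toy illustration of that claim (entirely my own sketch, not from any real benchmark): a candidate implementation either reproduces the expected output on every test case or it doesn't, which is a clean automated pass/fail signal that most other professions can't produce for their work.

```python
def verify(candidate_fn, cases):
    """Automated verification: True iff the candidate matches the expected
    output on every (input, expected) pair. This binary signal is what a
    training loop can optimize against."""
    return all(candidate_fn(x) == want for x, want in cases)


# Spec by example: "square the input".
cases = [(0, 0), (3, 9), (-2, 4)]

good = lambda x: x * x   # passes every case
bad = lambda x: x + x    # fails on 3 and -2

print(verify(good, cases))  # True
print(verify(bad, cases))   # False
```

The catch, as the replies below get at, is that this only verifies what the cases encode; the signal is as good as the spec, no better.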
What is the automated verification of correct output and who defines that?
But before verification, what IS correct output?
I understand the SWE process is unique in that there are some automations that verify some inputs and outputs, but this reasoning falls into the same fallacies we had before the AI era. The first one that comes to mind is that 100% test coverage means the software is perfect.
Right, and that's why it's only part of the job. The benchmarks they're currently running consist of handing the AI a detailed spec + tests to make pass, which isn't really what developing a feature looks like.
Going from a fuzzy, under-defined spec to something well defined isn't solved.
Going from a well-defined spec to verification criteria also isn't.
Once those are in place, though, we get https://vinext.io - which, from what I understand, they largely vibe-coded using NextJS's test suite.
> First one that comes to mind is that 100% code coverage in tests means that software is perfect
I agree... but I'm also not sure software needs to be perfect.