
At a high level, the approach is:

- I'm pain-point driven:

  - I can't compare what I can't measure.

  - I can't trust this "AI" tool to run on its own.

    - That's automation, which is about intentionality (can I describe what I want?) and understanding the risk profile (what's the blast radius, i.e., the worst that could happen?).

Then I treat it as if it were an Integration Test/Test-Driven Development exercise of sorts.

- I don't start by designing an entire cloud infrastructure.

- I make sure the "agent" lives where the users actually live, so that it can be the equivalent of an extra paid set of hands.

- I ask questions or replicate user stories, and use deterministic tests wherever I can. Don't just go for LLM-as-a-judge (LLMaaJ). What's the simplest thing you can think of? (See the deterministic-check sketch after this list.)

- The important thing is rapid iteration and control. Just like in unit testing, it's not about writing 100 tests but about writing the ones that qualitatively let you move as fast as possible.

- At this stage, where the space is moving so fast and we're learning so much, don't assume, and don't over-optimize places that don't hurt. Instead, aim for minimalism, ease of change, parameterization, and ease of comparison, both between the components that form "the black box" and between versions of the box itself.

- Once you have the benchmarks you want, you can make decisions like picking the cheapest model/agent configuration that does the job within an acceptable timeframe (see the selection sketch after this list).
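
To make the deterministic-tests point concrete, here's a minimal sketch in Python. `run_agent` is a hypothetical stand-in for however you invoke your agent; the point is that the checks are plain, cheap, reproducible assertions (parseable JSON, right shape, known answer on a known fixture) rather than asking another LLM to grade the output:

    import json
    import re

    def run_agent(prompt: str) -> str:
        """Hypothetical entry point: call your agent, return its raw reply."""
        raise NotImplementedError

    def test_extracts_invoice_total():
        reply = run_agent("Extract the total from invoice.txt as JSON.")

        # Check 1: the reply must be valid JSON, not prose around JSON.
        data = json.loads(reply)

        # Check 2: the field exists and has the right shape.
        assert re.fullmatch(r"\d+\.\d{2}", str(data["total"]))

        # Check 3: a known fixture has a known answer.
        assert data["total"] == "1499.00"

Run it with pytest like any other test; once a fixture set like this is green, you have a regression net for every prompt, model, or agent change.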
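
And a minimal sketch of the selection step, assuming you've already collected benchmark results per configuration (the labels and numbers below are made up, purely illustrative). Quality and latency are constraints; cost is the objective, so filter first and then take the cheapest survivor:

    from dataclasses import dataclass

    @dataclass
    class BenchResult:
        config: str          # model/agent configuration label
        pass_rate: float     # fraction of benchmark tasks passed
        p95_seconds: float   # 95th-percentile latency per task
        usd_per_task: float  # measured cost per task

    results = [
        BenchResult("big-model", 0.98, 40.0, 0.120),
        BenchResult("mid-model", 0.95, 25.0, 0.030),
        BenchResult("small-model", 0.80, 12.0, 0.004),
    ]

    # "Does the job within the acceptable timeframe" is a filter,
    # not a score to trade off against cost.
    acceptable = [r for r in results
                  if r.pass_rate >= 0.95 and r.p95_seconds <= 30.0]
    cheapest = min(acceptable, key=lambda r: r.usd_per_task)
    print(cheapest.config)  # -> mid-model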

Happy to go deeper on these. I have some practical/runnable samples/text I can share on the topic after the weekend. I'll drop a link here when it's ready.




This is really insightful. Thank you.

Your first two points jibe with my intuition that an agent's primitives should be a code execution sandbox, mounted files, and git.

If you have any practical examples to share I’m sure a ton of people would appreciate it.


I just shared this on HN (https://news.ycombinator.com/item?id=47026263) to see if it's possible to scale the knowledge sharing and the simple, good practices that keep people in control.

It may or may not address the practical examples you need, but I'd be keen to hear your thoughts, and maybe it's possible to come up with a more illustrative one.

I didn't go for bubblewrap or similar containers yet because I didn't want to lose a specific type of baseline newcomer (economists who do some coding), but I will be adding whatever elegant approaches I can find that don't leak too much complexity for things like sandboxing, system testing, integration mocking (reverse proxying), observing with OpenTelemetry or otherwise, presenting benchmarks, etc.
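
As a taste of the kind of low-complexity approach I mean for sandboxing, here's a minimal sketch: no bubblewrap, just a subprocess in Python's isolated mode with a hard timeout and a throwaway working directory. It limits the blast radius of agent-generated code but is explicitly not a real security boundary:

    import subprocess
    import sys
    import tempfile

    def run_untrusted(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
        """Run agent-generated Python in a scratch dir with a hard timeout."""
        with tempfile.TemporaryDirectory() as scratch:
            return subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=scratch,               # file writes land in the scratch dir
                capture_output=True,
                text=True,
                timeout=timeout_s,         # raises TimeoutExpired if exceeded
            )

    print(run_untrusted("print(2 + 2)").stdout)  # -> 4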



