Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I would worry less about external attack sophistication and more about your LLM getting annoyed by the restrictions and encrypting the password to bypass the sandbox to achieve a goal (like running on an EC2 instance). Because they are very capable of doing this.




An informative rejection message with the reason for the restriction usually addresses this well with recent models.

I don't actually think recent models are likely to violate intent like this, just that if they do want to, I don't think a plaintext check is a strong deterrent.

It sounds like you speak from experience



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: