I would worry less about external attack sophistication and more about your LLM ...

ErikBjare · 2026-01-22T12:31:27 1769085087

An informative rejection message with the reason for the restriction usually addresses this well with recent models.

bpodgursky · 2026-01-22T15:28:56 1769095736

I don't actually think recent models are likely to violate intent like this, just that if they do want to, I don't think a plaintext check is a strong deterrent.

catlifeonmars · 2026-01-22T05:52:36 1769061156

It sounds like you speak from experience