Discussion about this post

Sam

I think one of the bigger challenges with LLM-based AI both underlies the issue hit here and is apparent in the way you write about it. You say the LLM was "deceptive" and "telling lies", but that's not really true: the LLM correctly output a set of tokens that were consistent with both its input tokens and its previous output. The initial faked output it gave here really is the *sort of* output you'd expect to get given your prompt and the examples it was trained on.

The hardest thing to grok is that there's no thinking happening anywhere. LLMs are so convincingly "human" in the way they talk, yet so unfathomably alien in the way the conversation is actually generated, that it's almost impossible not to fall into this trap. It's certainly what leads to my most frustrating encounters.

What to do about that is exactly as you say: ensure you have independent ways to verify that the results you're getting are correct. And yes, LLMs are very capable of helping you do that!

Jamie

Thanks Ed, that was a really interesting read!

It almost sounds like a human type of flaw. Didn't you once tell me a story about someone who had "dreamed" that they had done some testing?
