In model terms, it could be a sequence of tokens that, when used as a chain-of-thought prompt, yields samples that are intrinsically rewarding to attend to, as scored by some as-yet-undiscovered internal reward function.