That's what some of my datasets do, but then you're still stuck with a single trained reply, not an entire conversation.
I keep racking my brain over that haha
Edit: I misread — if you add multiple thinking blocks to the context, the model gets confused, because the chat template trims them out of the context so we don't waste tokens we no longer need.
So we can't train it like this either, because then the bot would see multiple thinking processes in one conversation.
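A minimal sketch of the trimming behavior I mean, assuming a Qwen-style convention where thinking lives in `<think>...</think>` tags and the template keeps it only on the latest assistant turn (the function name and message format here are hypothetical, not any specific library's API):

```python
import re

def strip_prior_thinking(messages):
    """Drop <think>...</think> from every assistant turn except the last one,
    mimicking how some chat templates trim old reasoning to save tokens."""
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=None,
    )
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i != last_assistant:
            # Earlier turns lose their thinking block at inference time,
            # so training on full traces for every turn mismatches what
            # the model will actually see in context.
            cleaned = re.sub(r"<think>.*?</think>\s*", "", m["content"], flags=re.DOTALL)
            out.append({"role": m["role"], "content": cleaned})
        else:
            out.append(dict(m))
    return out

convo = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<think>greet back</think>Hello!"},
    {"role": "user", "content": "how are you?"},
    {"role": "assistant", "content": "<think>answer politely</think>Doing well."},
]
print(strip_prior_thinking(convo)[1]["content"])  # → "Hello!"
```

The mismatch is exactly this: if the training data keeps every turn's thinking but the template strips all but the last one, the model never sees at inference what it saw in training.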