Thanks for the comment, Gottfried!
Genuine in what sense? I'd say what you call "causal considerations" and "counterfactual pondering" could be explained by: "This word makes sense given the previous ones."
They may seem to display causal reasoning, but you can't get them to reason accurately every time. Of course, they get some reasoning right, but people get it right almost every single time. The criticism isn't that they never display causal or common-sense reasoning. It's that they don't do it often enough to be reliable.
The trick here is that I didn't intend the story to go anywhere. I didn't have a plot or an ending in mind. I let the models drive it, and that's why it feels like they did a great job. If you set them to reach a particular destination, you're much more likely to fail.
Also, as Pete added to your comment, each time I chose whichever completion best fit the conversation. For instance, if GPT-3 agreed with J1-Jumbo in one completion, it may have said the complete opposite in another.
It's more of an entertainment piece than a careful analysis of LLM behavior.