Thank you, Pete, for the clarification. Yes, I did that for each completion, although in a few cases I liked the first generation and didn't repeat it.
The reason I got such a nice story is that I never burdened either model with a lot of text to generate. Every time one of them produced a completion, the whole previous text history was fed to the other as the prompt. The ability of these models to generate a short, meaningful completion from a long prompt is remarkable.
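For anyone curious what that loop looks like in practice, here is a minimal sketch, assuming a placeholder generate_completion function that stands in for whichever model API you actually call (the function name, parameters, and stub output here are purely illustrative, not the code I ran):

```python
# Minimal sketch of the alternating-completion loop (illustrative only).
# generate_completion is a placeholder: swap in whatever model API you use.

def generate_completion(model_name: str, prompt: str) -> str:
    """Placeholder: call your model of choice and return a short completion."""
    return f"[{model_name} continues the story here]"

def alternate_story(opening: str, models: list[str], turns: int = 10) -> str:
    story = opening
    for i in range(turns):
        model = models[i % len(models)]                   # alternate between the two models
        completion = generate_completion(model, story)    # full history so far is the prompt
        story += " " + completion.strip()                 # append only the short completion
    return story

print(alternate_story("Once upon a time,", ["model_A", "model_B"], turns=4))
```

The key point is simply that each model only ever has to add a short continuation, while always seeing the entire story so far.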
By contrast, having them generate a long paragraph is far more difficult. Also, prompt engineering is crucial; that may have something to do with your results.