Hi Ron, thank you for asking this question!
Those 14KB aren't the model's memory in the sense of how much "knowledge" about coding or natural language it has. That knowledge is stored in the model's parameters. The 14KB is the working memory it has for handling a single input.
Let's see an example. Imagine you're writing a relatively complex program. You've written 1000 lines of code already and you want Codex to help you. For it to grasp what you want, it needs to look back at the code you've already written: the variables, the methods, and so on.
Those 14KB serve this purpose. The larger this memory store, the farther back Codex can look into the code you've written (although I think for now it's limited to a single document).
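To make that concrete, here's a tiny sketch (not Codex's actual implementation, just an illustration of the idea) of what a fixed context budget means in practice: if the file plus your request doesn't fit in the budget, the oldest code gets dropped and only the most recent part is visible to the model.

```python
# A toy illustration of a fixed context budget. The names and the
# byte-based truncation are assumptions for the example; real models
# count tokens, not bytes.
CONTEXT_BUDGET_BYTES = 14 * 1024  # roughly the 14KB mentioned above

def build_prompt(source_code: str, request: str) -> str:
    """Keep the request, then fill the rest of the budget with the
    most recent part of the file, dropping the oldest lines first."""
    # Reserve room for the request and the newline separator.
    remaining = CONTEXT_BUDGET_BYTES - len(request.encode("utf-8")) - 1
    tail = source_code.encode("utf-8")[-remaining:]
    context = tail.decode("utf-8", errors="ignore")
    return context + "\n" + request

# Example: a 1000-line file is a bit too big, so only the tail survives.
big_file = "\n".join(f"line_{i} = {i}" for i in range(1000))
prompt = build_prompt(big_file, "# TODO: add a summary function")
```

The point is that variables defined near the top of a long file may simply fall outside the window, which is why a bigger context budget lets the model "see" more of your earlier code.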
Is it clearer now? :)