Top 5 Must-Reads on AI This Week
Is GPT-4 getting worse over time? / Llama 2 / Why computer-made data is being used to train AI models / Apple GPT / Google Tests A.I. Tool That Is Able to Write News Articles
Is GPT-4 getting worse over time?
(Arvind Narayanan and Sayash Kapoor on AI Snake Oil): “In short, the new paper [Chen, Zaharia, and Zou] doesn’t show that GPT-4 capabilities have degraded. But it is a valuable reminder that the kind of fine tuning that LLMs regularly undergo can have unintended effects, including drastic behavior changes on some tasks. Finally, the pitfalls we uncovered are a reminder of how hard it is to quantitatively evaluate language models.”
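One way to make the paper's comparison concrete: query the pinned March and June GPT-4 snapshots on the same task set and compare scores. Below is a minimal sketch, assuming the openai>=1.0 Python client and continued API access to both snapshots; the two prime-checking prompts echo the paper's task, but the task set and scoring here are illustrative placeholders, not the authors' harness.

```python
# Minimal snapshot-to-snapshot comparison sketch (assumes openai>=1.0
# and access to both pinned gpt-4 snapshots; task set is illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tiny illustrative task set: (prompt, expected answer substring).
TASKS = [
    ("Is 17077 a prime number? Answer yes or no.", "yes"),
    ("Is 20019 a prime number? Answer yes or no.", "no"),
]

def accuracy(model: str) -> float:
    """Score one pinned model snapshot on the task set."""
    correct = 0
    for prompt, expected in TASKS:
        resp = client.chat.completions.create(
            model=model,
            temperature=0,  # reduce sampling noise between runs
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content.strip().lower()
        correct += expected in answer
    return correct / len(TASKS)

# The March and June snapshots the paper compared (availability may vary).
for snapshot in ("gpt-4-0314", "gpt-4-0613"):
    print(snapshot, accuracy(snapshot))
```

Even a toy harness like this surfaces the evaluation pitfalls the authors describe: a score difference can reflect a change in answer formatting rather than a change in capability, which is why the substring match above is a weak proxy for real grading.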
Meta’s Llama 2
“Llama 2 [is] a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models.” (Paper). “Llama 2 models are trained on 2 trillion tokens and have double the context length of Llama 1. Llama-2-chat models have additionally been trained on over 1 million new human annotations.” (Blog post).
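Because the weights are openly available (after accepting Meta's license), Llama 2-Chat can be run locally. Here is a minimal sketch, assuming the Hugging Face transformers and accelerate libraries and gated access to the meta-llama checkpoints; the model ID and prompt are illustrative.

```python
# Minimal local-inference sketch for Llama 2-Chat (assumes transformers,
# accelerate, and approved access to the gated meta-llama weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # 7B chat variant; 13B/70B also exist

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    device_map="auto",          # requires the accelerate package
)

# Llama 2-Chat expects its instruction-tuning format: [INST] ... [/INST]
prompt = "[INST] Summarize what makes Llama 2 different from Llama 1. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The 7B checkpoint is the smallest of the family the paper describes; the same code loads the 13B and 70B variants, with memory requirements scaling accordingly.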