Microsoft just casually shared their new Phi-3 LLMs less than a week after the Llama 3 release. Based on the benchmarks in the technical report (https://arxiv.org/abs/2404.14219), even the smallest Phi-3 model beats Llama 3 8B despite being less than half its size.
Phi-3 has "only" been trained on 5x fewer tokens than Llama 3 (3.3 trillion instead of 15 trillion)
Phi-3-mini less has "only" 3.8 billion parameters, less than half the size of Llama 3 8B.
Despite being small enough to be deployed on a phone (according to the report), it matches the performance of the much larger Mixtral 8x7B and GPT-3.5. (Phi-3-mini can be quantized to 4 bits, so it only requires ≈ 1.8GB of memory.)
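For a rough sense of where the ≈ 1.8GB figure comes from, here is a back-of-the-envelope calculation (a minimal sketch: it counts 4-bit weights only and ignores quantization group overhead and the KV cache):

```python
# Rough memory estimate for Phi-3-mini quantized to 4-bit weights.
params = 3.8e9          # Phi-3-mini parameter count
bits_per_param = 4      # 4-bit quantization
bytes_total = params * bits_per_param / 8

print(f"{bytes_total / 1e9:.2f} GB")     # ~1.90 GB (decimal)
print(f"{bytes_total / 2**30:.2f} GiB")  # ~1.77 GiB (binary), i.e. ≈ 1.8GB
```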
What is the secret sauce? According to the technical report, it's dataset quality over quantity: "heavily filtered web data and synthetic data".
In addition to the 4k context-window version, there is also a Phi-3-mini-128K model that supports up to 128k tokens.
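If you want to try the long-context variant yourself, a minimal sketch with Hugging Face transformers might look like this (assuming the checkpoint ID "microsoft/Phi-3-mini-128k-instruct" and that your transformers version needs trust_remote_code=True; check the model card for the current details):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed Hugging Face checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # place weights on GPU/CPU automatically (requires accelerate)
    trust_remote_code=True,  # Phi-3 shipped with custom modeling code at release
)

inputs = tokenizer("Summarize the Phi-3 technical report:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```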
Fun fact: Phi-3 uses the same tokenizer as Llama 2, with a vocabulary size of 32,064.
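You can inspect this yourself; here is a small sketch (assuming the Hugging Face checkpoint ID "microsoft/Phi-3-mini-4k-instruct"; note that the 32,064 figure from the report is the model's vocabulary size, which may be padded beyond the raw tokenizer vocabulary):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face checkpoint name

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.vocab_size)   # expected: 32064, per the technical report

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
print(len(tok))            # Llama-2-style SentencePiece vocab plus any added special tokens
```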