Microsoft just casually shared their new Phi-3 LLMs less than a week after the Llama 3 release. Based on the benchmarks in the technical report (https://arxiv.org/abs/2404.14219), even the smallest Phi-3 model beats Llama 3 8B despite being less than half the size.

Phi-3 was trained on "only" 3.3 trillion tokens, roughly 5x fewer than Llama 3's 15 trillion.

Phi-3-mini has "only" 3.8 billion parameters, less than half the size of Llama 3 8B.

Despite being small enough to be deployed on a phone (according to the report), it matches the performance of the much larger Mixtral 8x7B and GPT-3.5. (Phi-3-mini can be quantized to 4 bits, so it requires only ≈1.8 GB of memory.)
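The ≈1.8 GB figure is consistent with simple weight-only arithmetic. A quick sketch (ignores activations, KV cache, and quantization overhead such as per-block scales):

```python
# Weight-only memory estimate for a 4-bit quantized 3.8B-parameter model.
# Real usage is slightly higher: activations, KV cache, and per-block
# quantization scales are not counted here.
params = 3.8e9
bits_per_param = 4
memory_gb = params * bits_per_param / 8 / 1e9
print(f"~{memory_gb:.1f} GB")  # -> ~1.9 GB, in line with the reported ≈1.8 GB
```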

What is the secret sauce? According to the technical report, it's dataset quality over quantity: "heavily filtered web data and synthetic data".

Alongside the 4K context-window version, there's also a Phi-3-mini-128K model that supports up to 128K tokens.
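For reference, a minimal loading sketch, not an official snippet: it assumes the checkpoints live on the Hugging Face Hub under ids like microsoft/Phi-3-mini-128k-instruct, and that transformers, accelerate, and bitsandbytes are installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed Hub id for the long-context variant; a freshly released
# architecture may also require trust_remote_code=True.
model_id = "microsoft/Phi-3-mini-128k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                      # 4-bit weights, per the memory note above
        bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
    ),
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "Summarize the Phi-3 technical report in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```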

Fun fact: Phi-3 uses the same tokenizer as Llama 2, with a vocabulary size of 32,064.
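You can sanity-check that number from the model config (same assumed Hub id as above; the vocab_size attribute is the standard transformers convention):

```python
from transformers import AutoConfig

# Assumed Hub id; the padded vocabulary size lives in the model config.
cfg = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True
)
print(cfg.vocab_size)  # expected: 32064, per the report
```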