DLINNLP Telegram 1781
Soumith Chintala (creator of PyTorch) lays out the basics of training on 10K GPUs
x.com/soumithchintala/status/1841498799652708712

A very short TL;DR (I recommend everyone read the original; it isn't long)

1. Maximize batch size and GPU utilization: 3D parallelism + gradient checkpointing
1. Overlap communication with computation: e.g., while layer N-1 is computing its backward pass, all GPUs holding layer N can already all-reduce its gradients
1. Optimize for your GPU cluster network topology

1. Failure recovery: at 10K-GPU scale, things fail all the time (GPUs, NICs, cables, etc.)
1. At 10K scale, bit flips actually become a problem and can cause loss explosions. Save your model state as frequently and as quickly as you can. To speed this up, save it in shards, first to CPU memory, and then write it to disk in a separate thread
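
In the first item, "3D parallelism" usually means combining data, tensor and pipeline parallelism, and one concrete way to act on the third item (network topology) is to choose which of those groups share a node. Below is a minimal stdlib sketch of one common rank layout, assuming 8 GPUs per node, ranks numbered node by node, and tensor parallelism as the innermost dimension so its frequent all-reduces stay on the intra-node interconnect. The function names and the 64-GPU example are hypothetical; this is not the layout from Soumith's thread.

```python
from itertools import product


def rank_to_coords(rank: int, tp: int, pp: int, dp: int) -> tuple[int, int, int]:
    """Map a global rank to (dp_idx, pp_idx, tp_idx), with tp varying fastest."""
    assert 0 <= rank < tp * pp * dp
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


def build_groups(tp: int, pp: int, dp: int) -> dict[str, list[list[int]]]:
    """Return the rank lists that communicate along each parallel dimension."""
    world = tp * pp * dp
    coords = {r: rank_to_coords(r, tp, pp, dp) for r in range(world)}
    groups: dict[str, list[list[int]]] = {"tp": [], "pp": [], "dp": []}
    # Same TP group: same (dp_idx, pp_idx), different tp_idx.
    for d, p in product(range(dp), range(pp)):
        groups["tp"].append([r for r, c in coords.items() if c[0] == d and c[1] == p])
    # Same PP group: same (dp_idx, tp_idx), one rank per pipeline stage.
    for d, t in product(range(dp), range(tp)):
        groups["pp"].append([r for r, c in coords.items() if c[0] == d and c[2] == t])
    # Same DP group: same model shard, these ranks all-reduce gradients.
    for p, t in product(range(pp), range(tp)):
        groups["dp"].append([r for r, c in coords.items() if c[1] == p and c[2] == t])
    return groups


def groups_within_node(groups: list[list[int]], gpus_per_node: int) -> bool:
    """True if every group fits on a single node (ranks numbered node by node)."""
    return all(len({r // gpus_per_node for r in g}) == 1 for g in groups)


if __name__ == "__main__":
    # Hypothetical job: 64 GPUs = 8 nodes x 8 GPUs, tp=8, pp=2, dp=4.
    g = build_groups(tp=8, pp=2, dp=4)
    print(g["tp"][0])                      # [0, 1, 2, 3, 4, 5, 6, 7]
    print(groups_within_node(g["tp"], 8))  # True: TP traffic stays inside a node
    print(groups_within_node(g["dp"], 8))  # False: DP all-reduce crosses nodes
```

Putting the most communication-heavy dimension innermost is the design choice here; data-parallel gradient all-reduce, which can be overlapped with compute, is the one left to cross the slower inter-node network.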
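
Gradient checkpointing (also from the first item) trades compute for memory: the forward pass keeps only some activations, and backward recomputes the rest from the nearest saved one, so the freed memory can go toward a larger batch. A toy sketch on a chain of scalar `tanh(w * x)` layers (hypothetical, stdlib only, not `torch.utils.checkpoint`), saving one layer input every `k` layers and checking that the gradient matches the store-everything version:

```python
import math


def layer(w: float, x: float) -> float:
    """One toy 'layer': y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w: float, x: float) -> float:
    """dy/dx for y = tanh(w * x), evaluated at input x."""
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full(weights: list[float], x: float) -> tuple[float, int]:
    """Backprop that stores every layer input; returns (d out/d x, inputs stored)."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xi in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xi)
    return g, len(inputs)


def grad_checkpointed(weights: list[float], x: float, k: int) -> tuple[float, int]:
    """Store only every k-th layer input; recompute each segment during backward.

    Returns (d out/d x, rough peak count of layer inputs held at once).
    """
    checkpoints = {}  # layer index -> input to that layer
    for i, w in enumerate(weights):
        if i % k == 0:
            checkpoints[i] = x
        x = layer(w, x)
    g = 1.0
    peak = len(checkpoints)
    # Visit segments from last to first, as backward does.
    for start in sorted(checkpoints, reverse=True):
        seg = weights[start:start + k]
        # Recompute this segment's layer inputs from its checkpoint.
        xs, xi = [], checkpoints[start]
        for w in seg:
            xs.append(xi)
            xi = layer(w, xi)
        peak = max(peak, len(checkpoints) + len(xs))
        for w, xj in zip(reversed(seg), reversed(xs)):
            g *= layer_grad(w, xj)
    return g, peak


if __name__ == "__main__":
    ws = [0.9, 1.1, 0.8, 1.2] * 4  # 16 hypothetical layer weights
    g_full, mem_full = grad_full(ws, 0.5)
    g_ckpt, mem_ckpt = grad_checkpointed(ws, 0.5, k=4)
    print(math.isclose(g_full, g_ckpt), mem_full, mem_ckpt)  # True 16 8
```

With 16 layers and `k=4` it holds at most 8 layer inputs instead of 16, at the cost of one extra forward pass.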
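
The overlap from the second item: as soon as backward finishes layer N, its gradient all-reduce can start while layer N-1 is still being computed. Below is a thread-based simulation, assuming `time.sleep` stands in for compute and network time and averaging lists stands in for the all-reduce; the constants are made up. PyTorch DDP does something similar with gradient buckets and asynchronous collectives rather than Python threads.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

COMPUTE_S = 0.05  # hypothetical backward time per layer
COMM_S = 0.05     # hypothetical all-reduce time per layer


def backward_layer(layer: int, num_ranks: int) -> list[list[float]]:
    """Stand-in for all ranks computing one layer's gradients in parallel."""
    time.sleep(COMPUTE_S)
    return [[float(rank + layer)] * 4 for rank in range(num_ranks)]


def all_reduce_mean(per_rank: list[list[float]]) -> list[float]:
    """Stand-in for an all-reduce: average the gradient vectors of all ranks."""
    time.sleep(COMM_S)
    n = len(per_rank)
    return [sum(vals) / n for vals in zip(*per_rank)]


def run_backward(num_layers: int, num_ranks: int, overlap: bool):
    """Return ({layer: reduced grads}, elapsed seconds)."""
    start = time.perf_counter()
    reduced: dict[int, list[float]] = {}
    pending: dict[int, Future] = {}
    with ThreadPoolExecutor(max_workers=1) as comm:  # one "communication stream"
        # Backward runs from the last layer to the first.
        for layer in reversed(range(num_layers)):
            grads = backward_layer(layer, num_ranks)
            if overlap:
                # Launch this layer's all-reduce and move on to the next layer.
                pending[layer] = comm.submit(all_reduce_mean, grads)
            else:
                reduced[layer] = all_reduce_mean(grads)  # compute waits for comm
        for layer, fut in pending.items():
            reduced[layer] = fut.result()  # wait for outstanding communication
    return reduced, time.perf_counter() - start


if __name__ == "__main__":
    r1, t_serial = run_backward(8, 4, overlap=False)
    r2, t_overlap = run_backward(8, 4, overlap=True)
    assert r1 == r2  # same gradients either way
    print(f"serial {t_serial:.2f}s, overlapped {t_overlap:.2f}s")
```

With these constants the serial run takes about 8 × (0.05 + 0.05) s, while the overlapped one takes about 8 × 0.05 + 0.05 s, since only the last all-reduce is left exposed.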
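
For failure recovery (fourth item), the training loop has to survive something breaking mid-run and continue from the last saved state. A minimal supervisor sketch, assuming a failure surfaces as an exception (`HardwareFailure` is hypothetical) and state is checkpointed in memory every `ckpt_every` steps; on a real cluster the whole job is typically restarted on healthy nodes instead of retrying in-process.

```python
import copy


class HardwareFailure(RuntimeError):
    """Hypothetical stand-in for a GPU / NIC / cable failure."""


def train(num_steps: int, ckpt_every: int, fail_at: set[int]) -> tuple[dict, int]:
    """Run training, resuming from the latest checkpoint after every failure.

    `fail_at` lists steps that fail once (simulated).
    Returns (final state, number of restarts).
    """
    state = {"step": 0, "weight": 0.0}
    checkpoint = copy.deepcopy(state)  # last saved state
    restarts = 0
    pending_failures = set(fail_at)
    while state["step"] < num_steps:
        try:
            step = state["step"]
            if step in pending_failures:
                pending_failures.discard(step)  # each simulated fault happens once
                raise HardwareFailure(f"node lost at step {step}")
            state["weight"] += 0.1  # stand-in for an optimizer step
            state["step"] = step + 1
            if state["step"] % ckpt_every == 0:
                checkpoint = copy.deepcopy(state)
        except HardwareFailure:
            restarts += 1
            # Work done since the last checkpoint is lost and redone.
            state = copy.deepcopy(checkpoint)
    return state, restarts


if __name__ == "__main__":
    final, restarts = train(num_steps=10, ckpt_every=3, fail_at={5, 7})
    print(final["step"], restarts)  # 10 2
```

Every restart throws away the steps since the last checkpoint, which is why the last item says to checkpoint as often as you can.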
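
One way to see why a single bit flip can blow up the loss: flipping a high exponent bit of a float32 weight changes its magnitude by dozens of orders of magnitude, while a low mantissa bit barely matters. A stdlib sketch using `struct` to reinterpret the bits (the helper name is made up):

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of x's IEEE-754 float32 representation (bit 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    w = 0.5
    print(flip_bit(w, 0))   # lowest mantissa bit: a negligible change
    print(flip_bit(w, 30))  # top exponent bit: 0.5 becomes 2**127 ≈ 1.7e38
```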
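
The checkpointing recipe from the last item: block training only long enough to copy the state into CPU memory in shards, then write the shards to disk from a separate thread. A stdlib sketch in which the "model state" is a dict of float lists standing in for GPU tensors (in PyTorch the snapshot step would be a device-to-host copy); the file naming, shard count and `.done` marker are assumptions, and in a real multi-GPU job each rank would typically save only its own shard.

```python
import copy
import os
import pickle
import tempfile
import threading

State = dict[str, list[float]]


def snapshot_shards(state: State, num_shards: int) -> list[State]:
    """Copy the state into host memory and split it into shards by key."""
    frozen = copy.deepcopy(state)  # training may keep mutating `state` after this
    shards: list[State] = [{} for _ in range(num_shards)]
    for i, key in enumerate(sorted(frozen)):
        shards[i % num_shards][key] = frozen[key]
    return shards


def write_shards(shards: list[State], directory: str, step: int) -> None:
    """Write each shard to its own file, then mark the checkpoint complete."""
    for i, shard in enumerate(shards):
        path = os.path.join(directory, f"step{step:06d}_shard{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(shard, f)
    # Marker written last, so a half-written checkpoint is never taken as complete.
    open(os.path.join(directory, f"step{step:06d}.done"), "w").close()


def async_checkpoint(state: State, directory: str, step: int,
                     num_shards: int = 4) -> threading.Thread:
    """Block only for the in-memory copy; do the disk I/O on a background thread."""
    shards = snapshot_shards(state, num_shards)
    t = threading.Thread(target=write_shards, args=(shards, directory, step))
    t.start()
    return t


def load_checkpoint(directory: str, step: int, num_shards: int = 4) -> State:
    state: State = {}
    for i in range(num_shards):
        with open(os.path.join(directory, f"step{step:06d}_shard{i}.pkl"), "rb") as f:
            state.update(pickle.load(f))
    return state


if __name__ == "__main__":
    state = {f"layer{i}.weight": [0.1 * i] * 3 for i in range(10)}
    with tempfile.TemporaryDirectory() as d:
        writer = async_checkpoint(state, d, step=100)
        state["layer0.weight"][0] = 42.0  # training continues; snapshot unaffected
        writer.join()  # e.g. before the next checkpoint or at exit
        restored = load_checkpoint(d, step=100)
        print(restored["layer0.weight"][0])  # 0.0
```

Only `snapshot_shards` runs on the training thread; serialization and disk writes overlap with the following steps.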

BY DL in NLP

