DLINNLP Telegram 1781
Soumith Chintala (создатель pytorch) выдаёт базу о том как тренироваться на 10К GPU
x.com/soumithchintala/status/1841498799652708712

Оч короткий TL;DR (всем рекомендую прочитать оригинал, он не длинный)

1. Maximize batch size and GPU utilization: 3D parallelism + gradient checkpointing
1. Overlap communication, e.g. while N-1th layer is computing backward, all GPUs with an Nth layer can all-reduce
1. Optimize for your GPU cluster network topology

1. Failure recovery, at 10k GPU scale, things fail all the time -- GPUs, NICs, cables, etc
1. At 10K scale bit flips actually become a problem and can cause loss explosions. Save your model state as frequently and as quickly as you can. To speed it up save it in shards and to CPU memory first and then in a seaprate thread write to disk



tgoop.com/dlinnlp/1781
Create:
Last Update:

Soumith Chintala (создатель pytorch) выдаёт базу о том как тренироваться на 10К GPU
x.com/soumithchintala/status/1841498799652708712

Оч короткий TL;DR (всем рекомендую прочитать оригинал, он не длинный)

1. Maximize batch size and GPU utilization: 3D parallelism + gradient checkpointing
1. Overlap communication, e.g. while N-1th layer is computing backward, all GPUs with an Nth layer can all-reduce
1. Optimize for your GPU cluster network topology

1. Failure recovery, at 10k GPU scale, things fail all the time -- GPUs, NICs, cables, etc
1. At 10K scale bit flips actually become a problem and can cause loss explosions. Save your model state as frequently and as quickly as you can. To speed it up save it in shards and to CPU memory first and then in a seaprate thread write to disk

BY DL in NLP


Share with your friend now:
tgoop.com/dlinnlp/1781

View MORE
Open in Telegram


Telegram News

Date: |

To delete a channel with over 1,000 subscribers, you need to contact user support Invite up to 200 users from your contacts to join your channel On June 7, Perekopsky met with Brazilian President Jair Bolsonaro, an avid user of the platform. According to the firm's VP, the main subject of the meeting was "freedom of expression." Image: Telegram. Select “New Channel”
from us


Telegram DL in NLP
FROM American