Tweets by PyTorch (@PyTorch)

The @PyTorch-based pipelines in 🤗 Transformers now support native torch datasets.

GPUs were often underutilized: 3… twitter.com/i/web/status/1…
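A minimal sketch of the streaming usage this enables, assuming a GPU at device 0 and a toy text dataset (both placeholders): wrapping the inputs in a torch.utils.data.Dataset lets the pipeline batch and prefetch, instead of leaving the GPU idle between single calls.

```python
import torch
from transformers import pipeline

# Toy dataset of raw strings; any torch.utils.data.Dataset of pipeline inputs works.
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        return self.texts[i]

dataset = TextDataset(["I love PyTorch!", "This queue is way too slow."] * 100)

# device=0 assumes a single GPU; batching keeps it busy instead of idling between calls.
pipe = pipeline("sentiment-analysis", device=0)
for out in pipe(dataset, batch_size=8):
    print(out)  # e.g. {'label': 'POSITIVE', 'score': 0.99...}
```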

Announcing new PyTorch library releases. Highlights:

- TorchX: a new SDK for ML applications
- TorchAudio: text-t… twitter.com/i/web/status/1…

PyTorch 1.10 is here!

Highlights include updates for:
- CUDA Graphs APIs
- Several frontend APIs moved to… twitter.com/i/web/status/1…
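As a rough sketch of the CUDA Graphs capture/replay API shipped around this release (the tiny model, shapes, and warmup count are illustrative assumptions):

```python
import torch

model = torch.nn.Linear(64, 64).cuda()
static_input = torch.randn(8, 64, device="cuda")

# Warm up on a side stream before capture, as the capture API requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one iteration into a graph...
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# ...then replay it with new data copied into the same static buffer.
static_input.copy_(torch.randn(8, 64, device="cuda"))
g.replay()
print(static_output.shape)
```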

WeightWatcher: Tool for predicting the accuracy of Deep Neural Networks

Is your model over-fitted/parameterized? D… twitter.com/i/web/status/1…
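A hedged usage sketch based on the weightwatcher package's documented entry points; the exact calls and the ResNet-50 example are assumptions, not part of the tweet:

```python
import torchvision.models as models
import weightwatcher as ww

# Analyze the weight matrices of a trained network; no test data is needed.
model = models.resnet50(pretrained=True)
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()             # per-layer spectral / power-law metrics
summary = watcher.get_summary(details)  # e.g. average alpha across layers
print(summary)
```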

Participating in #PyTorchHackathon? Watch @Jyothi_Nookula and Narine Kokhlikyan discuss why Responsible AI matters… twitter.com/i/web/status/1…

First journal paper of the PhD is out! Hooray! 🎉 We introduce THINGSvision, a Python toolbox for streamlining the e… twitter.com/i/web/status/1…

Flyingsquid: Python framework for more interactive weak supervision

by @HazyResearch

Faster weak supervision dev… twitter.com/i/web/status/1…
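For context, a minimal sketch in the spirit of the FlyingSquid README; the synthetic labeling-function votes and their shapes below are assumptions:

```python
import numpy as np
from flyingsquid.label_model import LabelModel

m = 5  # number of labeling functions
# Synthetic labeling-function votes in {-1, 0, +1} (0 = abstain), purely for illustration.
L_train = np.random.choice([-1, 0, 1], size=(1000, m))

label_model = LabelModel(m)
label_model.fit(L_train)

preds = label_model.predict(L_train)  # denoised label estimates, no ground truth needed
```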

Learn more about half precision on the PyTorch Developer Podcast episode: pytorch-dev-podcast.simplecast.com/episodes/half-…

torch.autocast… twitter.com/i/web/status/1…

For CPUs without BF16 support, and for ARM CPUs, lower precision is currently enabled via quantization.

Quantization converts FP32 to IN… twitter.com/i/web/status/1…
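For example, dynamic quantization converts the weights of supported layers (here a toy stack of Linear layers, chosen purely for illustration) to INT8 in one call:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Dynamic quantization: Linear weights are stored as INT8 and activations are
# quantized on the fly at inference time (CPU-oriented).
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 128)
print(quantized(x).shape)  # same interface, smaller weights
```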

Low-precision gradients save network bandwidth in distributed training too.

You can enable gradient compression t… twitter.com/i/web/status/1…
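One built-in way to do this is DDP's FP16 compression communication hook; the torchrun-style setup and toy model below are assumptions:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes a torchrun launch with one GPU per process.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])

# Gradients are cast to FP16 before all-reduce (halving communication volume)
# and cast back to the original dtype before being written into param.grad.
ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
```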

Don’t wrap your backward pass in `autocast()`!

Ensure you’re only wrapping your forward pass and the loss computa… twitter.com/i/web/status/1…
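The usual pattern, sketched here with placeholder model, data, and optimizer: autocast wraps only the forward pass and the loss, while backward and the optimizer step run outside it (with loss scaling for FP16):

```python
import torch

# Placeholder model, data, and optimizer so the loop is self-contained.
model = torch.nn.Linear(64, 10).cuda()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(32, 64), torch.randint(0, 10, (32,))) for _ in range(10)]

scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Only the forward pass and loss computation run under autocast...
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # ...the backward pass and optimizer step stay outside, with loss scaling for FP16.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```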

Running Resnet101 on a Tesla T4 GPU shows AMP to be faster than explicit half-casting:

7/11 pic.twitter.com/XsUIAhy6qU

For torch <= 1.9.1, AMP was limited to CUDA tensors using
`torch.cuda.amp.autocast()`

v1.10 onwards, PyTorch has… twitter.com/i/web/status/1…
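A small sketch of the old CUDA-only entry point next to the device-generic one from 1.10 (the tiny models are illustrative assumptions):

```python
import torch

gpu_model = torch.nn.Linear(64, 64).cuda()
cpu_model = torch.nn.Linear(64, 64)

# Pre-1.10 style: CUDA only.
with torch.cuda.amp.autocast():
    y = gpu_model(torch.randn(8, 64, device="cuda"))

# 1.10+ device-generic API: pick the device type and (optionally) the low-precision dtype.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = gpu_model(torch.randn(8, 64, device="cuda"))

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = cpu_model(torch.randn(8, 64))
```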

A better solution is to use Automatic Mixed Precision to let PyTorch choose the right op-specific precision (FP32 v… twitter.com/i/web/status/1…

FP16 is only supported on CUDA; BF16 is supported on newer CPUs and TPUs

Calling .half() on your network and tensor… twitter.com/i/web/status/1…
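Explicit casting looks roughly like this; the toy layers are assumptions, and whether BF16 is actually fast depends on the CPU:

```python
import torch

# FP16: CUDA only. Cast both the module and its inputs.
fp16_net = torch.nn.Linear(64, 64).cuda().half()
y = fp16_net(torch.randn(8, 64, device="cuda", dtype=torch.float16))

# BF16: runs on CPU, but is only fast on newer CPUs with native BF16 support.
bf16_net = torch.nn.Linear(64, 64).bfloat16()
y = bf16_net(torch.randn(8, 64, dtype=torch.bfloat16))
```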

3 lower precision datatypes are typically used in PyTorch:

* FP16 or half-precision (`torch.float16`)

* BF16 (`t… twitter.com/i/web/status/1…
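These dtypes can be inspected directly; since the tweet is truncated, the integer entry below is an assumption based on the quantization tweet above:

```python
import torch

# FP16: 5 exponent bits, 10 mantissa bits (small range, finer steps than BF16).
print(torch.finfo(torch.float16))
# BF16: 8 exponent bits, 7 mantissa bits (FP32-like range, coarser steps).
print(torch.finfo(torch.bfloat16))
# FP32, for comparison.
print(torch.finfo(torch.float32))

# Assumed third option: 8-bit integers, as used by quantization (no exponent at all).
print(torch.iinfo(torch.int8))
```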

Lower precision speeds up:

* compute-bound operations, by reducing load on the hardware

* memory bandwidth-bound… twitter.com/i/web/status/1…
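A crude way to see the compute-bound effect, assuming a CUDA GPU with fast FP16 (matrix sizes and iteration count are arbitrary):

```python
import torch

a32 = torch.randn(4096, 4096, device="cuda")
b32 = torch.randn(4096, 4096, device="cuda")
a16, b16 = a32.half(), b32.half()

def time_matmul(a, b, iters=20):
    # CUDA events give GPU-side timings in milliseconds.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

print("FP32 matmul:", time_matmul(a32, b32), "ms")
print("FP16 matmul:", time_matmul(a16, b16), "ms")
```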

✨ Low Numerical Precision in PyTorch ✨
Most DL models are single-precision floats by default.
Lower numerical preci… twitter.com/i/web/status/1…
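For reference, a quick check that tensors and parameters default to single precision:

```python
import torch

print(torch.get_default_dtype())            # torch.float32
print(torch.randn(3).dtype)                 # torch.float32
print(torch.nn.Linear(4, 4).weight.dtype)   # parameters are FP32 by default too
```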