Posting some great resources to understand the Transformer architecture for NLP presented in the paper “Attention is All You Need” (Vaswani et al. 2017).
- Start with this website by Jay Alammar; it is an excellent visual introduction.
- Next, work through this annotated PyTorch implementation of the Transformer from Harvard NLP.
- Then read the article "Attention? Attention!" by Lilian Weng.
- For further background on word embeddings, look into this post by Jason Brownlee.
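To complement the reading list, here is a minimal sketch of the core operation the paper introduces, scaled dot-product attention, written in plain NumPy. The function name and the toy shapes are my own choices for illustration; the formula itself, softmax(QKᵀ/√d_k)·V, is Equation 1 of Vaswani et al. (2017).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a convex combination of value rows

# Toy example: 3 query/key/value vectors of dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The annotated Harvard implementation wraps this same computation in multi-head projections and masking; seeing the bare version first makes that code much easier to follow.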