Posting some great resources for understanding the Transformer architecture for NLP, presented in the paper “Attention Is All You Need” (Vaswani et al. 2017). First, this website by Jay Alammar is excellent. Second, read the article “Attention? Attention!” by Lilian Weng. For further background on Word Embeddings, look into this post by Jason Brownlee.
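The core operation these resources build up to is scaled dot-product attention, softmax(QKᵀ/√d_k)V from Vaswani et al. (2017). Below is a minimal single-head NumPy sketch for intuition only. It omits masking, the learned projection matrices, and multi-head attention, and the random matrices `Q`, `K`, `V` are placeholder inputs rather than real embeddings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V, weights

# Placeholder inputs: 3 queries, 5 keys/values, d_k = 4, d_v = 2
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape)            # (3, 2)
print(attn.sum(axis=-1))    # each row of weights sums to 1 (up to rounding)
```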

The convolution layer of a CNN, explained in simple words.
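As a rough illustration, here is a minimal single-channel NumPy sketch of what a convolution layer computes: a small kernel slides over the input, and each output cell is the sum of an elementwise product. Like most deep-learning libraries, it actually computes cross-correlation (the kernel is not flipped). The `conv2d` helper, the toy image, and the edge-detector kernel are illustrative assumptions; real layers also learn the kernel weights, add a bias, and handle multiple channels and padding.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Single-channel 2D convolution with 'valid' padding (no kernel flip)."""
    kh, kw = kernel.shape
    # Output size per axis: (input - kernel) // stride + 1
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image and sum the elementwise product
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 5x5 image whose two right-most columns are bright
image = np.zeros((5, 5))
image[:, 3:] = 1.0
# Hypothetical vertical-edge detector kernel
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])
feature_map = conv2d(image, kernel)
print(feature_map)  # 3x3 map; every row is [0, 3, 3]
```

With a 5×5 input, a 3×3 kernel, and stride 1, the output has (5 − 3)/1 + 1 = 3 cells per side, and the non-zero responses mark where the vertical edge sits.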

To differentiate the loss function in a Neural Network, there are four options:

- Manual differentiation: it is labor-intensive, and it is often hard to derive the closed-form solution, especially for complex functions.
- Symbolic differentiation: like manual differentiation, it becomes hard for complex functions.
- Numerical differentiation: it can handle complex functions but may cause numerical […]
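To make the trade-off concrete, here is a small sketch comparing a hand-derived (manual) gradient with a central-difference numerical estimate. The one-weight squared-error `loss`, its constants, and the step sizes are hypothetical choices for illustration.

```python
def loss(w):
    # Hypothetical scalar loss: squared error of a one-weight model y_hat = w * x
    x, y = 2.0, 3.0
    return (w * x - y) ** 2

def analytic_grad(w):
    # Manual derivative: d/dw (w*x - y)^2 = 2 * x * (w*x - y)
    x, y = 2.0, 3.0
    return 2 * x * (w * x - y)

def central_difference(f, w, h=1e-5):
    # Numerical derivative (f(w + h) - f(w - h)) / (2h); truncation error is O(h^2)
    return (f(w + h) - f(w - h)) / (2 * h)

w = 0.5
print(analytic_grad(w))             # -8.0
print(central_difference(loss, w))  # approximately -8.0

# A very small step lets floating-point round-off dominate the estimate
for h in (1e-1, 1e-5, 1e-12):
    err = abs(central_difference(loss, w, h) - analytic_grad(w))
    print(f"h={h:g}  error={err:.2e}")
```

Because this loss is quadratic, the central difference has no truncation error, so any discrepancy here comes from floating-point round-off, which tends to grow as `h` shrinks.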

There are many flavors of games in Game Theory that are interesting from a Machine Learning perspective, especially for multi-agent Reinforcement Learning applications. Here is a summary of several game types, whether the MinMax algorithm works for each, and what type of strategy one needs to employ.
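For the simplest case, two-player zero-sum games with perfect information and alternating moves, the MinMax (minimax) value can be computed by recursing over the game tree. The sketch below assumes a hypothetical nested-list tree whose leaves are payoffs to the maximizing player; it is not taken from the post's summary.

```python
def minimax(node, maximizing):
    """Minimax value of a two-player zero-sum game tree.

    A leaf is a number (payoff to the maximizing player); an internal node
    is a list of child subtrees. Players alternate at each level.
    """
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    # MAX picks the best value for itself, MIN picks the worst value for MAX
    return max(values) if maximizing else min(values)

# Two-ply game: MAX chooses a branch, then MIN chooses a leaf inside it
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, maximizing=True))  # 3
```

MIN would answer each branch with 3, 2, and 2 respectively, so MAX's best guaranteed payoff is 3 even though larger leaves exist.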

Entropy is the fundamental measure of information (the average uncertainty of a random variable) in Information Theory and is widely used in Machine Learning. Let us introduce three concepts: Entropy, Joint Entropy, and Mutual Information.
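A short sketch of the three quantities for discrete variables, using base-2 logarithms so results are in bits. The helper names and the two example distributions are illustrative assumptions; mutual information is computed via the identity I(X;Y) = H(X) + H(Y) − H(X,Y).

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p * log2(p) in bits; zero-probability terms add 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))   # marginal of X (rows)
    h_y = entropy(joint.sum(axis=0))   # marginal of Y (columns)
    h_xy = entropy(joint)              # joint entropy H(X, Y)
    return h_x + h_y - h_xy

print(entropy([0.5, 0.5]))  # 1.0 bit for a fair coin

# Two fair bits that are always equal: knowing X tells you Y completely
joint_same = [[0.5, 0.0], [0.0, 0.5]]
# Two independent fair bits: X tells you nothing about Y
joint_indep = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(joint_same))   # 1.0
print(mutual_information(joint_indep))  # 0.0
```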

This is my recommended template for a Project Proposal for a freelance Machine Learning project. It reflects my personal choice, and depending on the project requirements, you may need to modify or add to this list. One size does not fit all for Machine Learning projects, but this can be a good starting point.

Random Projection is an interesting Dimensionality Reduction technique. You may create random projections to 1, 2, 3, …, n dimensions. How do you tell which one is best? You need to quantify how much information is lost by each reduction in dimensionality. `# data has this shape: row, col = 4898, 11` `random_projection` […]
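One way to compare the projections, assumed here rather than taken from the post, is to map each k-dimensional projection back to the original space with the pseudo-inverse of the projection matrix and measure the relative reconstruction error. The sketch uses plain NumPy and a synthetic matrix with the post's shape (4898 × 11) as a stand-in for the real data; the original snippet appears to use a `random_projection` module (possibly scikit-learn's `sklearn.random_projection`), which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in with the post's shape (4898 rows, 11 columns);
# replace with the real feature matrix.
X = rng.normal(size=(4898, 11))
X = X - X.mean(axis=0)  # center the features

n_features = X.shape[1]
# One Gaussian random matrix; the k-dimensional projection uses its first k columns
R_full = rng.normal(size=(n_features, n_features)) / np.sqrt(n_features)

errors = {}
for k in range(1, n_features + 1):
    R = R_full[:, :k]
    X_proj = X @ R                      # project down to k dimensions
    X_rec = X_proj @ np.linalg.pinv(R)  # map back to the original 11-dim space
    # Fraction of the total (centered) sum of squares not recovered
    errors[k] = np.sum((X - X_rec) ** 2) / np.sum(X ** 2)

for k, err in errors.items():
    print(f"k={k:2d}  relative reconstruction error={err:.3f}")
```

Taking each k-dimensional matrix as the first k columns of one Gaussian matrix makes the subspaces nested, so the error can only go down as k grows and reaches roughly zero at k = 11. A reasonable choice is then the smallest k whose error is acceptable for the task.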

## System Design (Draft Post)

Topics:

- Classical Design Patterns
- Consistency
- Basics of Distributed Systems
- How databases work
- Message queueing
- Performance
- Applicability
- Scalability
- Reliability

Example problems:

- How do you design a messaging service?
- How do you design a database system?
- How do you design a scalable hashtagging system?

Spend 30-40 minutes on each problem. Spend a certain amount of time on each aspect, for example to […]

There are two types of training algorithms for a neuron's weights. The first minimizes the error between the prediction y_hat and the label y, where y_hat = boolean(activation >= threshold). This perceptron learning rule suits linearly separable data: only then is it guaranteed to converge in a finite number of iterations. The second type is the Gradient Descent algorithm, which minimizes the […]
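A minimal NumPy sketch of the first (perceptron) rule, assuming a step activation where the threshold is folded into a bias term `b` (b = −threshold) and labels are 0/1. The `train_perceptron` name, the learning rate, and the AND dataset are illustrative choices.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Perceptron learning rule with a step activation.

    y_hat = 1 if w.x + b >= 0 else 0 (threshold folded into the bias b).
    Weights change only on misclassified samples.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0 else 0
            update = lr * (yi - y_hat)  # 0 when the prediction is correct
            w += update * xi
            b += update
            mistakes += int(update != 0)
        if mistakes == 0:  # a full pass with no errors: training has converged
            return w, b, epoch + 1
    return w, b, max_epochs

# Logical AND is linearly separable, so training stops after finitely many epochs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b, epochs = train_perceptron(X, y)
predictions = (X @ w + b >= 0).astype(int)
print(predictions)  # [0 0 0 1]
print(epochs)
```

On data that is not linearly separable (XOR, for example) the same loop would simply run until `max_epochs`, which is why the convergence guarantee is tied to separability.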

An important Machine Learning concept is Reinforcement Learning, which differs from the more common Supervised and Unsupervised Learning models. In Supervised Learning you have labels for training; in Unsupervised Learning there is no labeled data. Reinforcement Learning falls in between the two because it does not have labels, but it learns from […]
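To illustrate learning without labels, here is a sketch of tabular Q-learning, one classic Reinforcement Learning algorithm (not necessarily the one the post goes on to describe). The five-state chain environment, the reward of 1 at the goal, and the hyperparameters are hypothetical choices; the agent is never told the correct action and improves its estimates only from the rewards it observes while acting.

```python
import numpy as np

# Hypothetical toy environment: states 0..4 on a line; reaching state 4 gives reward 1
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action_idx):
    """Apply an action, clipping at the walls; return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action
                a = int(rng.integers(len(ACTIONS)))
            else:
                # Exploit: greedy action, breaking ties at random
                best = np.flatnonzero(Q[state] == Q[state].max())
                a = int(rng.choice(best))
            next_state, reward, done = step(state, a)
            # No label says which action was right: the update uses only the reward
            # plus the discounted value of the best action in the next state
            target = reward + (0.0 if done else gamma * Q[next_state].max())
            Q[state, a] += alpha * (target - Q[state, a])
            state = next_state
    return Q

Q = q_learning()
# Greedy policy for the non-terminal states
policy = [ACTIONS[int(np.argmax(Q[s]))] for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should be to move right in every non-terminal state, which is the shortest path to the reward.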