So essentially they are almost the same thing, except that the kernels (and their gradients) will still come out flipped!

But that little difference does not matter! No matter which way the kernel is flipped, the network can still learn the best kernel values for the operation. So it turns out we can enjoy the little mathematical convenience without any issue!

But there is more to it! It turns out the gradient of a cross-correlation is computed with a convolution! That is very useful in the backward pass of the CNN, because we already know how to convolve! And if we instead used convolution in the forward pass, the backward pass would use cross-correlation!
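To make the flip relationship concrete, here is a tiny 1-D sketch in Java (the class and method names are my own, not from any library): cross-correlation slides the kernel as-is, while convolution traverses it in reverse, so convolving with a kernel gives the same result as cross-correlating with its flipped copy.

```java
public class CorrVsConv {
    // Cross-correlation: slide the kernel over the signal without flipping it
    static double[] crossCorrelate(double[] x, double[] k) {
        int n = x.length - k.length + 1;   // "valid" output size
        double[] out = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k.length; j++)
                out[i] += x[i + j] * k[j];
        return out;
    }

    // Convolution: identical, except the kernel is traversed in reverse (flipped)
    static double[] convolve(double[] x, double[] k) {
        int n = x.length - k.length + 1;
        double[] out = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k.length; j++)
                out[i] += x[i + j] * k[k.length - 1 - j];
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        // Convolving with {1,0,-1} equals cross-correlating with its flip {-1,0,1}
        System.out.println(java.util.Arrays.equals(
            convolve(x, new double[]{1, 0, -1}),
            crossCorrelate(x, new double[]{-1, 0, 1})));  // true
    }
}
```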

Math is amazing!

- **Manual differentiation**: labor intensive, and it is often hard to derive the closed-form solution, especially for complex functions
- **Symbolic differentiation**: like manual differentiation, it also struggles with complex functions
- **Numerical differentiation**: can handle complex functions but may cause numerical issues
- **Automatic differentiation**: what most Deep Learning libraries use
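To illustrate the idea behind automatic differentiation (this is a toy sketch of forward-mode AD with dual numbers, not any particular library's API), each value carries its derivative along, and every arithmetic operation applies the chain rule:

```java
// Minimal forward-mode automatic differentiation with dual numbers.
public class Dual {
    final double val, der;
    Dual(double val, double der) { this.val = val; this.der = der; }

    // Sum rule: (f + g)' = f' + g'
    Dual add(Dual o) { return new Dual(val + o.val, der + o.der); }
    // Product rule: (f * g)' = f' * g + f * g'
    Dual mul(Dual o) { return new Dual(val * o.val, der * o.val + val * o.der); }

    public static void main(String[] args) {
        // f(x) = x*x + x, evaluated at x = 3: f(3) = 12, f'(3) = 2*3 + 1 = 7
        Dual x = new Dual(3, 1);                  // seed derivative dx/dx = 1
        Dual f = x.mul(x).add(x);
        System.out.println(f.val + " " + f.der);  // 12.0 7.0
    }
}
```

The derivative falls out of the arithmetic itself; no closed-form expression or finite-difference step is ever needed, which is exactly why Deep Learning libraries favor this approach.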

- Database System Design
- Reinforcement Learning and Decision Making (Eventually dropped)
- Software Architecture and Design
- Graduate Introduction to Operating System
- Information Security
- Machine Learning for Trading
- Graduate Algorithm
- Machine Learning
- Deep Learning
- Computer Networks
- Distributed Systems

A question narrows you down to a handful of possible answers, or it might be an open-ended question, which is basically an infinite set of possible answers.

The set of possible answers to a question leads to hypotheses. Hypotheses help validate or reinforce our knowledge. This kind of hypothesis testing is crucial for the modern entrepreneur.

Asking questions is also the optimal approach to life. The Bellman equation actually models this in a Reinforcement Learning framework: you keep testing a set of possible answers and pick the one that looks best. You may fail a few times, but statistically speaking, that is how you maximize your chance of success.

Asking questions is also a very powerful technique for solving problems, especially problems in your own life. With all the information in the world on the internet, one needs to come up with the right questions to reach an answer. This is the scientific method. As more and more information becomes available at our fingertips, it is the ones with the learned ability to form a logical sequence of questions who will find answers to the most important problems and become successful entrepreneurs and thought-leaders. With everyone having roughly equal access to information, there will be a level playing field for all. The ones who spot problems under the guise of a question, and are willing to jump off the cliff, will be the ones bringing the most positive change. Asking the best questions will be the survival skill of Homo sapiens in the modern 4th industrial revolution age.

Question everything, especially the status quo. Could you live a better life if something were not as it is? Be uncomfortable with the problems you see every day, the ones your mind decides to ignore because you are too familiar with them and you know there is a solution that works, however tiny the pain it causes. If you can make something 1% better and help 1 million people get its benefits, that is a real, tangible Total Addressable Market. Paul Graham would like you to snatch these opportunities and become an entrepreneur. All these micro-problems can lead you to a solution that could become a viable business, if only you keep asking yourself: could this thing be any better?

That tiny bit of improvement can also lead to a big innovation down the road. Thomas Alva Edison, who held an astounding number of patents, famously said that he had no genuine ideas of his own; he only improved existing solutions to existing problems. If you notice, problems often disguise themselves as questions.

There are simple frameworks for coming up with the right question. Let me present a few I personally use a lot:

- Use basic maths concepts: could you divide the problem into parts? Do multiple solutions add up to a new solution? Asking how to add, divide, subtract, and multiply a concept is a useful Critical Thinking technique.
- Find patterns, find similarities or dissimilarities
- Ask the Wh questions: Who, Where, What, When, Why, and How. When you are making a decision, ask yourself why you are doing it, then what you are going to do, then how you are going to do it. This is the famous Start with Why TED talk by Simon Sinek. You can also ask yourself when you are going to do it. Remember, the difference between a dream and a plan of action is this when question. Until the moment you realize your life is short and you had better set a time for your dreams to come true, a dream barely leads to any action towards its fulfillment in the real world.

| Game ID | #Players | Outcome | Deterministic? | Information | #Rounds | Strategy |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | Zero-sum | Deterministic | Perfect | Single | MinMax works, pure strategy |
| 2 | 2 | Zero-sum | Stochastic | Perfect | Single | MinMax works, pure strategy |
| 3 | 2 | Zero-sum | Stochastic | Hidden | Single | MinMax does NOT work, mixed strategy |
| 4 | 2 | Non-zero-sum | Stochastic | Hidden | Outcome is the same every round | Solve for Nash equilibrium, pure or mixed |
| 5 | 2 | Zero-sum | Stochastic | Hidden | Defined by gamma | With finite states, it is a Markov Decision Process; solve with the Minimax-Q algorithm |
| 6 | 2 or more | Non-zero-sum | Stochastic | Not hidden | Defined by gamma | Active research! |

**Game ID #4:** If a strongly dominant strategy is present, then a pure strategy might work; otherwise a mixed strategy might be needed.

**Game ID #5:** For a small number of rounds (gamma ≈ 0), betrayal might provide more reward, but for an infinite number of rounds (gamma ≈ 1), cooperation yields more reward.

- Tit-for-Tat is not subgame perfect when considering a longer future time horizon! A player can earn more reward overall by not retaliating against the other player! So forgiveness is the better virtue! This forgiving strategy is called the Pavlov state machine!
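For the games in the table where MinMax works (Game IDs #1 and #2), the idea can be sketched over an explicit game tree. This toy Java example (the structure and names are my own, not from the course material) alternates a maximizing and a minimizing player over leaf payoffs to the maximizer:

```java
import java.util.List;

public class MiniMax {
    // A node is either an Integer leaf (payoff to the maximizer)
    // or a List of child nodes; players alternate at each level.
    static int minimax(Object node, boolean maximizing) {
        if (node instanceof Integer) return (Integer) node;  // leaf payoff
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Object child : (List<?>) node) {
            int v = minimax(child, !maximizing);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        // Maximizer moves first, minimizer replies:
        // left branch guarantees min(3,5)=3, right only min(2,9)=2
        Object tree = List.of(List.of(3, 5), List.of(2, 9));
        System.out.println(minimax(tree, true));  // 3
    }
}
```

In a zero-sum, perfect-information game this backward induction yields the pure strategy the table refers to; with hidden information (Game ID #3) no such single deterministic choice is safe, which is why a mixed strategy is needed.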

Entropy is the fundamental measure of information in Information Theory and is extensively used in Machine Learning. Let us introduce the concepts: Entropy, Joint Entropy, Conditional Entropy, and Mutual Information.

Entropy is defined as below:

H(X) = P(x1)*#bits(x1) + P(x2)*#bits(x2) + ... + P(xn)*#bits(xn)

Here X is a discrete variable that can take the values {x1, x2, …, xn}. There is a nice video by Puskar Kolhe that helps explain the concept.

Now again, the values with higher probability should get fewer bits assigned, because that reduces the overall encoded data size, so

P(xi) = 1 / 2^(#bits(xi))

Which leads to,

#bits(xi) = logBase2(1/P(xi)) => #bits(xi) = -logBase2(P(xi))

This is how the entropy definition becomes

H(X) = SumOver_i[ -P(xi) * logBase2(P(xi)) ]
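This definition is straightforward to compute. A minimal Java sketch (the class and helper names are my own):

```java
public class Entropy {
    // H(X) = -sum_i P(xi) * log2(P(xi)), measured in bits
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p)
            if (pi > 0)                              // 0 * log(0) contributes nothing
                h -= pi * (Math.log(pi) / Math.log(2));
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy(new double[]{0.5, 0.5}));         // ≈ 1.0 (fair coin)
        System.out.println(entropy(new double[]{0.5, 0.25, 0.25}));  // ≈ 1.5
    }
}
```

Note the sanity checks: a fair coin needs one bit, and a skewed distribution over three values needs fewer bits than the two bits a uniform 4-value distribution would.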

Joint entropy measures the information contained in two discrete variables X and Y taken together:

H(X,Y) = SumOver_ij[ -P(xi,yj) * logBase2(P(xi,yj)) ]

Here, `P(xi,yj)` is the joint probability of the variables X and Y. If X and Y are independent, then the joint entropy is just the sum of the individual entropies, since neither variable tells us anything about the other:

H(X,Y) = H(X) + H(Y)

Conditional entropy measures the additional information in Y when the other variable X is already known. Just swap the last portion with `P(yj|xi)`:

H(Y|X) = SumOver_ij[ -P(xi,yj) * logBase2(P(yj|xi)) ]

If X and Y are independent, then `H(Y|X) = H(Y)`, because knowing X does not help figure out the distribution of Y.

Mutual information is probably the most useful metric for measuring the information shared between variables. The formula is below:

I(X,Y) = H(Y) - H(Y|X)

It tells how much the uncertainty about Y drops once X is known.

It is important to remember that for two independent variables X and Y, the mutual information is `I(X,Y) = 0`. If there is any dependency, then `I(X,Y) > 0`.
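Putting the identities together, here is a small Java sketch (the class and method names are my own) that computes I(X,Y) from a joint probability table, using I(X,Y) = H(Y) - H(Y|X) and H(Y|X) = H(X,Y) - H(X):

```java
public class MutualInfo {
    static double log2(double v) { return Math.log(v) / Math.log(2); }

    // H of a distribution given as an array of probabilities
    static double entropy(double[] p) {
        double h = 0;
        for (double pi : p) if (pi > 0) h -= pi * log2(pi);
        return h;
    }

    // joint[i][j] = P(xi, yj); rows index X values, columns index Y values
    static double mutualInformation(double[][] joint) {
        double[] px = new double[joint.length];
        double[] py = new double[joint[0].length];
        double hxy = 0;  // joint entropy H(X,Y)
        for (int i = 0; i < joint.length; i++)
            for (int j = 0; j < joint[i].length; j++) {
                px[i] += joint[i][j];   // marginal P(xi)
                py[j] += joint[i][j];   // marginal P(yj)
                if (joint[i][j] > 0) hxy -= joint[i][j] * log2(joint[i][j]);
            }
        // I(X,Y) = H(Y) - H(Y|X), with H(Y|X) = H(X,Y) - H(X)
        return entropy(py) - (hxy - entropy(px));
    }

    public static void main(String[] args) {
        // X and Y perfectly correlated (X = Y): knowing X removes all uncertainty
        System.out.println(mutualInformation(new double[][]{{0.5, 0.0}, {0.0, 0.5}}));   // ≈ 1.0
        // X and Y independent fair coins: knowing X says nothing about Y
        System.out.println(mutualInformation(new double[][]{{0.25, 0.25}, {0.25, 0.25}}));  // ≈ 0.0
    }
}
```

The two cases match the statements above: dependent variables give `I(X,Y) > 0`, independent ones give `I(X,Y) = 0`.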

- Avg Time-Complexity: O(nlogn)
- Worst Time-Complexity: O(n^2)
- Space complexity: O(log n) average for the recursion stack (O(n) worst case)

```
import java.util.ArrayList;

public class QuickSort {
    public void sort(ArrayList<Integer> A) {
        sort(A, 0, A.size() - 1);
    }

    private void sort(ArrayList<Integer> A, int i, int j) {
        if (i < j) {
            int p = partition(A, i, j);
            // Hoare partition: the returned index belongs to the left half
            sort(A, i, p);
            sort(A, p + 1, j);
        }
    }

    // Hoare partition scheme: returns an index p such that every element
    // at or before p is <= pivot and every element after p is >= pivot.
    // (Scanning with do-while avoids the infinite loop on duplicate values.)
    private int partition(ArrayList<Integer> A, int lo, int hi) {
        int pivotValue = A.get(lo + (hi - lo) / 2);
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (A.get(i) < pivotValue);
            do { j--; } while (A.get(j) > pivotValue);
            if (i >= j) return j;
            // swap the out-of-place pair
            Integer temp = A.get(i);
            A.set(i, A.get(j));
            A.set(j, temp);
        }
    }
}
```

Now let us test a few cases and see if the implementation passes all of them:

```
import static org.junit.jupiter.api.Assertions.*;

import java.util.ArrayList;
import java.util.Arrays;
import org.junit.jupiter.api.Test;

class QuickSortTest {
    @Test
    void testSort() {
        ArrayList<Integer> testArray = new ArrayList<>(Arrays.asList(100, 10, 1, 5, 50));
        QuickSort qs = new QuickSort();
        qs.sort(testArray);
        assertEquals(Arrays.asList(1, 5, 10, 50, 100), testArray);
    }

    @Test
    void testSortNoArray() {
        ArrayList<Integer> testArray = new ArrayList<>();
        QuickSort qs = new QuickSort();
        qs.sort(testArray);
        assertTrue(testArray.isEmpty());
    }

    @Test
    void testSortBigArray() {
        ArrayList<Integer> testArray = new ArrayList<>(Arrays.asList(0, 50, Integer.MIN_VALUE, Integer.MAX_VALUE));
        QuickSort qs = new QuickSort();
        qs.sort(testArray);
        assertEquals(Arrays.asList(Integer.MIN_VALUE, 0, 50, Integer.MAX_VALUE), testArray);
    }

    @Test
    void testSortSameArray() {
        ArrayList<Integer> testArray = new ArrayList<>(Arrays.asList(0, 50, 50, 50, Integer.MIN_VALUE, Integer.MAX_VALUE));
        QuickSort qs = new QuickSort();
        qs.sort(testArray);
        assertEquals(Arrays.asList(Integer.MIN_VALUE, 0, 50, 50, 50, Integer.MAX_VALUE), testArray);
    }
}
```

A good reference is here by Michael Cotterell

```
public class MergeSort {
    public void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private void sort(int[] a, int i, int j) {
        if (i < j) {
            int mid = i + (j - i) / 2;
            sort(a, i, mid);
            sort(a, mid + 1, j);
            merge(a, i, mid, j);
        }
    }

    private void merge(int[] a, int leftStart, int leftEnd, int rightEnd) {
        int leftIndex = leftStart;
        int rightIndex = leftEnd + 1;
        int tempIndex = 0;
        // create a new storage location for the values to copy
        int[] temp = new int[rightEnd - leftStart + 1];
        // Compare from the beginning of the two parts and pick the smaller
        // element. Stop when the end of either part is reached.
        while (leftIndex <= leftEnd && rightIndex <= rightEnd) {
            if (a[leftIndex] <= a[rightIndex]) {
                temp[tempIndex++] = a[leftIndex++];
            } else {
                temp[tempIndex++] = a[rightIndex++];
            }
        }
        // if items remain in the left part, copy them over
        while (leftIndex <= leftEnd) {
            temp[tempIndex++] = a[leftIndex++];
        }
        // do the same for the right part
        while (rightIndex <= rightEnd) {
            temp[tempIndex++] = a[rightIndex++];
        }
        // now copy the temp storage back into the original array
        for (int i = 0; i < temp.length; i++) {
            a[leftStart++] = temp[i];
        }
    }
}
```

In-order traversal goes left child, then current node, then right child.

So for the given tree above it will be: `[2,17,7,19,3,100,25,36,1]`

```
public void inOrder(Node node) {
    if (node == null) return;
    inOrder(node.left);   // repeat this on the left subtree
    visit(node);          // do something at the current node
    inOrder(node.right);  // now work on the right subtree
}
```

Pre-order traversal deals with the current node first, then goes to the left child, then the right child. So the pre-order traversal is: `[100,19,17,2,7,3,36,25,1]`

```
public void preOrder(Node node) {
    if (node == null) return;
    visit(node);          // do something at the current node
    preOrder(node.left);  // repeat this on the left subtree
    preOrder(node.right); // now work on the right subtree
}
```

Post-order traversal recursively visits the left child, then the right child, and only then the current node. So for this tree: `[2,7,17,3,19,25,1,36,100]`

```
public void postOrder(Node node) {
    if (node == null) return;
    postOrder(node.left);  // repeat this on the left subtree
    postOrder(node.right); // now work on the right subtree
    visit(node);           // do something at the current node
}
```

Level-order traversal is basically the Breadth-First Search (BFS) traversal of a tree. For BFS, a queue data structure is useful. See the queue implementation here

```
public ArrayList<Node> traverse() {
    ArrayList<Node> closed = new ArrayList<>();      // visit order
    Queue<Node> open = new LinkedList<Node>();       // frontier
    open.add(this.startNode);
    while (!open.isEmpty()) {
        Node currentNode = open.poll();
        if (currentNode != null) {
            for (Node child : currentNode.getChildren()) {
                if (child != null) open.add(child);
            }
            closed.add(currentNode);
        }
    }
    return closed;
}
```

This is marked as a medium-difficulty problem. However, if you know what in-order traversal does, it is a very simple problem. Here I just added a helper `traverse()` method.

All the trick is done in a few lines inside the function. If the current node is null, it returns. Otherwise it runs the same method recursively on the left child first, then adds the current node's value to the list called `result`, then proceeds to the right child just like the left one.

```
class Solution {
    public List<Integer> inorderTraversal(TreeNode root) {
        return traverse(root, new ArrayList<Integer>());
    }

    private List<Integer> traverse(TreeNode node, List<Integer> result) {
        if (node == null) return result;
        traverse(node.left, result);
        result.add(node.val);
        traverse(node.right, result);
        return result;
    }
}
```

This solution beats 100% of the previous solutions by runtime and 99.62% by memory! Woohoo!
