Categories
Algorithms Data Science OMSCS

Game Theory Concepts for Reinforcement Learning

There are many flavors of games in Game Theory that are interesting from a Machine Learning perspective, especially for multi-agent Reinforcement Learning applications. Here is a summary of multiple game types, whether the MinMax algorithm works for each, and what type of strategy one needs to employ.
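To make the "does MinMax work?" question concrete, here is a minimal sketch for one game type only: a two-player zero-sum game in matrix form. It assumes the matrix holds the row player's payoffs, and the helper names are hypothetical, not from the post. If the pure-strategy maximin equals the minimax there is a saddle point and pure MinMax play is optimal; otherwise (as in matching pennies) a mixed strategy is needed, which the closed form below computes for the 2x2 case.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # row player's payoffs; the column player gets the negation

def pure_security_levels(payoff: Matrix) -> Tuple[float, float]:
    """Return (maximin, minimax) over pure strategies of a zero-sum game."""
    # Row player: pick the row whose worst case (row minimum) is largest.
    maximin = max(min(row) for row in payoff)
    # Column player: pick the column whose worst case for it (column maximum) is smallest.
    minimax = min(max(col) for col in zip(*payoff))
    return maximin, minimax

def has_pure_saddle_point(payoff: Matrix) -> bool:
    # Pure MinMax play is optimal exactly when maximin == minimax.
    maximin, minimax = pure_security_levels(payoff)
    return maximin == minimax

def mixed_2x2(payoff: Matrix) -> Tuple[float, float]:
    """For a 2x2 zero-sum game WITHOUT a pure saddle point, return
    (probability the row player plays row 0, value of the game)."""
    (a, b), (c, d) = payoff
    denom = a - b - c + d  # nonzero when there is no pure saddle point
    # Choose p so the column player is indifferent between its two columns.
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value

saddle = [[3, 1], [4, 2]]        # saddle point at row 1, column 1 (value 2)
pennies = [[1, -1], [-1, 1]]     # matching pennies: no pure saddle point
print(pure_security_levels(saddle), has_pure_saddle_point(pennies), mixed_2x2(pennies))
```

For matching pennies the pure security levels are -1 and 1, so neither player can commit to a pure strategy; mixing 50/50 gives a game value of 0.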

Categories
Data Science

Information Theory Concepts for Machine Learning

Entropy is the fundamental measure of information in Information Theory and is used extensively in Machine Learning. Let us introduce three concepts: Entropy, Joint Entropy, and Mutual Information.
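A minimal sketch of the three quantities, assuming they are estimated empirically from paired samples (plug-in frequency estimates) and measured in bits. It uses H(X) = -Σ p(x) log2 p(x), computes H(X, Y) as the entropy of the paired outcomes, and uses the identity I(X; Y) = H(X) + H(Y) - H(X, Y). The function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy(samples: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits: H(X) = -sum p(x) * log2 p(x)."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def joint_entropy(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H(X, Y): entropy of the paired outcomes (x_i, y_i)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired samples of equal length")
    return entropy(list(zip(xs, ys)))

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

coin = [0, 1, 0, 1]                 # fair coin: 1 bit of entropy
other = [0, 0, 1, 1]                # independent of `coin` in this sample
print(entropy(coin), mutual_information(coin, coin), mutual_information(coin, other))
```

A variable shares all of its information with itself (I(X; X) = H(X)), while the two independent sequences have a joint entropy of 2 bits and therefore zero mutual information.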

Categories
Algorithms Data Science

Freelance Machine Learning Project Proposal Template

This is my recommended template for a Project Proposal for a freelance Machine Learning project. It reflects my personal choice; depending on project requirements, you might need to change, remove, or add items to this list. One size does not always fit all for Machine Learning projects, but this could be a good starting point.

Categories
Algorithms Data Science

Policy Iteration, Value Iteration, and Q-Learning

A quick introduction to basic Reinforcement Learning concepts and algorithms: the Bellman Equation, Policy Iteration, Value Iteration, and Q-Learning.
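Here is a minimal sketch of two of these algorithms (Policy Iteration is not shown) on a hypothetical toy MDP chosen for illustration: five states on a line, actions left/right, state 4 terminal with a reward of 1.0 for entering it, and a discount factor of 0.9. Value Iteration repeatedly applies the Bellman optimality backup V(s) = max_a [r + γ V(s')] using the known transition model, while Q-Learning learns Q(s, a) from sampled transitions with an epsilon-greedy behavior policy and no model.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy MDP: states 0..4, actions -1 (left) / +1 (right),
# state 4 is terminal and entering it gives reward 1.0.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)

def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def value_iteration(tol: float = 1e-10) -> Tuple[List[float], List[int]]:
    V = [0.0] * N_STATES  # V[GOAL] stays 0 because the episode ends there
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue
            # Bellman optimality backup: V(s) = max_a [r + gamma * V(s')]
            best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
              for s in range(N_STATES)]
    return V, policy

def greedy(Q: Dict[Tuple[int, int], float], s: int, rng: random.Random) -> int:
    best = max(Q[(s, a)] for a in ACTIONS)
    # Break ties randomly so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

def q_learning(episodes: int = 500, alpha: float = 0.5, epsilon: float = 0.1,
               seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy behavior policy.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(Q, s, rng)
            s2, r = step(s, a)
            # Off-policy TD target: max over next actions (no bootstrap at terminal).
            target = r if s2 == GOAL else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

V, policy = value_iteration()
Q = q_learning()
print([round(v, 3) for v in V], policy[:GOAL])
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```

Both approaches should agree that moving right is optimal in every non-terminal state; the difference is that Value Iteration needs `step` as a model, while Q-Learning only consumes the transitions it experiences.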

Categories
Data Science System

Tips: Reading Hive Tables from Spark

A collection of useful tips for working with Big Data tools, including Hadoop, Hive, and Spark.

Categories
Data Science System

Big Data Handy References

Integrating Apache Hive with Kafka, Spark, and BI: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/integrating-hive/content/hive_hivewarehousesession_api_operations.html

Categories
Algorithms Data Science

Introduction to Reinforcement Learning: Key Terminologies

Reinforcement Learning is an important Machine Learning paradigm that differs from the more common Supervised and Unsupervised Learning models. In Supervised Learning, you have labels for training; in Unsupervised Learning, there is no labeled data. Reinforcement Learning falls in between the two because it does not have labels, but it learns from […]
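To ground the terminology, here is a minimal sketch of the agent-environment loop using a hypothetical toy environment (a short corridor with a reward of 1.0 at position 3; not from the original post). The agent's policy maps a state to an action, the environment returns the next state and a scalar reward instead of a label, and the discounted return G = r_1 + γ r_2 + γ² r_3 + … summarizes how good an episode was.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical environment: positions 0..3, reward 1.0 only on reaching position 3.
def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    next_state = min(max(state + action, 0), 3)
    reward = 1.0 if next_state == 3 else 0.0
    done = next_state == 3  # terminal state ends the episode
    return next_state, reward, done

Policy = Callable[[int, random.Random], int]

def random_policy(state: int, rng: random.Random) -> int:
    # A policy maps a state to an action; this one ignores the state entirely.
    return rng.choice((-1, +1))

def run_episode(policy: Policy, rng: random.Random, max_steps: int = 100) -> List[float]:
    state, rewards = 0, []
    for _ in range(max_steps):
        action = policy(state, rng)                     # agent acts
        state, reward, done = env_step(state, action)   # environment responds
        rewards.append(reward)                          # scalar feedback, not a label
        if done:
            break
    return rewards

def discounted_return(rewards: List[float], gamma: float) -> float:
    """G = sum_t gamma**t * r_{t+1}, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

always_right: Policy = lambda state, rng: +1
rewards = run_episode(always_right, random.Random(0))
print(rewards, discounted_return(rewards, 0.9))
```

The same reward sequence is worth less the longer the agent takes to collect it, which is why a random policy usually earns a lower return than one that always moves toward the goal.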

Categories
Data Science System

Distributed/Big Data Geospatial Processing Tools

Work in progress. I will write about each approach in more detail later. For now, this is a summary of tools for connecting to Hadoop and running geospatial processing on a large dataset. I am working on a ~100 GB Hive table, which is just a small subset of the original dataset.
http://geospark.datasyslab.org/
https://pypi.org/project/geopyspark/
https://github.com/Esri/gis-tools-for-hadoop/wiki
Kinetica GPU Database – Graph solver […]

Categories
Data Science

How much memory does importing the pandas library take?

Objective: Let us measure the memory consumed by importing the Python pandas library.

Methodology: A small script called memtest.py:

    from memory_profiler import profile
    @profile
    def a():
        import pandas

    if __name__ == "__main__":
        a()

Test it with:

    $ python -m memory_profiler memtest.py

Results: The memory increment reported for line #4 shows that importing pandas took 36.527 MB on my machine. What does your benchmarking result […]